
Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se

Linköping University | Department of Computer and Information Science

Bachelor's thesis, 16 ECTS | Datateknik

2019 | LIU-IDA/LITH-EX-G--2019/023--SE

Comparison and improvement of time aware collaborative filtering techniques

Recommender systems

Jämförelsestudie och förbättring av tidsmedvetna collaborative filtering tekniker

Otto Denesfay
David Grönberg

Supervisor: George Osipov
Examiner: Ola Leifler


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

Recommender systems emerged in the mid '90s with the objective of helping users select items or products most suited for them. Whether it is Facebook recommending people you might know, Spotify recommending songs you might like or Youtube recommending videos you might want to watch, recommender systems can now be found in every corner of the internet. In order to handle the immense increase of data online, the development of sophisticated recommender systems is crucial for filtering out information, enhancing web services by tailoring them according to the preferences of the user. This thesis aims to improve the accuracy of recommendations produced by a classical collaborative filtering recommender system by utilizing temporal properties, more precisely the date on which an item was rated by a user. Three different time weighted implementations are presented and evaluated: the time weighted prediction approach, the time weighted similarity approach and our proposed approach, which weights the mean rating of a user by time. The different approaches are evaluated using the well-known MovieLens 100k dataset. Results show that it is possible to slightly increase the accuracy of recommendations by utilizing temporal properties.


Acknowledgments

Firstly, we would like to thank Cybercom for their assistance during this thesis. In particular, Fredrik Loch and John Wibrand for their assistance in making this thesis a reality. In addition, we would like to thank our supervisor George Osipov, as well as our examiner Ola Leifler, for their constructive feedback and useful suggestions throughout the thesis work.


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
1.1 Motivation
1.2 Time-based collaborative filtering recommender system
1.3 Cybercom
1.4 Aim
1.5 Research questions
1.6 Delimitations

2 Theory
2.1 Ratings
2.2 Content-based filtering
2.3 Collaborative filtering
2.3.1 Memory-based
2.3.2 Model-based
2.4 Hybrid
2.5 Time aware collaborative filtering
2.6 Evaluation

3 Method
3.1 Data
3.1.1 Train- and test set
3.2 Software
3.3 Approaches
3.3.1 Proposed time weighted mean user rating approach
3.3.2 Time aware approaches
3.4 Parameter Optimization
3.5 Evaluation

4 Results
4.1 Parameter Optimization
4.1.1 Time unaware approaches
4.1.2 Time aware Models
4.2 Comparison of time aware and time unaware approaches
4.2.1 Approaches

5 Discussion
5.1 Results
5.1.1 Time weighted mean user rating
5.2 Method
5.2.1 Pre-study
5.2.2 Building the Models
5.2.3 Evaluation
5.3 The work in a wider context

6 Conclusion
6.1 Future Work

List of Figures

3.1 Line plot showing the amount of ratings made per month for the MovieLens 100k dataset.
3.2 Bar plot showing the rating distribution for the MovieLens 100k dataset.
4.1 MAE for the user-based time unaware approach.
4.2 RMSE for the user-based time unaware approach.
4.3 MAE for the item-based time unaware approach using Pearson Correlation Coefficient.
4.4 RMSE for the item-based time unaware approach using Pearson Correlation Coefficient.
4.5 MAE for the item-based time unaware approach using Adjusted Cosine Similarity.
4.6 RMSE for the item-based time unaware approach using Adjusted Cosine Similarity.
4.7 RMSE and MAE for different decay values λ for the item-based Time weighted prediction approaches.
4.8 RMSE and MAE for different decay values λ for the user-based Time weighted prediction approach.
4.9 RMSE and MAE for different decay values α for the item-based Time weighted similarity approaches.
4.10 RMSE and MAE for different decay values α for the user-based Time weighted similarity approaches.
4.11 RMSE and MAE for different decay values θ for the user-based Time weighted mean user rating approach operating on epoch timestamps.
4.12 RMSE and MAE for different decay values θ for the user-based Time weighted mean user rating approach operating on days.

List of Tables

2.1 Rating matrix of users 1-4 and their corresponding ratings of items 1-4.
3.1 Description of MovieLens 100k dataset (u.data).
4.1 Optimal k and C for time unaware approaches.
4.2 RMSE and MAE for time unaware approaches using the corresponding optimal parameter values.
4.3 RMSE and MAE for the Time weighted prediction approaches using the corresponding optimal parameter values.
4.4 RMSE and MAE for the Time weighted similarity approaches using the corresponding optimal parameter values.
4.5 RMSE and MAE for the Time weighted mean user rating approaches using the corresponding optimal parameter values.
4.6 RMSE and MAE for the Time weighted hybrid approaches using the corresponding optimal parameter values.
4.7 Decrease in RMSE and MAE (%) for the time weighted item-based approaches using Adjusted Cosine Similarity.
4.8 Decrease in RMSE and MAE (%) for the time weighted item-based approaches using Pearson Correlation Coefficient.
4.9 Decrease in RMSE and MAE (%) for the time weighted user-based approaches using Pearson Correlation Coefficient/Adjusted Cosine Similarity.
4.10 Summary of RMSE and MAE for different configurations and approaches.

1 Introduction

The following section presents the thesis and the hypothesis of the work, as well as a presentation of the company this study is conducted in collaboration with, the research questions and the delimitations.

1.1 Motivation

The amount of data on the internet is growing exponentially. Human- and machine generated data is experiencing a 10x faster growth rate compared to traditional business data [1]. Users are being exposed to a huge amount of information, making it difficult to tailor services to fit the preferences of everyone. This has fueled the rise of recommender systems. In the last decade, recommender systems have grown in popularity and can now be found in most domains of the internet like entertainment, e-commerce, social media platforms, match making, news sites and many more [2].

The main objective of a recommender system is to present candidate items to users that they most likely would be interested in. They can also be described as systems that aim to predict the interests of users and recommend items that most likely will be interesting to them [3]. These recommendations are calculated by the use of previously collected consumer behaviour data.

Recommender systems are present in the majority of the most popular social media platforms; Facebook, Twitter, Youtube and LinkedIn all implement some sort of recommender system. These recommendations can be seen in the form of people you may know, videos you may like, jobs you may be interested in etc. Youtube uses user activity history, i.e. videos that have been watched, search history and user demographics, and deploys a two-stage information retrieval approach, where one system generates the recommendations and the other evaluates the generated recommendations [4].

Well performing recommender systems can have a great impact on the services that integrate them. McKinsey conducted a study showing that up to 75% and 60% respectively of what users watch on Netflix and Youtube are recommendations given by their recommendation feature. Additionally, the recommender system Amazon implements brought the company 35% of its revenue [5]. Being able to understand customer preferences and provide personalized services has long been a goal for companies, where recommender systems play an important role.


For this paper, we have decided to evaluate three time aware collaborative filtering approaches and compare them to their respective time unaware collaborative filtering approach. The evaluation will be conducted by measuring the accuracy of the recommendations produced by each approach.

1.2 Time-based collaborative filtering recommender system

The preferences of people change over time. One month you might be really into food documentaries and rock music, while the next month, sitcoms and classical music might be what you prefer. The way you rate and interact with items might also differ. Our hypothesis is that a time aware recommender system taking these changes in preference and behaviour into consideration will give more accurate recommendations that reflect the current taste of the user, more so than a classical collaborative filtering approach.

Previous studies have shown that taking temporal properties into consideration when designing recommender systems can result in improved accuracy. One of these studies [6] compares a classical collaborative filtering approach with time aware ones, showing that models taking time into consideration can reduce the Mean Absolute Error (MAE, see section 2.6) by up to six percent, depending on the sparsity of the dataset.

1.3 Cybercom

This thesis is conducted in collaboration with Cybercom Jönköping. Cybercom is an IT consulting company founded in 1995, operating mainly in the Nordic region. The primary focus of the company is cloud services, security, Internet of Things (IoT) and digital transformation. Cybercom Innovation Zone is a sub-organisation of the company that focuses on developing and testing innovational ideas. A large part of the companies that Cybercom works with specialize in web services, where a well performing recommender system can increase efficiency and quality of service. Cybercom Innovation Zone is interested in exploring new approaches and ways of designing recommender systems.

1.4 Aim

This paper aims to study the potential gain in accuracy when building recommender systems taking temporal properties into account. The accuracy of time aware collaborative filtering recommender systems will be compared against collaborative filtering systems disregarding temporal properties.

1.5 Research questions

The following research questions and their corresponding answers will form the premise of the paper:

1. How can a time aware collaborative filtering recommender system be constructed?
2. How does incorporating temporal information affect the accuracy of collaborative filtering recommender systems?

1.6 Delimitations

The development of recommender systems is a broad subject where a variety of aspects of the system can be evaluated. To fit the scope of this study, the following delimitations are made:


1. This study will only evaluate recommendations for the MovieLens 100k dataset.
2. The recommender system will be evaluated using offline testing (see section 2.6).
3. The accuracy of recommendations will be the only factor relevant for this study,

2 Theory

The following sections describe the collected theory forming the basis of this thesis. Firstly, we describe how ratings are used in recommender systems. Following that, the three classical approaches when developing recommender systems are described, including a description of the most used similarity metrics. Lastly, we present and describe existing approaches incorporating temporal properties as well as how the accuracy of recommender systems can be evaluated.

2.1 Ratings

Recommender systems provide recommendations for a user based on previous ratings made by users [7]. Ratings can consist of numerical ratings, often on a five-degree scale, binary ratings such as a "thumbs up" or "thumbs down", or unary ratings, for example a user purchasing an item [7]. The data used when generating ratings are often categorized into two types: implicit or explicit. Implicit data is information derived from the actions of a user. This could be information about how many times a user has clicked on a specific product or the amount of time a user spent on the page of a product. Explicit data refers to information intentionally given by a user, often represented as ratings or "likes" [7].

2.2 Content-based filtering

Content-based filtering is based on the assumption that if a user has expressed liking for a certain item with certain attributes, they will presumably like other items with the same attributes [7]. These systems analyze the properties of the items a user has interacted with and present candidate items with similar attributes. For example, if a user has watched movies categorized to a certain genre, the recommender system would present other movies of that same genre.

The two main approaches when developing content-based filtering systems are to either compare items with items individually, or give each user a taste profile based on their history of preference [7], through which recommendations are derived. Content-based filtering is commonly used in systems dealing with text-based items, like news articles and documents. The text is extracted from the item using text retrieval methods and, by using keyword analysis techniques, the item is given certain attributes [8]. For example, if a user reads a lot of sports articles, content-based filtering will assign higher utility to articles containing keywords like "players", "stadium", "winning" and lower utility to articles where the keywords are mentioned less [8].

One challenge of content-based filtering is assigning features to items. While retrieving information from text-based items is quite trivial, other types of items where the properties are not obvious at first glance, such as videos, audio and images, can be more challenging [8]. Additionally, in content-based filtering, users run the risk of only being recommended items within their taste profiles, limiting the possibility of exploration [9].

2.3 Collaborative filtering

In general, collaborative filtering evaluates the preference of a user and recommends items that users with similar preferences have interacted with, rather than comparing the properties of the items. To clarify, if a user has expressed liking for a certain movie, that user would be recommended movies that other users, who have also expressed liking for the same movie, have watched. This method relies heavily on the preconception that users that have shown similar previous behaviour in terms of preference will show the same liking for unexplored items.

Each user is represented as a vector of a specific length, where the length is determined by the number of unique items a user has rated. Each index of the vector contains the rating a user has given a specific item, implicitly or explicitly. Combining vectors from multiple users forms a matrix, from which predictions can be calculated. [10]

         Item 1   Item 2   Item 3   Item 4
User 1     2        -        4        3
User 2     4        3        1        -
User 3     2        5        3        3
User 4     -        5        -        5

Table 2.1: Rating matrix of users 1-4 and their corresponding ratings of items 1-4.

In table 2.1, users 1 and 3 would be considered similar, given that they have rated the same items similarly. Since user 3 rated item 2 a 5, the recommender system would, principally, predict that user 1 would also rate item 2 a 5 (further explained in section 2.3.1.5).
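As a concrete illustration, the sketch below loads the rating matrix of table 2.1 into a pandas DataFrame and computes the Pearson correlation between users 1 and 3 over their co-rated items. It is an illustrative example only, not code from the thesis.

```python
import numpy as np
import pandas as pd

# Rating matrix from table 2.1; NaN marks items a user has not rated.
ratings = pd.DataFrame(
    [[2, np.nan, 4, 3],
     [4, 3, 1, np.nan],
     [2, 5, 3, 3],
     [np.nan, 5, np.nan, 5]],
    index=["User 1", "User 2", "User 3", "User 4"],
    columns=["Item 1", "Item 2", "Item 3", "Item 4"],
)

# Pearson correlation over co-rated items only; a high value suggests the two
# users would end up in the same neighborhood.
print(ratings.loc["User 1"].corr(ratings.loc["User 3"]))
```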

In contrast to the content-based approach, which requires analysis of the items in order to find similarities, collaborative filtering requires no information about the items themselves, since the properties of the items carry no weight in regards to producing recommendations. Only information about whether or not a user appreciates a specific item is required when calculating similarities.

Collaborative filtering can be divided into two subcategories, memory-based and model-based:

2.3.1 Memory-based

The memory-based collaborative filtering approach is, in turn, made up of two categories: user-based and item-based. The user-based approach utilizes the entire set of ratings of a particular user to find other users with similar preferences, also called neighbors. Neighbors are identified by the use of similarity measuring methods, which together form a neighborhood of users, through which the rating a certain user would give a certain item is predicted [11]. The item-based approach calculates the similarity between items and recommends to a certain user items that have a strong correlation with the items that specific user has expressed liking for.

User similarity in this context is represented by a distance in a multi-dimensional space where the dimensions represent ratings a specific user has given. If the distance is small, the users have a high degree of similarity, and if the distance is large, the users have a low degree of similarity. Item similarity is calculated in the same way, where the dimensions represent ratings given to a specific item. Pearson correlation coefficient, Cosine similarity and Adjusted cosine similarity are three of the most widely used similarity metrics [12]. The performance of memory-based collaborative filtering approaches relies heavily on the sparsity of the data, due to similar users or items being discovered solely based on the ratings a set of users has given.

2.3.1.1 Pearson Correlation Coefficient

The Pearson Correlation Coefficient shows the linear relationship between two users or items. The formula returns a value between -1 and 1, where 1 indicates a strong positive correlation, -1 indicates a strong negative correlation and 0 indicates no correlation at all.

The similarity between user a and user u is calculated by:

\[
\mathrm{sim}(a,u) = \frac{\sum_{i \in I_a \cap I_u} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{\sqrt{\sum_{i \in I_a \cap I_u} (r_{a,i} - \bar{r}_a)^2} \cdot \sqrt{\sum_{i \in I_a \cap I_u} (r_{u,i} - \bar{r}_u)^2}} \tag{2.1}
\]

where r_{a,i} and r_{u,i} are the ratings users a and u gave item i, r̄_a and r̄_u are the mean ratings of users a and u, and I_a ∩ I_u is the intersection of the sets of items rated by user a and user u.

The similarity between item i and item j is calculated by:

\[
\mathrm{sim}(i,j) = \frac{\sum_{u \in U_i \cap U_j} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in U_i \cap U_j} (r_{u,i} - \bar{r}_i)^2} \cdot \sqrt{\sum_{u \in U_i \cap U_j} (r_{u,j} - \bar{r}_j)^2}} \tag{2.2}
\]

where r_{u,i} and r_{u,j} are the ratings user u gave items i and j, r̄_i and r̄_j are the mean ratings of items i and j, and U_i ∩ U_j is the set of users who have rated both items.

The Pearson Correlation Coefficient deals with grade inflation, which occurs when users use different grading scales to express their preference. Consider two users, Clara and Bob. Clara loved the movie Toy Story and rated it a 4, while Bob found the movie average, but because they have different grading scales, he also rated it a 4. By subtracting the mean rating from the actual rating, this potential problem is dealt with [13].
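For illustration, a minimal user-based Pearson similarity in Python could look like the sketch below. It is not the thesis implementation, and it computes the user means over the co-rated items only, whereas the means in equation 2.1 may equally well be taken over all of a user's ratings.

```python
import numpy as np

def pearson_user_similarity(ratings_a, ratings_u):
    """Pearson Correlation Coefficient between two users (eq. 2.1).

    ratings_a, ratings_u: dicts mapping item id -> rating.
    Returns 0.0 when there are no co-rated items or no variance.
    """
    common = set(ratings_a) & set(ratings_u)          # I_a ∩ I_u
    if not common:
        return 0.0
    a = np.array([ratings_a[i] for i in common], dtype=float)
    u = np.array([ratings_u[i] for i in common], dtype=float)
    a_c = a - a.mean()                                # r_{a,i} - mean rating of a
    u_c = u - u.mean()                                # r_{u,i} - mean rating of u
    denom = np.sqrt((a_c ** 2).sum()) * np.sqrt((u_c ** 2).sum())
    return float(a_c @ u_c / denom) if denom != 0 else 0.0

# Example with three co-rated items.
print(pearson_user_similarity({1: 4, 2: 3, 3: 5}, {1: 5, 2: 3, 3: 4}))
```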

2.3.1.2 Cosine Similarity

When calculating Cosine Similarity, the ratings of two users are represented by two vectors and their similarity is measured by computing the cosine of the angle between the vectors. Cosine Similarity ignores unrated items and is therefore favourable when dealing with sparse data [13].

The similarity between user a and user u is calculated by:

\[
\mathrm{sim}(a,u) = \frac{\sum_{i \in I_a \cap I_u} r_{a,i} \cdot r_{u,i}}{\sqrt{\sum_{i \in I_a \cap I_u} r_{a,i}^2} \cdot \sqrt{\sum_{i \in I_a \cap I_u} r_{u,i}^2}} \tag{2.3}
\]

where r_{a,i} and r_{u,i} represent the ratings users a and u gave item i.

The similarity between item i and item j is calculated in the same way:

\[
\mathrm{sim}(i,j) = \frac{\sum_{u \in U_i \cap U_j} r_{u,i} \cdot r_{u,j}}{\sqrt{\sum_{u \in U_i \cap U_j} r_{u,i}^2} \cdot \sqrt{\sum_{u \in U_i \cap U_j} r_{u,j}^2}} \tag{2.4}
\]


2.3.1.3 Adjusted Cosine Similarity

Adjusted Cosine Similarity is a modified version of Cosine Similarity. Like the Pearson Correlation Coefficient, it addresses the problem of grade inflation by subtracting the mean rating of a user or item from the actual rating.

The similarity between user a and user u is calculated by:

\[
\mathrm{sim}(a,u) = \frac{\sum_{i \in I_a \cap I_u} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{\sqrt{\sum_{i \in I_a \cap I_u} (r_{a,i} - \bar{r}_a)^2} \cdot \sqrt{\sum_{i \in I_a \cap I_u} (r_{u,i} - \bar{r}_u)^2}} \tag{2.5}
\]

Note that when calculating user similarity with Adjusted Cosine Similarity, the function is exactly the same as the Pearson Correlation Coefficient.

The similarity between item i and item j is calculated by:

\[
\mathrm{sim}(i,j) = \frac{\sum_{u \in U_i \cap U_j} (r_{u,i} - \bar{r}_u)(r_{u,j} - \bar{r}_u)}{\sqrt{\sum_{u \in U_i \cap U_j} (r_{u,i} - \bar{r}_u)^2} \cdot \sqrt{\sum_{u \in U_i \cap U_j} (r_{u,j} - \bar{r}_u)^2}} \tag{2.6}
\]

2.3.1.4 Significance weighting

When calculating the similarity between two users, a fair preconception is that only items which both users have rated should be considered. This leads to problems when users have a small number of commonly rated items. Calculating the similarity between two users solely based on a couple of items will not yield a fair result. Herlocker et al. [14] introduce a significance weighting function that deals with this issue. It devalues similarities that are based on a small number of commonly rated items.

The significance weighting for user similarity is calculated by:

\[
\frac{\min(|I_a \cap I_u|, C)}{C} \cdot \mathrm{sim}(a,u) \tag{2.7}
\]

where C is a cutoff value, commonly set to 50. If the co-rated amount n is less than the cutoff value C, the similarity is multiplied by n/C. If n is larger than the cutoff value, the similarity is unaffected.
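In code, significance weighting is a single scaling step. The sketch below is illustrative only and uses the commonly chosen cutoff C = 50 mentioned above.

```python
def significance_weighted_similarity(sim, n_corated, cutoff=50):
    """Devalue similarities based on few co-rated items (eq. 2.7).

    sim       : raw similarity between two users (or items)
    n_corated : number of co-rated items |I_a ∩ I_u|
    cutoff    : the cutoff value C, commonly set to 50
    """
    return (min(n_corated, cutoff) / cutoff) * sim

# A similarity of 0.9 based on only 5 co-rated items is reduced to 0.09,
# while one based on 80 co-rated items is left untouched.
print(significance_weighted_similarity(0.9, 5))    # 0.09
print(significance_weighted_similarity(0.9, 80))   # 0.9
```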

2.3.1.5 Rating prediction with Nearest Neighbor

When constructing a set of users with similar preferences, neighborhoods are created consisting of users that have expressed liking for similar items (in terms of ratings). In order to construct the neighborhood for a target user, the similarity of the target user and every other user in the system is computed using one of the previously mentioned similarity metrics, where the top n users with the highest similarity form the neighborhood. [15]

When predicting the rating a certain user would rate a certain item, the neighborhood for that specific user is constructed and the ratings the neighboring users gave that specific item are used to make the prediction. [15]

For a user-based collaborative filtering approach, the prediction for user a on item i is calculated by:

\[
\mathrm{prediction}(a,i) = \frac{\sum_{u \in N_a} r_{u,i} \cdot \mathrm{Sim}(a,u)}{\sum_{u \in N_a} |\mathrm{Sim}(a,u)|} \tag{2.8}
\]

where r_{u,i} represents the rating the neighboring user u gave item i and N_a represents the nearest neighbors of user a.

For an item-based collaborative filtering approach, the prediction for user a on item i is calculated by:

\[
\mathrm{prediction}(a,i) = \frac{\sum_{j \in N_i} r_{a,j} \cdot \mathrm{Sim}(i,j)}{\sum_{j \in N_i} |\mathrm{Sim}(i,j)|} \tag{2.9}
\]

where r_{a,j} represents the rating user a gave the neighboring item j and N_i represents the nearest neighbors of item i.

This approach does not take grade inflation into account and will not deal with users having diverse rating patterns. The mean centered prediction function calculates the mean centered rating for every user in the neighborhood of the target user. It is defined by subtracting the mean rating from the actual rating.

For a user-based collaborative filtering approach, the mean centered prediction for user a on item i is calculated by:

\[
\mathrm{prediction}(a,i) = \bar{r}_a + \frac{\sum_{u \in N_a} \mathrm{Sim}(a,u) \cdot (r_{u,i} - \bar{r}_u)}{\sum_{u \in N_a} |\mathrm{Sim}(a,u)|} \tag{2.10}
\]

where r̄_u represents the mean rating of the neighboring user u and r̄_a is the mean rating of user a.

For an item-based collaborative filtering approach, the mean centered prediction function is calculated by:

\[
\mathrm{prediction}(a,i) = \bar{r}_i + \frac{\sum_{j \in N_i} \mathrm{Sim}(i,j) \cdot (r_{a,j} - \bar{r}_j)}{\sum_{j \in N_i} |\mathrm{Sim}(i,j)|} \tag{2.11}
\]

where r̄_j represents the mean rating of the neighboring item j and r̄_i is the mean rating of item i.

If a prediction based on the ratings of the neighboring users is not possible, i.e. no users in the neighborhood have rated the target item, returning the mean rating of the target user or the mean rating of the target item are popular solutions.
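A minimal sketch of the user-based mean centered prediction (equation 2.10), including the fallback described above, is shown below; the data layout is an assumption made for the example and is not the thesis implementation.

```python
def predict_mean_centered(target_mean, neighbors, fallback):
    """User-based mean centered prediction (eq. 2.10).

    target_mean : mean rating of the target user (r̄_a)
    neighbors   : list of (similarity, neighbor_rating, neighbor_mean) tuples
                  for neighbors that have rated the target item
    fallback    : value returned when no neighbor has rated the item,
                  e.g. the mean rating of the target user or the target item
    """
    numerator = sum(sim * (rating - mean) for sim, rating, mean in neighbors)
    denominator = sum(abs(sim) for sim, _, _ in neighbors)
    if denominator == 0:
        return fallback
    return target_mean + numerator / denominator

# Target user's mean is 3.5; two neighbors rated the item 4 and 5.
print(predict_mean_centered(3.5, [(0.8, 4, 3.0), (0.4, 5, 4.5)], fallback=3.5))  # ~4.33
```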

2.3.2 Model-based

The model-based approach, in contrast to the memory-based approach, works by constructing a model of the preferences of a user, from which ratings of non-rated items are estimated in a probabilistic manner [11], i.e. by discovering latent factors. The user preference model is constructed by the use of certain algorithms, some of which are Bayesian networks, clustering [16] and matrix factorization [17]. Algorithms based on matrix factorization have proven to be some of the more successful ones at discovering latent factors [17].

Unlike the memory-based approach, latent factors that characterize a certain user can be discovered without exploring the entire set of ratings the user has given, such as preferring a certain type of lead character of a movie or restaurants serving a certain type of dish. [17]

2.4 Hybrid

Hybrid recommender systems implement some arbitrary parts of the content-based filtering approach and the collaborative filtering one. Additionally, recommender systems implementing some sort of additional calculation, such as recommendations weighted towards specific parameters, are also considered hybrids. Hence, the time weighted recommender systems this study aims to evaluate are considered hybrids.

2.5 Time aware collaborative filtering

Previous studies aiming to implement various time aware approaches have shown that accuracy can be improved when weighting the recommender system towards the time aspect [18][19][6][20][21][22][23]. Campos et al. [18] introduce a strategy only considering the most recent ratings of neighbors when predicting the rating a target user gave a target item, in a user-based collaborative filtering recommender system. The strategy is based on the assumption that similar users tend to be similar throughout time. If the rating of a neighbor was made outside of the interval of a certain cutoff value, the rating will not be considered. An issue with this strategy is that, if the interval cutoff value is too small, there will not be enough information available to calculate an adequate prediction.

Liu et al. [23] implement a time based weight function that assigns recent ratings greater significance in the prediction phase. The time weight function is calculated by:

\[
f(t) = e^{-\alpha (t - t_{u,i})}, \quad 0 \le \alpha \le 1 \tag{2.12}
\]

where t is the current time and t_{u,i} is the time when user u rated item i.

f(t) produces values between 0 and 1: the larger the elapsed time t - t_{u,i}, the lower f(t), i.e. more recent ratings produce a higher f(t), giving the rating greater significance. α regulates the rate of decay. Higher values of α make the prediction function more sensitive to time differences, while α = 0 results in no time consideration at all. This time function can be combined with the prediction function. For an item-based time aware prediction approach, a prediction is calculated by:

\[
\mathrm{prediction}(a,i) = \frac{\sum_{j \in N_i} r_{a,j} \cdot \mathrm{Sim}(i,j) \cdot f_{a,j}(t)}{\sum_{j \in N_i} \mathrm{Sim}(i,j) \cdot f_{a,j}(t)} \tag{2.13}
\]

where f_{a,j}(t) is the time weight function that takes the time user a rated item j as input and N_i is the nearest neighbors of item i.
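The following sketch illustrates equations 2.12 and 2.13. The data layout (ratings and timestamps keyed by item id) is assumed for the example; it is not the code used in the thesis.

```python
import numpy as np

def time_weight(t_now, t_rating, alpha):
    """Exponential decay f(t) = e^(-alpha * (t_now - t_rating)), eq. 2.12."""
    return np.exp(-alpha * (t_now - t_rating))

def item_based_time_weighted_prediction(user_ratings, neighbors, t_now, alpha):
    """Item-based time weighted prediction (eq. 2.13).

    user_ratings : dict item id -> (rating, rating_time) for the target user
    neighbors    : list of (item_id, similarity) pairs, the neighbors N_i of the
                   target item that the target user has rated
    """
    num = den = 0.0
    for item_j, sim in neighbors:
        rating, t_rating = user_ratings[item_j]
        w = time_weight(t_now, t_rating, alpha)   # f_{a,j}(t)
        num += rating * sim * w
        den += sim * w
    return num / den if den != 0 else None
```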

The time weight function can also be applied when determining similarity between items. To measure similarity between items by incorporating temporal factors, Liu et al.[23] modifies the Cosine Similarity function with their time weight function by:

\[
\mathrm{sim}(i,j) = \frac{\sum_{u \in U_i \cap U_j} \big(r_{u,i} \cdot f_{u,i}(t)\big)\big(r_{u,j} \cdot f_{u,j}(t)\big)}{\sqrt{\sum_{u \in U_i \cap U_j} \big(r_{u,i} \cdot f_{u,i}(t)\big)^2} \cdot \sqrt{\sum_{u \in U_i \cap U_j} \big(r_{u,j} \cdot f_{u,j}(t)\big)^2}} \tag{2.14}
\]

where f_{u,i}(t) and f_{u,j}(t) are the time weight function taking the time user u rated item i or j as input.

This approach does not only identify similar items that have similar characteristics in terms of ratings, it also identifies similar items in terms of users that have rated them in the same time span. To simplify, items with high similarity are items that have been rated similarly and close in time.
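The same exponential decay can be folded into the similarity computation (equation 2.14), as in the sketch below; the dict-based data layout is again an assumption made for illustration, not the thesis code.

```python
import numpy as np

def time_weighted_item_similarity(item_i, item_j, t_now, alpha):
    """Cosine similarity between two items with time weighted ratings (eq. 2.14).

    item_i, item_j : dicts mapping user id -> (rating, rating_time)
    """
    common = sorted(set(item_i) & set(item_j))        # U_i ∩ U_j
    if not common:
        return 0.0
    wi = np.array([item_i[u][0] * np.exp(-alpha * (t_now - item_i[u][1])) for u in common])
    wj = np.array([item_j[u][0] * np.exp(-alpha * (t_now - item_j[u][1])) for u in common])
    denom = np.linalg.norm(wi) * np.linalg.norm(wj)
    return float(wi @ wj / denom) if denom != 0 else 0.0
```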

Ding and Li [19] implement a similar time based weight function that assigns recent ratings greater significance. In the prediction function, each rating is assigned a time based weight f(t) as follows:

\[
f(t) = e^{-\lambda t} \tag{2.15}
\]

where

\[
\lambda = \frac{1}{T_0} \tag{2.16}
\]

and

\[
f(T_0) = \frac{1}{2} f(0) \tag{2.17}
\]

For a user-based time aware prediction approach, a prediction is calculated by:

\[
\mathrm{prediction}(a,i) = \frac{\sum_{u \in N_a} r_{u,i} \cdot \mathrm{Sim}(a,u) \cdot f(t_{u,i})}{\sum_{u \in N_a} |\mathrm{Sim}(a,u)| \cdot f(t_{u,i})} \tag{2.18}
\]

where t_{u,i} represents the time user u rated item i.

Users' activity history can vary: some user preferences change frequently while others do not. Ding and Li [18] introduce a way to compute the T_0 value for each user and each cluster of items, to more precisely predict the future preference of a user, yielding better performance. They use simple K-Means clustering to summarize items into different clusters and then compute the optimal T_0 for the clusters by scanning all the possible values of the parameter T_0.

Lee, Park and Park [21] implement a time weighted system by dividing users and items into both 3 and 5 different groups. When having 3 groups for users and items respectively, each user is grouped depending on the time of purchase of a specific item, while the items were grouped depending on their launch date: "Old purchase group", "Middle purchase group" and "Recent purchase group" for users, and "Old launch group", "Middle launch group" and "Recent launch group" for items. The same concept was applied to the "groups of 5" alternative. The study showed that implementing the time factor can increase the accuracy of recommendations by up to 47%, depending on which grouping technique and similarity measure method is used.

2.6 Evaluation

Evaluation of recommender systems can be done using live user feedback, offline analysis or a combination of the two. Live user feedback refers to users using the recommender system and giving feedback on how well they thought the recommendations corresponded to their actual preference. In offline analysis, the recommender system predicts a rating for a particular item that a certain user has already rated, and then compares the difference between the rating the recommender system predicted and the actual rating [24]. The data is split into a train and test set where, in a neighborhood collaborative filtering approach, the similarities and nearest neighbors for each user or item are calculated using the train set. The evaluation is carried out by predicting the ratings present in the test set. The two most commonly used metrics for offline evaluation are Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Both measure the accuracy of a set of predictions, where lower values equal higher accuracy.

Root Mean Squared Error:

\[
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2} \tag{2.19}
\]

Mean Absolute Error:

\[
MAE = \frac{1}{n} \sum_{i=1}^{n} |d_i| \tag{2.20}
\]

where n is the number of tests made and d_i is the difference between the i-th prediction and the actual rating the user made.

Given that each difference in rating is squared before being divided by the number of tests made, RMSE is more sensitive to cases where the predicted rating drastically differs from the actual rating, i.e. it is valid to say that the lower the RMSE, the better a model handles extreme cases. MAE simply measures the average magnitude of the errors in a set of predictions, describing the overall robustness of a model [24].
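Both metrics can be computed with scikit-learn, which the thesis also uses for its accuracy metrics. The ratings below are made-up example values.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

actual    = np.array([4, 3, 5, 2, 4])           # ratings withheld in the test set
predicted = np.array([3.8, 3.4, 4.1, 2.9, 4.2])  # the system's predictions

mae  = mean_absolute_error(actual, predicted)          # eq. 2.20
rmse = np.sqrt(mean_squared_error(actual, predicted))  # eq. 2.19
print(f"MAE = {mae:.4f}, RMSE = {rmse:.4f}")
```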

3 Method

The following section describes the methodology chosen to realise the study explained in previous sections, based on the research questions presented in section 1.5.

Firstly, the dataset acquired for the study, its characteristics and the properties of each entry in the set are described in section 3.1. Subsequently, a description of the software used to realize the recommender systems, both for evaluation and implementation purposes, is presented in section 3.2. The different approaches tested, including time unaware and time aware systems, are described in section 3.3 (this section also includes a description of a proposed time weighted mean user rating approach). Following that, we present the parameter optimization in section 3.4. Lastly, the methodology used to evaluate the recommender systems is presented in section 3.5.

3.1 Data

The 100k MovieLens dataset, provided by GroupLens, is used to evaluate the models and measure the accuracy of the predictions made. The dataset is commonly used when researching recommender systems, in particular approaches utilizing temporal properties ([19], [6], [24], [22]). The 100k MovieLens dataset contains ratings made by 943 users on 1682 movies. Each user has rated at least 20 movies, resulting in a total of 100,000 ratings. The set is structured into several files containing information about the movies and the users. Ratings made by users are located in the 'u.data' file, structured as follows:

100k u.data

user_id movie_id rating unix_timestamp

196 242 3 881250949

186 302 3 891717742

22 377 1 878887116

244 51 2 880606923

... ... ... ...

Table 3.1: Description of MovieLens 100k dataset (u.data).

The ratings have been collected during an eight month period, ranging from September 1997 to April 1998. Figures 3.1 and 3.2 show the amount of ratings made each month, as well as the rating distribution over the MovieLens 100k dataset. The ratings have a fairly even distribution over the months excluding November, where the rating amount drastically increases. These characteristics give us confidence that the set is suitable when evaluating our time based recommender systems. We convert all the timestamps to days since the first rating in the dataset was made (19 September 1997).
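Loading u.data and converting the epoch timestamps to days could, for example, be done as in the sketch below; the column names follow table 3.1, but the snippet is illustrative rather than the thesis code.

```python
import pandas as pd

# u.data is tab separated: user_id, movie_id, rating, unix_timestamp
columns = ["user_id", "movie_id", "rating", "unix_timestamp"]
ratings = pd.read_csv("u.data", sep="\t", names=columns)

# Convert epoch timestamps to whole days since the first rating in the dataset.
first = ratings["unix_timestamp"].min()
ratings["days"] = (ratings["unix_timestamp"] - first) // 86400   # 86400 seconds per day

print(ratings.head())
print(ratings["days"].max())   # length of the rating period in days
```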

Figure 3.1: Line plot showing the amount of ratings made per month for the MovieLens 100k dataset.

Figure 3.2: Bar plot showing the rating distribution for the MovieLens 100k dataset.

3.1.1 Train- and test set

The data from 'u.data' is split into a train and test set, where the train set is used to train the recommender system, in the case of this study to find similarities between users and items. The test set is used to evaluate the predictions made, by letting the recommender system predict the ratings present in the test set. We construct the test set by using the well-known leave-one-out method, extracting the most recent rating of each user, resulting in a set containing 943 ratings, making up 0.943% of the total amount of ratings. Consequently, the rest of the ratings make up the train set, 99.057% of the total amount of ratings.
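The leave-one-out split described above can be expressed in a few lines of pandas, for example as in the sketch below (same column-name assumptions as before, not the authors' implementation).

```python
import pandas as pd

columns = ["user_id", "movie_id", "rating", "unix_timestamp"]
ratings = pd.read_csv("u.data", sep="\t", names=columns)

# Leave-one-out: the most recent rating of every user forms the test set.
latest_idx = ratings.groupby("user_id")["unix_timestamp"].idxmax()
test_set = ratings.loc[latest_idx]
train_set = ratings.drop(index=latest_idx)

print(len(test_set), len(train_set))   # 943 and 99057 for MovieLens 100k
```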


3.2 Software

The recommender systems are developed and evaluated in Python 3.6 using Pandas (https://pandas.pydata.org/), Numpy (https://www.numpy.org/), scikit-learn (https://scikit-learn.org/) and SciPy (https://www.scipy.org/). Pandas is a Python BSD licensed data analysis library that provides data structures and data analysis tools simplifying data manipulation. In the context of this study, its main area of use is to load and sort the mentioned dataset. Numpy is a library containing support for multi-dimensional arrays as well as high-level mathematical functions, used in this study to create and manipulate matrices. Scikit-learn contains a large amount of different machine-learning and data manipulation functions, which we use to calculate accuracy metrics. SciPy is an open-source library including functions regarding mathematics, science and engineering. Our main use of the library is to validate similarity metrics. Additionally, for heavier calculations and more thorough evaluations, we use AWS and their SageMaker service (https://aws.amazon.com/sagemaker/).

3.3 Approaches

In this section we present the different approaches we choose to evaluate. We elaborate on the memory-based collaborative filtering approach further, due to benefits such as simplicity of implementation. Model-based collaborative filtering is disregarded due to time limitations and its potential benefits being out of the scope of this study. We present twelve time aware memory-based collaborative filtering approaches taking temporal properties into account and compare them against their corresponding time unaware approach. They are also compared to two simple benchmark prediction methods: Mean user rating prediction, which only predicts the mean rating of a user, and Mean item rating prediction, predicting the mean rating of an item.

In terms of similarity metrics used, even though a large number exist, we choose to evaluate two of the most commonly used algorithms, being Pearson Correlation Coefficient and Adjusted Cosine Similarity (as explained in section 2.3.1). It is possible to implement the Pearson Correlation Coefficient with SciPy, but we create our own version of the metric to simplify incorporating temporal properties. We failed to find any trustworthy libraries including Adjusted Cosine Similarity and create our own version of this metric as well.

The following time unaware approaches are evaluated:

1. Item-based collaborative filtering using Adjusted Cosine Similarity.
2. Item-based collaborative filtering using Pearson Correlation Coefficient.
3. User-based collaborative filtering using Adjusted Cosine Similarity/Pearson Correlation Coefficient (see section 2.3.1.3).

3.3.1 Proposed time weighted mean user rating approach

To further strengthen the impact of temporal properties and potential changes in user behaviour over time, we implement a time weighted mean user rating in the mean centered prediction function (see section 2.3.1.5). As mentioned, the purpose of the mean centered prediction function is to address the problem of grade inflation.

We argue that grading scales might change over time and address this by making the following addition to the user-based mean centered prediction function:

\[
\mathrm{prediction}(a,i) = \bar{r}_{a,t} + \frac{\sum_{u \in N_a} \mathrm{Sim}(a,u) \cdot (r_{u,i} - \bar{r}_{u,t})}{\sum_{u \in N_a} |\mathrm{Sim}(a,u)|} \tag{3.1}
\]

where r̄_{a,t} and r̄_{u,t} are the time weighted mean ratings for users a and u:

\[
\bar{r}_{u,t} = \frac{\sum_{r \in R_u} r \cdot f(t_r)}{\sum_{r \in R_u} f(t_r)} \tag{3.2}
\]

where R_u is the set of ratings given by user u, t_r is the time when rating r was given by u, and f(t_r) is the time weight presented by Liu et al. [23] (see section 2.5):

\[
f(t) = e^{-\theta (t - t_r)}, \quad 0 \le \theta \le 1. \tag{3.3}
\]

We change the decay notation α to θ in our proposed model to simplify differentiation between the different decay parameters.

This results in recent ratings being given higher weights, giving a more precise estimate of the rating pattern of the neighboring users. We argue that this approach can only be adopted in the user-based approaches, since the mean centered prediction function for the item-based approach does not include a mean user rating. Additionally, if we were to modify the mean centered prediction function for the item-based approach to include a mean user rating, we would be giving higher weights to ratings made recently, independent of whether the user that gave the rating is similar to the target user or not.

Two configurations of the model are tested, one operating on epoch timestamps and one operating on timestamps that have been converted to days.
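A minimal sketch of the time weighted mean rating (equations 3.2 and 3.3) is shown below; the helper function and its example values are chosen for illustration and are not taken from the thesis code.

```python
import numpy as np

def time_weighted_mean_rating(ratings, times, t_now, theta):
    """Time weighted mean rating of a user (eqs. 3.2 and 3.3).

    ratings : all ratings given by the user
    times   : rating timestamps (epoch seconds or days, matching t_now)
    theta   : decay rate; theta = 0 gives the ordinary mean rating
    """
    ratings = np.asarray(ratings, dtype=float)
    times = np.asarray(times, dtype=float)
    weights = np.exp(-theta * (t_now - times))     # f(t_r), eq. 3.3
    return float(np.sum(ratings * weights) / np.sum(weights))

# Recent ratings dominate: with theta = 0.024 and times measured in days,
# the result lies close to the most recent rating rather than the plain mean.
print(time_weighted_mean_rating([5, 4, 2], [0, 100, 200], t_now=200, theta=0.024))
```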

3.3.2 Time aware approaches

Altogether, we are using three different ways of incorporating time: giving recent ratings higher relevancy in the prediction phase using the time weight function proposed by Liu et al. [23] (see section 2.5), referred to as the Time weighted prediction approach; modifying the similarity calculations with the same time weight proposed by Liu et al. [23] (see section 2.5), referred to as the Time weighted similarity approach; and our own proposed implementation, referred to as the Time weighted mean user rating approach, operating on either epoch timestamps or days (explained in section 3.3.1). These three approaches are combined in different ways, implemented in the previously mentioned time unaware configurations, resulting in a total of twelve different time aware configurations, being the following:

1. Item-based using Pearson Correlation Coefficient, using the Time weighted prediction approach.
2. Item-based using Pearson Correlation Coefficient, using the Time weighted similarity approach.
3. Item-based using Adjusted Cosine Similarity, using the Time weighted prediction approach.
4. Item-based using Adjusted Cosine Similarity, using the Time weighted similarity approach.
5. Item-based using Adjusted Cosine Similarity, using the Time weighted prediction approach and the Time weighted similarity approach.
6. User-based using Pearson Correlation Coefficient/Adjusted Cosine Similarity, using the Time weighted prediction approach.
7. User-based using Pearson Correlation Coefficient/Adjusted Cosine Similarity, using the Time weighted similarity approach.
8. User-based using Pearson Correlation Coefficient/Adjusted Cosine Similarity, using the Time weighted mean user rating approach operating on epoch timestamps.
9. User-based using Pearson Correlation Coefficient/Adjusted Cosine Similarity, using the Time weighted mean user rating approach operating on days.
10. User-based using Pearson Correlation Coefficient/Adjusted Cosine Similarity, using the Time weighted prediction approach and the Time weighted similarity approach.
11. User-based using Pearson Correlation Coefficient/Adjusted Cosine Similarity, using the Time weighted prediction approach, the Time weighted similarity approach and the Time weighted mean user rating approach operating on epoch timestamps.
12. User-based using Pearson Correlation Coefficient/Adjusted Cosine Similarity, using the Time weighted prediction approach, the Time weighted similarity approach and the Time weighted mean user rating approach operating on days.

Due to the nature of the calculations, the time weighted mean user rating approach cannot be implemented for the item-based approaches. All configurations utilize both significance weighting (see section 2.3.1.4) and the mean centered prediction function (see equation 2.10).

When developing the time aware approaches, only small changes to the corresponding time unaware approach have to be made depending on what time weighting is being implemented. Our time aware approaches can be seen as copies of their respective time unaware system with minor changes, ensuring result consistency in terms of differences in prediction accuracy.

3.4 Parameter Optimization

In order to evaluate the chosen approaches adequately, we determine the optimal parameters for each approach based on their prediction accuracy. Because of time constraints and the heavy computing power needed to explore all possible parameter combinations, we evaluate the decay parameters with fixed neighborhood sizes and cut off values. For the time unaware approaches, we optimize the neighborhood size (see section 2.3.1.5) k ∈ {10, 20, 30, 40, 50, 60, 70, 80, 90, 100} when constructing neighborhoods and the cut off value (see section 2.3.1.4) C ∈ {10, 20, 30, 40, 50, 60, 70, 80, 90, 100} when computing similarity. After the optimal values for k and C are determined, we test the decay rates for the time aware approaches with the corresponding optimal k and C, i.e. k and C are static for the time aware approaches.

We denote the decay rate parameter λ for the Time weighted prediction approach, α for the Time weighted similarity approach and θ for the proposed Time weighted mean user rating approach. λ and θ range from 0.0 to 0.1, incrementing by 0.001 each iteration, and α ranges from 0.0 to 0.01, incrementing by 0.001 each iteration. The prediction phase is not as time consuming as computing similarity, which is why we are testing more values of λ and θ. The ranges for each parameter are determined by evaluating values between 0 and 1, incrementing by 0.1 each iteration.
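The parameter sweep amounts to a grid search over the decay values. The sketch below assumes a hypothetical evaluate(decay) helper that trains and scores one configuration with the fixed optimal k and C; the thesis does not show its actual evaluation code.

```python
import numpy as np

def grid_search_decay(evaluate, values):
    """Return the decay value with the lowest RMSE plus the full sweep.

    evaluate : callable(decay) -> (rmse, mae); hypothetical helper assumed to
               train and score one configuration (not part of the thesis code)
    values   : iterable of candidate decay rates
    """
    results = [(value, *evaluate(value)) for value in values]
    best = min(results, key=lambda r: r[1])   # r = (value, rmse, mae)
    return best, results

# Ranges used in the thesis: lambda and theta from 0.0 to 0.1 in steps of 0.001,
# alpha from 0.0 to 0.01 in steps of 0.001.
lambda_grid = np.arange(0.0, 0.1 + 1e-9, 0.001)
alpha_grid = np.arange(0.0, 0.01 + 1e-9, 0.001)
```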

3.5 Evaluation

We evaluate the recommender systems by withholding the ratings present in the test set. The systems then try to predict the ratings being withheld from them, after which each predicted value is compared to the actual rating a user gave a specific movie, to estimate the prediction accuracy. We measure this by using the previously mentioned metrics, being Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) (see section 2.6). The metrics are calculated using scikit-learn.

4 Results

The following sections present the results produced by the evaluation of the recommender systems. The optimal parameters for the time unaware and time aware approaches are presented in section 4.1. As mentioned in section 3.4, we find the optimal k, C values for the time unaware approaches and use these when optimizing λ, α and θ. Section 4.2 shows the resulting MAE and RMSE for the different approaches when using the optimal parameter values. The increase in accuracy for the different time aware approaches, compared to the corresponding time unaware approach, is also presented in this section.

4.1 Parameter Optimization

This section presents the optimal parameter values for the different approaches, including optimal k, C values for the time unaware approaches and λ, α, θ for the time aware approaches when using the optimal k, C.

4.1.1 Time unaware approaches

Figures 4.1, 4.2, 4.3, 4.4, 4.5 and 4.6 show the RMSE and MAE for different values of k and C, for the three time unaware approaches:


Figure 4.1: MAE for the user-based time unaware approach.

Figure 4.2: RMSE for the user-based time unaware approach.

Figure 4.3: MAE for the item-based time unaware approach using Pearson Correlation Coefficient.

Figure 4.4: RMSE for the item-based time unaware approach using Pearson Correlation Coefficient.


Figure 4.5: MAE for the item-based time unaware approach using Adjusted Cosine Similarity.

Figure 4.6: RMSE for the item-based time unaware approach using Adjusted Cosine Similarity.

As expected, varying the parameter values results in different recommendation accuracy. For the user-based approach and the item-based approach using Adjusted Cosine Similarity, the accuracy increases as k does. The user-based approach produces the most accurate recommendations when k = 100 and C = 50, both when looking at RMSE and MAE. The same optimal values are observed for the item-based approach using Adjusted Cosine Similarity. As seen in figures 4.3 and 4.4, the item-based approach using Pearson Correlation Coefficient produces the highest accuracy when k = 60 and C = 80. Generally, the item-based approaches produce more accurate recommendations compared to the user-based approach, for both similarity metrics. Comparing the two item-based approaches to each other, using Pearson Correlation Coefficient yields lower RMSE than using Adjusted Cosine Similarity while the MAEs are about the same.

It should be noted that, for all configurations, when testing k values larger than 100 we observe a minor increase in accuracy. However, the calculation time greatly increases as k does, making evaluation too time consuming. Because of this, we keep the range k ∈ {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}.

As mentioned, in further evaluation we fix k and C to their corresponding optimal values:

Type Similarity k C

Item-based Adj. Cosine 100 50

Item-based Pearson 60 80

User-based Adj. Cosine/Pearson 100 50

Table 4.1: Optimal k and C for time unaware approaches.

4.1.2 Time aware Models

The following sections present the evaluation results of the mentioned time aware approaches, including the optimal values for the decay rate parameters λ, α and θ.


4.1.2.1 Time weighted prediction approach

Figures 4.7 and 4.8 depict the prediction accuracy for the Time weighted prediction approach in terms of RMSE and MAE, for different values of the decay rate parameter λ. Note that time is disregarded when λ = 0.

Figure 4.7: RMSE and MAE for different decay values λ for the item-based Time weighted prediction approaches.

Figure 4.8: RMSE and MAE for different decay values λ for the user-based Time weighted prediction approach.

The results show that the value of λ does indeed affect the prediction accuracy, yielding more accurate recommendations when weighting the ratings of the neighbors in the prediction calculation phase. Increased accuracy is observed for all configurations, although minor. For the item-based approaches, using Pearson Correlation Coefficient as the similarity metric seems to make the model more sensitive to the value of λ, compared to using Adjusted Cosine Similarity.

4.1.2.2 Time weighted similarity approach

Figures 4.9 and 4.10 show the RMSE and MAE when varying the decay rate parameter α for the Time weighted similarity approaches.

Figure 4.9: RMSE and MAE for different decay values α for the item-based Time weighted similarity approaches.

Figure 4.10: RMSE and MAE for different decay values α for the user-based Time weighted similarity approaches.

The item-based approach using Pearson Correlation Coefficient produces the lowest errors when α = 0, i.e. when the similarities are not time weighted. Because of this, we do not evaluate an item-based approach using Pearson Correlation Coefficient incorporating both λ and α (Time weighted hybrid approach). The resulting errors would be the same as for the corresponding Time weighted prediction approach. The remaining approaches do show an increase in accuracy, although not a significant one.

4.1.2.3 Time weighted mean user rating approach

Figures 4.11 and 4.12 show the RMSE and MAE for our proposed Time weighted mean user rating approach when varying the decay rate parameter θ and using the two different configurations presented in section 3.3.1.

Figure 4.11: RMSE and MAE for different decay values θ for the user-based Time weighted mean user rating approach operating on epoch timestamps.

Figure 4.12: RMSE and MAE for different decay values θ for the user-based Time weighted mean user rating approach operating on days.

The two configurations yield different results. When operating on epoch timestamps the increase in prediction accuracy is remarkable, not only in comparison to the time unaware equivalent, but also compared to the entire set of different approaches. The accuracy increases when operating on days as well, although not as significantly. These results are discussed in section 5.1.1.

4.2 Comparison of time aware and time unaware approaches

In the following sections, the evaluation results of the different approaches are presented, configured with their optimal parameter values.

4.2.1 Approaches

Tables 4.2, 4.3, 4.4, 4.5 and 4.6 present the approaches with their optimal parameter values and their corresponding RMSE and MAE.

Note that k refers to the neighborhood size, C to the cut off value, λ to the decay rate parameter of the Time weighted prediction approach, α to the decay rate parameter of the Time weighted similarity approach and θ to the decay rate parameter of the Time weighted mean user rating approach. For the user-based approaches utilizing θ, the types '(epoch)' and '(days)' refer to whether epoch timestamps or days are used when calculating the time weighted mean user rating.

For comparison purposes, we include the RMSE and MAE when only predicting the mean rating of a user/item.


Time unaware

Type         Similarity       k    C    λ    α    θ    RMSE    MAE
Item-based   Adj. Cosine      100  50   0.0  0.0  0.0  1.0434  0.8048
Item-based   Pearson          60   80   0.0  0.0  0.0  1.0313  0.8072
User-based   Adj. Cos./Pea.   100  50   0.0  0.0  0.0  1.0509  0.8240
User Mean    N/A              N/A  N/A  N/A  N/A  N/A  1.1531  0.9157
Item Mean    N/A              N/A  N/A  N/A  N/A  N/A  1.1071  0.8781

Table 4.2: RMSE and MAE for time unaware approaches using the corresponding optimal parameter values.

Time weighted prediction

Type         Similarity       k    C    λ      α    θ    RMSE    MAE
Item-based   Adj. Cosine      100  50   0.044  0.0  0.0  1.0383  0.7998
Item-based   Pearson          60   80   0.013  0.0  0.0  1.0292  0.8066
User-based   Adj. Cos./Pea.   100  50   0.009  0.0  0.0  1.0473  0.8215

Table 4.3: RMSE and MAE for the Time weighted prediction approaches using the corresponding optimal parameter values.

Time weighted similarity

Type         Similarity       k    C    λ    α      θ    RMSE    MAE
Item-based   Adj. Cosine      100  50   0.0  0.001  0.0  1.0423  0.8044
User-based   Adj. Cos./Pea.   100  50   0.0  0.007  0.0  1.0499  0.8215

Table 4.4: RMSE and MAE for the Time weighted similarity approaches using the corresponding optimal parameter values.

Time weighted mean user rating

Type                 Similarity       k    C    λ    α    θ      RMSE    MAE
User-based (epoch)   Adj. Cos./Pea.   100  50   0.0  0.0  0.008  1.0408  0.7874
User-based (days)    Adj. Cos./Pea.   100  50   0.0  0.0  0.024  1.0449  0.8213

Table 4.5: RMSE and MAE for the Time weighted mean user rating approaches using the corresponding optimal parameter values.

Time weighted hybrid

Type                 Similarity       k    C    λ      α      θ      RMSE    MAE
Item-based           Adj. Cosine      100  50   0.044  0.001  0.0    1.0373  0.7993
User-based           Adj. Cos./Pea.   100  50   0.009  0.007  0.0    1.0465  0.8198
User-based (epoch)   Adj. Cos./Pea.   100  50   0.009  0.007  0.008  1.0343  0.7821
User-based (days)    Adj. Cos./Pea.   100  50   0.009  0.007  0.024  1.0410  0.8172

Table 4.6: RMSE and MAE for the Time weighted hybrid approaches using the corresponding optimal parameter values.

4.2.1.1 Increase in accuracy

The following section presents the decrease in RMSE and MAE (%) the time aware approaches result in when compared to their time unaware equivalents.

Item-based Adjusted Cosine Similarity

Type         Similarity     k    C    λ      α      θ    RMSE    MAE
Item-based   Adj. Cosine    100  50   0.044  0.0    0.0  0.49%   0.63%
Item-based   Adj. Cosine    100  50   0.0    0.001  0.0  0.10%   0.10%
Item-based   Adj. Cosine    100  50   0.044  0.001  0.0  0.60%   0.69%

Table 4.7: Decrease in RMSE and MAE (%) for the time weighted item-based approaches using Adjusted Cosine Similarity.

Item-based Pearson Correlation Coefficient

Type         Similarity   k    C    λ      α    θ    RMSE    MAE
Item-based   Pearson      60   80   0.013  0.0  0.0  0.20%   0.08%

Table 4.8: Decrease in RMSE and MAE (%) for the time weighted item-based approaches using Pearson Correlation Coefficient.

User-based Adjusted Cosine Similarity/Pearson Correlation Coefficient

Type                Similarity      k    C   λ      α      θ      RMSE   MAE
User-based          Adj. Cos./Pea.  100  50  0.009  0.0    0.0    0.33%  0.32%
User-based          Adj. Cos./Pea.  100  50  0.0    0.007  0.0    0.09%  0.31%
User-based (epoch)  Adj. Cos./Pea.  100  50  0.0    0.0    0.008  0.96%  4.65%
User-based (days)   Adj. Cos./Pea.  100  50  0.0    0.0    0.024  0.47%  0.34%
User-based          Adj. Cos./Pea.  100  50  0.009  0.007  0.0    0.42%  0.51%
User-based (epoch)  Adj. Cos./Pea.  100  50  0.009  0.007  0.008  1.60%  5.37%
User-based (days)   Adj. Cos./Pea.  100  50  0.009  0.007  0.024  0.95%  0.83%

Table 4.9: Decrease in RMSE and MAE (%) for the time weighted user-based approaches using Pearson Correlation Coefficient/Adjusted Cosine Similarity.

Comparison between time unaware and time aware variations

Type        Similarity      Time weight      Time unaware RMSE  Time aware RMSE  Time unaware MAE  Time aware MAE  % (RMSE)  % (MAE)
Item-based  Adj. Cosine     λ                1.0434             1.0383           0.8048            0.7998          0.49%     0.63%
Item-based  Adj. Cosine     α                1.0434             1.0423           0.8048            0.8044          0.10%     0.10%
Item-based  Adj. Cosine     λ, α             1.0434             1.0373           0.8048            0.7993          0.60%     0.69%
Item-based  Pearson         λ                1.0313             1.0292           0.8072            0.8066          0.20%     0.08%
User-based  Adj. Cos./Pea.  λ                1.0509             1.0473           0.8240            0.8215          0.33%     0.32%
User-based  Adj. Cos./Pea.  α                1.0509             1.0499           0.8240            0.8215          0.09%     0.31%
User-based  Adj. Cos./Pea.  θ (epoch)        1.0509             1.0408           0.8240            0.7874          0.96%     4.65%
User-based  Adj. Cos./Pea.  θ (days)         1.0509             1.0449           0.8240            0.8213          0.47%     0.34%
User-based  Adj. Cos./Pea.  λ, α             1.0509             1.0465           0.8240            0.8198          0.42%     0.51%
User-based  Adj. Cos./Pea.  λ, α, θ (epoch)  1.0509             1.0343           0.8240            0.7821          1.60%     5.37%
User-based  Adj. Cos./Pea.  λ, α, θ (days)   1.0509             1.0410           0.8240            0.8172          0.95%     0.83%

Table 4.10: Summary of RMSE and MAE for different configurations and approaches.

As previously explained, the prediction calculation of the item-based approaches does not include a mean user rating, hence there are no item-based results including θ. Additionally, the item-based approach using Pearson Correlation Coefficient shows no increase in accuracy when increasing α (as shown in Figure 4.9), leaving λ as the only parameter with an effect on this approach.

What is relevant in this context is not the RMSE and MAE values for the different approaches by themselves, but rather the increase in accuracy presented in section 4.2.1.1. For the item-based approaches, the highest increase in accuracy is observed when using the hybrid approach implemented with Adjusted Cosine Similarity. The item-based approach using Pearson Correlation Coefficient shows the lowest overall increase in accuracy, but is at the same time only tested with λ.

For the user-based approach, using α independently has the lowest effect on the accuracy, showing almost no increase when measuring RMSE. We observe a major increase in accuracy when using the Time weighted mean user rating approach operating on epoch timestamps, both when using the approach independently and when implemented in the hybrid approach. The approach operating on days also increases accuracy, although the increase is minor in comparison (further discussed in section 5.1.1). Comparing the user-based hybrid approach not using θ to the approaches utilizing it, we observe an increase of about 1.2% when operating on epoch timestamps and 0.52% when operating on days.
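These two figures follow directly from the RMSE values in Table 4.6: (1.0465 − 1.0343) / 1.0465 ≈ 1.2% for the variant operating on epoch timestamps and (1.0465 − 1.0410) / 1.0465 ≈ 0.5% for the variant operating on days; small rounding differences stem from the reported RMSE values being rounded to four decimals.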


5 Discussion

In the following sections we discuss the work conducted in this study. Firstly, the results produced are discussed from a critical point of view, where we analyze the general results and the different approaches. Secondly, we discuss the method chosen and point out areas where different implementations could have been made. Subsequently, the work is put in a wider context, assessing what effect increased prediction accuracy could have in real life scenarios, as well as ethical and societal aspects of the work. Lastly, we present some areas where further testing would be necessary to strengthen the results produced in this study. Additional future work is also presented.

5.1 Results

Our results show that incorporating temporal properties to increase prediction accuracy is a valid approach, given the stated limitations. Eleven out of twelve time aware approaches show an increase in accuracy, the item-based Time weighted similarity approach using Pearson Correlation Coefficient being the one resulting in a decrease. Although the increase is minor, near insignificant in most cases, the proposed Time weighted mean user rating approach operating on days by itself shows a decrease in RMSE of nearly 0.47% and the hybrid approach a decrease of 0.95%, compared to their time unaware equivalents. Our results can be seen as an initial indication, where further evaluation and in-depth testing would be necessary to establish the results.

When comparing our results with other studies implementing some sort of time weight, we observe that our results are generally not as promising as those of the other studies [19][6][22][23]. This can be explained by our approaches not being as sophisticated as those tested in previous studies, and by the fact that our evaluation process differed. However, Zwart [25] evaluated time weighting approaches similar to the ones presented in this study, using a similar evaluation process. Zwart evaluated the time weighted approaches on the MovieLens 1M dataset, which has characteristics similar to the dataset used in this study. The corresponding increases in accuracy he observed are similar to those obtained by us [25]. Small increases in accuracy were generally observed for the different approaches, which led Zwart to claim that utilizing time to increase the recommendation accuracy is not a solid approach [25]. We still believe that the hypothesis itself is valid and



could potentially result in even higher increases in accuracy than what was obtained in this study. Further studies would be necessary to prove this.

It is necessary to highlight the fact that the results of our study are only based on the MovieLens 100k dataset. Testing the different approaches on more and different data would be desirable. We cannot ensure that the results would be the same for a different dataset, in a positive or negative way (especially the optimal decay variables, which are highly data dependent). An example would be to test the approaches with data produced over a longer period of time, preferably multiple years, which we believe would fit the hypothesis of user behaviour changing over time better than the data used in this study. With that said, the dataset we use is frequently recurring in research regarding recommender systems incorporating temporal properties, hence our decision to use it.

5.1.1 Time weighted mean user rating

The results showing the accuracy when using the proposed Time weighted mean user rating approaches are quite remarkable. Operating on epoch timestamps provides the lowest RMSE and MAE, while the approach converting timestamps to differences in days has almost no effect on the accuracy at all. When changing the other time aware approaches to operate on epoch timestamps, we observe decreased accuracy for all approaches. We suspect this behaviour is associated with the dataset, where some users might have given all of their ratings during a single day. The Time weighted mean user rating approach is the only one where the independent rating timeline of a single user is relevant, i.e. the only approach where the time when a user made a rating is compared to rating times of the same user. Operating on days for users whose ratings were all given on a single day would obviously have no effect; every rating would simply be given the same time weight, reducing the result to the ordinary mean rating of the user. In addition, we argue it is not reasonable to assume that rating behaviour changes in a matter of seconds, rather than over a longer period of time. Operating on epoch timestamps contradicts this statement, since differences in seconds will have an effect on the time weight.
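The following sketch illustrates the difference between the two granularities; it assumes an exponential decay of the form e^(-θ·Δt) applied to each rating of a single user, and the names and exact formulation are illustrative rather than the implementation used in this study.

from math import exp

def time_weighted_user_mean(user_ratings, theta, granularity="days"):
    # user_ratings: list of (rating, timestamp) pairs for one user,
    # timestamps in epoch seconds.
    latest = max(timestamp for _, timestamp in user_ratings)
    weighted_sum = weight_sum = 0.0
    for rating, timestamp in user_ratings:
        age = latest - timestamp      # age of the rating in seconds
        if granularity == "days":
            age //= 86400             # whole days; same-day ratings all get age 0
        weight = exp(-theta * age)    # assumed exponential decay
        weighted_sum += weight * rating
        weight_sum += weight
    return weighted_sum / weight_sum

For a user whose ratings were all given on the same day, the day granularity collapses every weight to 1.0 and the result equals the ordinary mean rating, whereas epoch seconds let differences of only a few seconds shift the weights.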

Because of this, we cannot ensure that the results of the Time weighted mean user rating approach operating on epoch timestamps are valid. With that said, operating on days did provide a slight increase in accuracy, outperforming all the other user-based time aware approaches operating on days. Further evaluation and in-depth testing is necessary to prove the validity of the approach.

5.2 Method

In the following sections we discuss the applied method, including its possible weaknesses and limitations. Firstly, we discuss the pre-study and the credibility of our sources. Secondly, we discuss how we implemented the different approaches. Lastly, we discuss the evaluation process, suggesting how it could be further improved.

5.2.1 Pre-study

During the pre-study we researched prior work studying time aware recommender systems and their potential increase in accuracy. The purpose of the pre-study was to confirm our hypothesis that including temporal information when recommending items to users could potentially increase accuracy. We found several peer reviewed scientific papers supporting our hypothesis ([19], [6], [22], [23], [18]). The papers are fairly recent; the oldest paper used was published in 2005 and the most recent in 2016. All the papers have been published at conferences, giving us a strong indicator of credibility. The majority of sources used in the theory section (2) are published scientific papers or books. We view these sources as reliable.

References
