A Recommendation system for News Push Notifications

- Personalizing with a User-based and Content-based Recommendation system

Ida Wiklund

Spring 2020

Degree Project in Interaction Technology and Design, 30 credits
Supervisor: Kai-Florian Richter

Abstract

The news landscape has changed in recent years because of digitization. News can nowadays be found both in printed newspapers and on different sites online. The availability of digital newspapers leads to competition among news companies. To make users stay on one specific platform for news, the content must be relevant, and one way of creating relevance is through personalization: tailoring the content to each user. The focus of this thesis is therefore to personalize news push notifications for a digital newspaper and make them more relevant for users. The project was made in cooperation with VK Media and their digital newspaper. The task in this thesis is to implement personalization of push notifications by building a recommendation system and to test the implemented system with data from VK.

In order to perform the task, a dataset representing the reading habits of VK's users was extracted from their data warehouse. Then a user-based and content-based recommendation system was implemented in Python. The idea of the system is to recommend new articles that are sufficiently similar to one or more of the already read articles. Articles that may be liked by one of the most similar users should also be recommended. Finally, the system's performance was evaluated with the data representing the reading habits of VK's users. The results show that the implemented system performs better than the current solution without any personalization when recommending a few articles to each user. The results from the evaluation also show that the more articles the users have read, the better the predictions that can be made. Thus, this thesis offers a first step towards meeting the expectations of more relevant content among VK's users.

Acknowledgments

Contents

Abstract I

Acknowledgments II

1 Introduction 1

1.1 Objectives 2

1.2 The final implemented Recommendation system 3

1.3 Thesis outline 3

2 Theoretical Background 4

2.1 Västerbottens-Kuriren 4

2.1.1 Västerbottens-Kuriren's news application 5

2.1.2 The news application Folkbladet 7

2.2 What is Personalization? 8

2.3 What is Customization? 8

2.4 Relevance 8

2.5 The Filter Bubble 9

2.6 Push notifications 9

2.6.1 Designing push notifications 10

2.7 Users and Personalization 10

2.8 Users and VK 11

2.9 Recommendation systems 12

2.9.1 Demographic filtering Recommendation system 12

2.9.2 Content-based filtering Recommendation system 12

2.9.3 Collaborative filtering Recommendation system 13

2.9.4 Hybrid Recommendation system 15

2.11 Previously created Recommendation systems 17

2.12 Evaluations of Recommendation systems 18

3 Method 20

3.1 Data extraction 21

3.2 Implementation 22

3.2.1 Content-based Recommendation system 22

3.2.2 User-based Recommendation system 25

3.3 Evaluation methods 26

3.3.1 Evaluation of the Content-based System 27

3.3.2 Evaluation of the User-based part of the system 28

4 Results 29

4.1 Results from the Content-based Evaluation 29

4.2 Results from the User-based Evaluation 33

5 Discussion 36

5.1 The design of the Recommendation system 36

5.2 The Content-based system 38

5.3 The User-based part of the system 40

5.4 Limitations 42

6 Conclusion 43

6.1 Future work 44

Introduction

About 2000 years ago, the Roman Empire already had a precursor to today's newspapers [1]. Slaves were used to make copies of messages about important events, which were then sold to well-to-do Romans. The German Johann Gutenberg (1397-1468) and his advances in the art of letterpress printing made it possible to create regularly published publications. The first Swedish newspaper was published in 1645 and was called "Ordinari Post Tijdender". It was used to spread state propaganda messages. The newspaper "Aftonbladet" was the first Swedish newspaper to go online, in 1994, and since then most newspapers have published large parts of their printed content online [2]. Currently, the digital version of any newspaper in Sweden has more consumers than its paper version [3].

The new digitized newspaper brings new opportunities but also new challenges. The availability of digital newspapers makes it easy for users to read many different newspapers if they prefer. Digitized newspapers therefore face much more competition than traditional newspapers did before. According to an investigation about VK and their readers, most news consumers who subscribe to one digital newspaper would still choose another news source as their primary one [4]. The younger generations were to a greater extent less loyal towards one specific news brand. This indicates that digital news consumers read more than one newspaper and that the newspaper they subscribe to does not provide enough interesting content to keep them interested, which in turn means that these different newspapers are competing against each other for the attention of the user.

What is crucial to make users consume, like and stay on one specific platform for news is relevance [4]. With relevant available and recommended articles, users want to read more. Hence, providing relevant content creates habits for the users. This, in turn, creates loyalty towards that specific news brand, which can be crucial for the future of news publishers. One way of creating relevance is through personalization, to tailor the content to each user of the newspaper. This could be done by personalizing push notifications. In fact, push notifications containing some tailored content have a four times higher response rate than non-customized push notifications [4].

what each user wants to read. The project is made in cooperation with VK Media and their digital newspaper. The work of making the newspaper relevant and personalized started in September 2019 with another master's thesis in Interaction Design, by Amine Balta [4]. She investigated how VK's current notification system works, how to personalize a newspaper and how to make the content relevant for each user. This thesis builds on her findings.

There have been earlier studies about recommendation systems of different kinds and combinations. For example, a study about item-based collaborative filtering systems compares different techniques and their results [5]. There are also many recommendation systems made specifically for news recommendations. For example, a hybrid recommendation system predicts both the interests of each user with content-based filtering and the current news trend with collaborative filtering [6]. They found that users' news interests vary over time but follow the trend of news events. They only used a few very general topic categories, instead of fine-grained topics, to make predictions from. Another study investigates news recommendation systems based on context trees [7]. They use more topics and generate a context tree with these different topics. They found that 50 topics gave the best accuracy for their system.

But there is no news recommendation system based on the specific data from VK. Their dataset contains more than 11 000 unclustered, machine learning generated topics. The topics are not ordered in an ontology or hierarchy and are therefore hard to navigate. This large number of topics makes it challenging to find what each user really likes. Another parameter that can be considered, and which makes this project more distinctive, is VK's own subjectively estimated "news value". This is a value for how relevant a news article might be in general, set by the author of the article. Considering this value could potentially give more relevant recommendations.

1.1 Objectives

The task of this project is to give further recommendations on how to implement a personalization system for push notifications, with help from a set of data from VK and personalization recommendations from a previous master's thesis. The final goal is to give VK's subscribing users more relevant recommended articles through push notifications. This goal is achieved by using users' behavioral data to predict what each user wants to read and then recommending these articles, in order to make the users like and stay on VK's sites. Thus, the task to perform in this project is to implement personalization of push notifications by building a recommendation system and to test the implemented system with data from VK.

All VK's articles are automatically tagged by a machine learning algorithm which picks out the most important concept tags, which should represent the article content. All articles also get a news value, an integer between 1 and 6, that determines how valuable and interesting an article is in general when it comes to news. Since VK uses this additional data to annotate their articles, the following two questions are also investigated:

1.2 The final implemented Recommendation system

The final personalization system functions as a filter for new articles. It is designed to send fewer, more relevant and more important push notifications to each user. If a user has read one or more articles similar enough to a new one, then the new article should be pushed. If one of the most similar users may like the article, it should also be pushed. The system was created as a hybrid recommendation system, combining user-based collaborative filtering and content-based filtering. The user-based collaborative filtering part of the system uses concept tags and read articles to find similar users and to recommend articles similar users may like. The content-based part of the system uses article titles and concept tags to find the interests of a specific user.

1.3 Thesis outline

The thesis is organized as follows:

Chapter 2 - Theoretical Background This chapter first presents information about VK Media and their application. Then the chapter explains personalization, customiza-tion, relevance and other concepts. Descriptions of how users generally expect and want their newspaper to be personalized can also be found here. Finally, the chapter presents different kinds of recommendation systems, techniques for vector represen-tation and finding similarity, followed by different ways for evaluating a recommen-dation system.

Chapter 3 - Method This chapter treats the methodology of the project. The method for this project is divided into three parts. The first part is to determine which person-alization methods are best suited in the context of VK’s newspaper, the second part consists of extracting a dataset from VK’s data and implementing a system with the data and the final part consists of testing and evaluating the output recommendations. The latter two parts of the implementation are described in the chapter.

Chapter 4 - Results This chapter presents the results from the evaluation described in the previous chapter. This chapter is divided into testing the content-based part and the user-based part of the recommendation system. The results from the content-based part and some of the user-based results are presented with values generated from the tests and with some calculated metrics. The rest of the user-based test results are presented with example evaluations.

Chapter 5 - Discussion This chapter discusses the created recommendation system and the results from the evaluation. The underlying design of the system is explained and discussed with respect to previous studies. The results from the evaluation are analyzed and the performance of the system with different settings is discussed. The user-based and the content-based part of the system are mainly analyzed separately.

Chapter 6 - Conclusion and future work This chapter gives a closing remark on the

Theoretical Background

This chapter dives into terminologies and explanations of concepts surrounding personalization and recommendation systems. Firstly, the chapter presents information about VK Media, their policy and their personalization strategy. The current solution for push notifications in the application is also presented. Further, the chapter explains personalization, customization, relevance and other concepts. Descriptions of how users generally expect and want their newspaper to be personalized can also be found here. Finally, the chapter presents different kinds of recommendation systems, techniques for vector representation and finding similarity, followed by different ways for evaluating a recommendation system.

2.1 Västerbottens-Kuriren

Västerbottens-Kuriren (VK) is a newspaper published in Umeå, Sweden [4]. The newspaper covers the major part of Västerbotten, but not the northern parts. The keywords for the company are openness, democracy and freedom of speech. VK Media was founded in 1900 and is now in the midst of its digital transformation. Their goal is to turn their traditional publishing house into a fully functioning multimedia organization. VK Media delivers physical newspapers six days a week; they launched their first website in 1997 and the first version of their mobile application in 2014. Recently they scaled up their development operations team, a section of the company named VK Media Next. VK Media Next is their innovation and development team. They provide all the digital services, products and platforms for the whole media group.

Figure 1: Västerbottens-Kuriren's push notification page for selecting categories to follow, and handling tags. To the left are some of the different push categories and to the right, at the bottom of the page, is "My VK".

2.1.1 Västerbottens-Kuriren's news application

Figure 2: This Figure displays a scenario of following a tag for an article. The page on the left shows the bottom of an article, where some tags for the article are listed. In the middle is the page displayed after pressing "follow" for the tag "Coronakrisen", where the user can choose the format for the push notifications. To the right is the page where the user can see articles involving that tag [4]. The user gets to this page by pressing the blue link "här" in the page to the right in Figure 1.

Last but not least, there is something called "My VK" under the menu alternative "Push notifications". My VK contains over 11 000 metadata tags the users can follow. Users can only manage the different concept tags here; to follow a tag they need to find it at the end of an article, as shown to the left in Figure 2. There are no other ways to follow specific tags. The users can then select the format for receiving the notifications for articles involving the tag, and find all articles for the tag, see the two remaining pages in Figure 2. The tags consist of different specific categories, for example "Festivals", "Drugs", "Zara Larsson" and the author of the news article. All articles are automatically tagged by a machine learning algorithm which picks out the most important tags depending on the content of the article. The users can choose to follow these tags at the end of different articles and are then notified if any other article gets tagged with the same tag. The users can also choose to receive these notifications on their phones, as messages via email, or as a collective email of all articles with the specific tag on Sundays.

categories, the handling of push notifications, how the user should handle push notifications and how the notifications become more relevant and smarter should be reviewed. This could be done by reviewing or using VK’s data. VK has collected data about their users, their articles and about how each user interacts with their applications. This data has been collected in a consistent way since the 1st of June 2019.

2.1.2 The news application Folkbladet

VK Media also publishes another newspaper called "Folkbladet", which covers the whole of Västerbotten. This newspaper has a mobile application similar to the mobile application "Västerbottens-Kuriren". The application is called "Folkbladet" and has the same system for categories, concept tags and push notifications as Västerbottens-Kuriren, but with other categories and location categories. The categories in this application are "News", "Sports" and "Culture & Entertainment", and it has 15 location categories. This application has the exact same system for tags as VK, but Folkbladet calls it "My Folkbladet" instead. See the page for push notification settings in this application in Figure 3. The data collected from Folkbladet and its users is in the same format and in the same place as for the application Västerbottens-Kuriren and its users. When referring to VK's data, data from both applications is therefore considered, but the main focus is still the application "Västerbottens-Kuriren".

2.2 What is Personalization?

Personalization is defined by Nielsen Norman Group as "Personalization is done by the system being used" [10]. They describe it as a system designed to deliver content, experience, or functionality that matches the preferences of the users. The main goal of personalization is to provide functionality and content that matches specific needs or interests, with no effort from the users. The personalization system creates a user profile for each user and adjusts the content and interface according to the profile. Personalization can for example deliver specific information or remember information about a user and hence simplify transactions and other processes.

There are two types of personalization, which are role-based and individualized personalization [10]. Role-based personalization means that users are grouped together according to certain well-defined characteristics (e.g., based on location or a certain role). Individualized personalization means that the system creates a profile of each user and presents different things for each user (e.g., based on past browsing history). The positive effect of personalization is an improved user experience on the site that does not require any effort from the users. The disadvantage is that the system only guesses what each user needs and may be wrong. Some users may also feel that it is unnerving that the system can make guesses and hence feel supervised.

2.3 What is Customization?

Customization is defined by Nielsen Norman Group as "Customization is done by the user" [10]. Customization means that the system allows users to customize or change their experience by configuring content, layout, or system functionality. Customization may for example involve moving items on the interface, selecting topics of interest, or altering colors. The main goal of customization is to let users make selections about what they want, or to allow them to set their preferences and adapt the interface accordingly. Customization may enhance the user experience as it puts the users in control. The positive effect of customization is that the users are in control and hence can get exactly what they want. The negative effect comes from the fact that many users do not know what they need or want, or are not willing to carefully select their customization settings. This causes the output of the customization to be inaccurate and the interface to not match the user's preferences.

2.4 Relevance

as they found that the engagement of the user may be influenced by the brand. The last aspect they found relevant was the news story topic. This is the paramount determinant of news engagement.

2.5 The Filter Bubble

According to the Cambridge dictionary, the definition of a filter bubble is "a situation in which someone only hears or sees news and information that supports what they already believe and like, especially a situation created on the internet as a result of algorithms that choose the results of someone's searches" [13]. Users consuming services online tend to put themselves into the filter bubble by consuming and interacting with content that speaks for their own beliefs, which can be one reason for increasing online polarisation and segregation [4, 14]. The filter bubble becomes a unique universe of information for each user, where they do not decide what gets in or what gets edited out. It becomes a circle where users shape their media by consuming content they like, and the media adapts its content and hence shapes the users. Among the users of VK's digital newspaper, there is a fear of exclusion of important news and of the unknown factors of filtering and the filter bubble [4].

2.6 Push notifications

Push notifications are messages from an application, and they can be displayed in many formats. They can for example be seen on the lock screen of a phone or tablet, within a website or on a computer. Mobile push notifications can provide relevant information to users and engage them with an application, when properly implemented [15]. However, poorly implemented push notifications can deter users from engagement and even make them dislike and delete the application. A study about the usage of push notifications was conducted in the beginning of 2018 with over 50 billion push notifications sent to 900 million mobile users worldwide [16]. 91.1 per cent of the Android users and 43.9 per cent of the iOS users in the study had push notifications turned on. This difference can be explained by the fact that iOS users must manually accept push notifications, while on Android the notifications are turned on by default.

Users clicking a push notification have different reaction rates depending on the day and the time of day, according to the study about push notifications and users [16]. There is a smaller peak in reaction rate during lunch, between 12 pm and 2 pm, but the users are even more active between 9 pm and 12 am. The reaction rate for a notification increases by 20 per cent when emojis are included, and by 25 per cent when the notification is in rich format [4]. Rich pushes are pushes that contain emojis, audio, video, or pictures. The reaction rate increases four times when the notification is personalized and three times when the notification uses advanced targeting (targeting users based on their demographics and other user data). The average click rate for push notifications is 7.8 per cent (4.9 per cent on iOS and 10.7 per cent on Android) [16].

while the rest were asked to turn them off. Then they monitored the usage of each user for two weeks. They found that 27 per cent of the users with notifications enabled visited the application daily, while 12 per cent of those with notifications disabled did. 58 per cent of the users opened the push notifications and 32 per cent said that they did not like certain content-related things in the push notifications. When the users were asked what they dislike about push notifications, they often answered the frequency of notifications and the untailored content.

Then they asked the participants questions about current news to see if they had acquired new knowledge from the push notifications [4, 17]. Users who received notifications from CNN responded correctly to more questions than people who had them turned off. But there was no difference in news knowledge between users who received BuzzFeed News notifications and those who did not. This may indicate that the design and the formulation of the notifications affect the knowledge gained.

2.6.1 Designing push notifications

The Nielsen Norman Group has formulated five rules to follow when designing push notifications [15]. The first rule is to let the users explore the application before asking them to enable push notifications. The second rule is to tell the users what notifications they will receive, for example the number of pushes, type of pushes, content and time to receive notifications, in order to increase transparency towards the user. The next rule is to not send multiple repeating notifications; they recommend sending fewer and more meaningful notifications, for example by merging different notifications into one. The fourth rule is to never send notifications containing irrelevant content, because irrelevant pushes might interrupt and annoy the user. The last rule is to make it easy to turn off the push notifications, otherwise the company may lose credibility.

2.7 Users and Personalization

Each year "Internetstiftelsen" investigates how the Swedish population uses information and communication technology and how this affects individuals, families and society as a whole. The investigation shows that the digital versions of all newspapers have more readers than the paper versions today [3]. Compared to digital movie and music services, digital newspaper subscriptions are at a relatively low level. Younger users are generally more willing to pay for music and video subscriptions on the Internet.

"Internetstiftelsen" also found that the total share of newspaper subscriptions, paper and/or digital, is 48 per cent among Internet users and that it decreases each year [3]. The paper subscriptions, mainly held by the elderly, account for the reduction. The digital subscriptions increase slightly each year, but the number of digital readers decreases. More than a third of Internet consumers read daily newspapers online every day, which is slightly fewer than before. This may be a sign of people consuming news in other ways than via daily newspapers, for example on social media.

2.8 Users and VK

During autumn 2019, a study about personalization, customization and news consumption for VK and other newspapers was conducted. This study led to a design proposal for the push notification settings page in VK's news application [4]. The underlying purpose of the study was to increase the relevance of news and regain loyalty from news consumers through personalization. The study consisted of literature studies, surveys handed out to 62 people and interviews with four of VK's readers. The results showed that users are unhappy with the news climate today. Thus, the expectations on the news sites are not met.

News consumers today mostly use mobile phones and computers when reading news [4]. Users of all ages consume news via their phones. Most of the survey participants read news on newspaper websites, followed by newspaper applications and social media. One interviewee used push notifications as their primary news source. Both young and old users use social media for news today. The major part of the social services online are personalized; the younger generations therefore expect a personalized experience for all kinds of news. 86.2 per cent of the survey participants read news at least once a day. The majority of the survey participants answered that personalization could be useful in news applications and were positive about it [4]. They generally thought that a little personalization could be good, but not too much; they want a balance. They would want to be assured that they would still get the important news. Many users mentioned that they want insight into what was personalized and how - they want transparency. At the same time, a slight majority answered that they would not read more news if the news were tailored to them. There is a fear among the readers of exclusion of important news and of the unknown factors of filtering and the filter bubble. The general major expectations of a newspaper application or website are objectivity, ease and convenience of access, a clear interface and relevant content.

The users of VK also fear censorship in personalization and not being in control of what they consume [4]. A well-balanced personalization system can, however, also expose people to new topics and evoke an interest in something else. Balance and transparency were the keywords that seemed important to the users of VK. To sum up, VK's users were quite positive about the idea of personalized news, but there were some hesitations. They have a fear of losing control over what they consume, a fear of the filter bubble and censorship, but they still want to get rid of irrelevant content.

instantly. All interviewees also thought that the notification frequency was too high and that the notifications were a bit confusing. The confusion comes from the formulations of the pushes. They also mentioned the annoyance of the phone taking their attention for a reason they did not feel was good enough. The attitude seems to be that push notifications need to be important and relevant in order not to be annoying. The keywords they mentioned when it comes to content for push notifications are importance, interest and relevance. Their demands and expectations of transparency and clearness were also high.

The conclusion of the study was that, due to the expressed concerns, the concept of personalization should be carefully implemented into the digital newspaper, as VK has a social responsibility to deliver news with a purpose [4]. VK's users need to feel safe with them, especially as the users express fears towards personalization and the filter bubble. Hence, transparency is of utmost importance. But the problems of the high frequency of push notifications and irrelevant content still need to be addressed.

2.9 Recommendation systems

A recommendation system is an algorithm that provides the most relevant information to a user by discovering and using patterns in a dataset [18]. The algorithm rates all the items and recommends the items that would fit the user the most, based on the user's earlier behavioral data. In order to make a functional recommendation system, it is important to select the most appropriate information for the decisions [19]. There are three approaches for information filtering, all based on matching user profiles to items or to other user profiles: demographic, content-based and collaborative filtering.

2.9.1 Demographic filtering Recommendation system

Demographic filtering methods use user descriptions to learn the relationship between different types of users and items [19]. Personal data is used to classify the users into profiles with stereotypical descriptions. The classifications then serve as general characterizations of the users and their interests. These methods have two principal shortcomings. The first one is that the recommendations are too general; every user has different interests and this kind of method does not take that into consideration. The other shortcoming is that demographic methods do not adapt to interest changes. The interests of a user tend to shift over time, but these methods cannot adapt to that.

2.9.2 Content-based filtering Recommendation system

Figure 4: A Figure of the overall principle of content-based filtering. A very general description.

Content-based methods have some shortcomings. One is that the profiles of each user are based on objective information about the items [19], while selecting an item is based mostly on subjective attributes of the item (e.g. a specific taste), which are not considered. Another shortcoming is overspecialization, which means that these systems recommend what the users already have seen and the users then get restricted to items similar to those already seen. This is often solved by injecting a note of randomness. A third shortcoming occurs when the users are inactive or reluctant to perform actions not directed towards their immediate goals. This leads to less usable data and hence to inaccurate recommendations.

2.9.3 Collaborative filtering Recommendation system

Figure 5: A Figure of the overall principle of collaborative filtering. A very general description.

Collaborative filtering requires three essentials to function well [20]. It needs many participating users to be able to find users similar to each other. There must be an easy way to represent the interests of each user. Last but not least, users with similar interests must also be able to be matched by the algorithms. These essentials lead to some shortcomings for the method [19]. The first one is the early-rater problem, which means that when a new item appears in the database, it cannot be recommended to a user until the system has ratings from other users or similarity to other items. The next shortcoming is about sparsity of data. If the number of users is small relative to the volume of items in the system, it will be harder to find neighbors and hence to recommend items.

There are two types of collaborative filtering systems, which are Memory-based and Model-based [18]. Memory-based systems are divided into user-based and item-based collaborative filtering. User-based collaborative filtering makes recommendations to a user based on items that have been liked by similar users [22]. Each user profile is sorted by its similarity towards the target user's profile. Thus, the preferences of more similar users contribute more to the recommendations. The set of similar users can be extracted with a threshold or by selecting the top users. Item-based collaborative filtering uses similarity between items instead of users. The chance that a user likes a specific item can for example be predicted by averaging the ratings of similar items rated by this user, or predicted to be interesting because other users have liked the target item as much as other items that the target user also seems to like [18]. Each item is sorted by its similarity towards the target item, and ratings from similar items are weighted more strongly.
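As an illustration of the item-based idea above, the following minimal Python sketch (not taken from the thesis; the similarity and rating values are made up) predicts a user's interest in a target item as the similarity-weighted average of the user's ratings for similar items.

import numpy as np

# similarities between the target item and three items the user has already rated
item_similarity = np.array([0.9, 0.6, 0.2])
# the user's ratings (or reading values) for those three items
user_ratings = np.array([1.0, 0.5, 0.0])

# ratings of more similar items are weighted more strongly in the prediction
predicted_rating = np.sum(item_similarity * user_ratings) / np.sum(item_similarity)
print(predicted_rating)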

Model-based collaborative filtering uses training examples to generate a model that is able to predict the ratings for new unrated items for each user [18, 22]. Examples of such models include decision trees, aspect models, rule-based models, Bayesian methods and latent factor models. The resulting models are better at dealing with sparsity of data than the memory-based methods. The model-based collaborative filtering models are developed with data mining and machine learning algorithms. Techniques such as dimensionality reduction are often used to improve the accuracy of the predictions.

2.9.4 Hybrid Recommendation system

Most shortcomings of content-based and collaborative filtering can be solved by combining the approaches, as they often are complementary [19]. The early-rater problem for collaborative filtering can be solved with content-based methods, since new items can then be recommended on the basis of their content, not just on ratings from other users. Content-based systems can also eliminate the sparsity problem of collaborative filtering, by making recommendations based on the content of the items. A large number of participants is not needed in content-based methods because all users get individual recommendations. The collaborative filtering systems can solve the content-based systems' lack of subjective data about the items [19]. Subjective data can be a preference for one item liked by a similar user or friend, and this kind of data can be handled by collaborative filtering. Collaborative filtering can also solve the lack of novelty (new kinds of items) for content-based systems. Collaborative filtering methods can identify novelty using preferences from other users and hence recommend items dissimilar to those seen before. Furthermore, when the content-based system lacks user preferences to represent a user's interests, a collaborative system can complete the user information with other users' experiences as a basis.

2.10 Techniques for Recommendation systems

There are different ways of implementing item-based and user-based collaborative filtering [18]. The idea of the methods is to first find similar users/items by comparing their profiles and then recommend items similar users may like, or items that seemed similar to an item the target user liked based on other preferences. The profiles to compare consist of different users' preferences. The preferences often consist of scores or ratings from users, or values for how much each user interacted with the items. To calculate similarity among users or items for collaborative filtering (item-based or user-based), different correlation methods are used, for example the Cosine similarity method [24]. Cosine similarity returns a value between -1 and 1. This method calculates the angle between two vectors; an angle of zero gives a cosine value of 1, which means that the vectors are closely related to each other. If the two vectors are orthogonal, the cosine value will be 0 and hence they are unrelated. See equation 2.1 for the Cosine formula.

Sim(u_i, u_k) = \frac{\sum_{j=1}^{m} r_{ij} r_{kj}}{\sqrt{\sum_{j=1}^{m} r_{ij}^2 \sum_{j=1}^{m} r_{kj}^2}}    (2.1)

Pearson similarity is another way of calculating correlation between two vectors [25]. Pearson calculates linear similarity between the vectors. The Pearson score corrects for grade inflation, which means that items that consistently have higher scores than others can still be perfectly correlated with the others, if the difference is consistent. The Pearson formula returns a value between -1 and 1, where 1 means two items have the exact same ratings and thus are closely related. Pearson correlation is cosine similarity over the mean-centered vectors. See equation 2.2 for the Pearson formula. Euclidean distance is another measure of similarity between vectors. Euclidean distance is calculated between two points in space and corresponds to the length of a straight line between them [26]. The points correspond to items/users. The resulting value will be somewhere between 0 and \sqrt{2}. The closer to 0, the smaller the distance and the more similar the vectors. The maximum distance is \sqrt{2} because it is calculated in a coordinate system where the axes are 1 (\sqrt{(1-0)^2 + (0-1)^2}).

See equation 2.3 for how Euclidean distance is calculated.

Sim(u_i, u_k) = \frac{\sum_{j}(r_{ij} - \bar{r}_i)(r_{kj} - \bar{r}_k)}{\sqrt{\sum_{j}(r_{ij} - \bar{r}_i)^2 \sum_{j}(r_{kj} - \bar{r}_k)^2}}    (2.2)

d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}    (2.3)
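As a minimal illustration of equations 2.1-2.3 (not part of the thesis; the example vectors are made up), the three measures can be computed with NumPy as follows.

import numpy as np

def cosine_similarity(u, v):
    # equation 2.1: dot product divided by the product of the vector norms
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def pearson_similarity(u, v):
    # equation 2.2: cosine similarity over the mean-centered vectors
    return cosine_similarity(u - u.mean(), v - v.mean())

def euclidean_distance(u, v):
    # equation 2.3: straight-line distance between the two points
    return np.sqrt(np.sum((u - v) ** 2))

ratings_a = np.array([1.0, 0.5, 0.0, 1.0])  # hypothetical reading values for user A
ratings_b = np.array([0.8, 0.4, 0.1, 0.9])  # hypothetical reading values for user B
print(cosine_similarity(ratings_a, ratings_b))
print(pearson_similarity(ratings_a, ratings_b))
print(euclidean_distance(ratings_a, ratings_b))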

distance or Cosine similarity. Text can be represented as a d-dimensional vector in many ways, for example with the bag of words method, the TF-IDF method or with Word2Vec embeddings.

The bag of words method represents the words within each document as a d-dimensional vector [27]. The d in the d-dimensional vector represents the number of unique words in the document. By calculating the Euclidean distance between the vectors it is possible to find the nearest neighbors and recommend these. The primary disadvantage of this method is that it gives low importance to less frequently observed words in the documents. Bag of words does not preserve the order of words either. The TF-IDF method gives a vector with more weight on less frequently observed words. This method assigns a weight to each word in the document, based on the term frequency in the document (TF) and the inverse document frequency (IDF), which reflects the occurrence across all documents. Thus, words that occur more often in a document but less often in all other documents receive a higher TF-IDF value. A disadvantage of both bag of words and TF-IDF is that they do not capture the semantic and syntactic similarity of a given word with other words [27]. Word embedding methods, such as Word2Vec, GloVe and fastText, can find semantic similarity between words. Semantic similarity can be found as associations between words like "trump" and "white house", which often occur together. Word2Vec observes patterns during training and represents each word with a d-dimensional vector. A large training set is needed for accurate results with this method. There are pre-trained models ready to be used. The models give vector representations for each word, but if vector representations for whole sentences are needed, an average over all words can be calculated. This is called the average Word2Vec model.
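A small sketch (not from the thesis; the example documents are invented) of the difference between bag of words counts and TF-IDF weights, using scikit-learn's vectorizers:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

documents = [
    "trump visits the white house",
    "the white house press briefing",
    "local hockey team wins the final",
]

bow = CountVectorizer().fit_transform(documents)    # raw term counts per document
tfidf = TfidfVectorizer().fit_transform(documents)  # counts reweighted by inverse document frequency

print(bow.toarray())    # frequent words such as "the" weigh as much as rare ones
print(tfidf.toarray())  # rare, document-specific words get higher weights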

Furthermore, CF-IDF (Concept Frequency-Inverse Document Frequency) is another method for vector representation that captures semantic similarity [28, 29]. It is based on weighting the occurrences of references to concepts in a domain ontology. CF-IDF is similar to TF-IDF, but it does not count term frequencies; it counts frequencies of specific concepts. Every concept contains different terms and different forms of each term. Each article in CF-IDF systems is represented as a set of concepts. CF-IDF user profiles consist of subsets of the concepts and relations stored within an ontology. Each user profile contains extractions of the most frequent concepts from read articles.

All these methods for vector representation can be used for different features of an article, and multiple features might need to be considered in order to make a robust recommendation system [27]. Weights can also be put on these different features to order them by importance. Different weights and features can be used to make appropriate recommendation systems for different purposes.

2.11 Previously created Recommendation systems

was used to predict the interests of each user, with topic categories from a text classifier. They used a Bayesian model to predict the news interests for each individual user, based on activities of that user, and the news trends, based on activities of a group of users.

One example of a collaborative filtering recommendation system is one for movies and their ratings [30]. The principle of the system is that users rate movies and will then receive a list of recommended movies based on earlier ratings. This system is item-based; thus, it tries to find items similar to the current one, based on earlier movie ratings. The system calculates correlations to find similar items. The correlations are calculated by first creating a matrix where each column contains a movie name and each row contains a rating from a user for a movie. Then the correlations are calculated between the target movie and the rest of the movies to get the most similar movies and correlation values for how similar they are. They used Pandas' method .corrwith() to calculate the correlations [31]. This function calculates pairwise correlations between the movies and gives correlation values between -1 and 1; the closer to 0, the less similar. They only considered movies with 50 or more ratings to get valid and meaningful recommendations [30]. Finally, they recommend the items that are most similar to those already seen by each user.
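A minimal sketch of that item-based approach (the data and column names below are made up for illustration; only the use of .corrwith() follows the description above):

import pandas as pd

ratings = pd.DataFrame({
    "user":   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "movie":  ["A", "B", "C"] * 3,
    "rating": [5, 4, 1, 4, 5, 2, 1, 2, 5],
})

# one row per user, one column per movie, ratings as values
matrix = ratings.pivot_table(index="user", columns="movie", values="rating")

# pairwise correlation between the target movie and every other movie
correlations = matrix.corrwith(matrix["A"]).sort_values(ascending=False)
print(correlations)  # the movies most similar to "A" come first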

Another example of collaborative filtering is implemented in the IMDB website [25]. In this example, users can also rate movies on the site and then receive a list of recommended movies based on earlier ratings. But this system is user-based. Thus, it is trying to find similar users to the target user, based on earlier movie ratings from all the users. To calculate the similarity between users in this system, they used Pearson correlation, see section 2.10 for a description of the method. Pearson is the most common measure for similarity in memory-based collaborative filtering systems.

There are some studies comparing the CF-IDF recommendation method and its performance with the traditional TF-IDF method, for news recommendation systems [28, 29]. The developers of the CF-IDF method aimed to reduce noise caused by non-meaningful terms for TF-IDF by using semantics. They therefore created CF-IDF user profiles consisting of subsets of different concepts and relations stored within an ontology. See a description of CF-IDF in section 2.10. CF-IDF outperforms TF-IDF in terms of accuracy, recall and F1.

2.12 Evaluations of Recommendation systems

To evaluate the performance of a recommendation system, a validation framework needs to be built to measure the quality of the recommendations generated [32]. The evaluation can be conducted online or offline. The online method requires the algorithm to be deployed in production, to be able to validate the recommendations with user interactions. This method may represent and measure the system's real performance, but it is time consuming. The metrics that can be used for online evaluations are Customer Lifetime Value (CLTV), Click-Through Rate (CTR), Return On Investment (ROI) and Purchases. The online evaluation method needs to be applied with A/B testing principles.

Precision, Recall and Accuracy. The authors of the paper about evaluations [32] built a recommendation system, and to train the engine they used 3 months of interaction data; this is called the training set. Then they selected a recommendation day and gave the users 29 more days to interact with the items recommended that day; this is the validation set. They divided the users into three groups, based on how many items they had interacted with. The groups were called "New", "Regular" and "VIP". New users interacted with 1-4 items, Regular users with 5-10 items and VIP users with more than 10 items. This is called the segmentation trick and it compensates for the differences in the number of interactions among users.
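A sketch of such an offline evaluation (not the cited paper's code; the per-user lists are invented), computing precision and recall per user and assigning each user to one of the three segments:

def precision_recall(recommended, relevant):
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def segment(n_interactions):
    # "New" = 1-4 items, "Regular" = 5-10 items, "VIP" = more than 10 items
    if n_interactions <= 4:
        return "New"
    return "Regular" if n_interactions <= 10 else "VIP"

# hypothetical validation data: recommended articles vs. articles actually read
recommended = {"user1": ["a1", "a2", "a3"], "user2": ["a4", "a5"]}
read = {"user1": ["a2", "a3", "a7", "a8", "a9"], "user2": ["a5"]}

for user in recommended:
    p, r = precision_recall(recommended[user], read[user])
    print(user, segment(len(read[user])), round(p, 2), round(r, 2))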

Method

This chapter explains the methodology of the project. The method for this project is divided into three parts. The first part consists of determining which methods are best suited to implement personalization in the described context of VK’s online newspaper. The second part consists of first extracting a usable dataset from VK’s data, and then implementing and training a system with the data. The final part consists of testing and evaluating the output recommendations.

The data extraction part of the project consisted of first studying VK's data to find suitable data to use as a basis for the recommendations. When useful data was found, a dataset was extracted into a file for use in the implementation of the system. See the data extraction in more detail in section 3.1. The implementation part of the project consisted of investigating different recommendation methods and then building a content-based recommendation system and a complementary user-based collaborative filtering part to add to the system. The collaborative filtering part is not a recommendation system on its own but can be used together with the content-based system to potentially improve its performance.

The content-based recommendation system extracts features from each read article for each user and compares these to the features of a new article. If some of the already read articles are similar enough to the new one, the new article gets recommended. The features consist of title, subject tags and location tags. See section 3.2.1 for more details about the content-based part. The collaborative filtering part compares read articles and tags for each user with those of all other users to find the most similar ones. Then it adds a proportionate amount of articles from the set of read articles of the similar users to the target user. This is done in order to be able to recommend articles based on what similar users may like. See section 3.2.2 for a more detailed explanation of the user-based implementation.

3.1 Data extraction

The data extraction part of the project consisted of first studying VK's data and then extracting a dataset containing features for articles and a dataset containing user preferences. The data studied included data about push notifications pressed, read articles, user activity and other events. There is for example data about which articles each user has read, how much of the article body and header the user interacted with, and whether the user pressed a notification to get to the article. There is also data available about each article, such as title, tags and a subjectively estimated news value set by the article author. JetBrains' DataGrip was used to make SQL queries and hence to get the data. The data warehouse service Redshift was used to get data from VK's data warehouse. After discussing and studying the data, a dataset was extracted to a file in CSV format. See Table 1 and 2 for the format of the extracted dataset. All this data was later used for the recommendation engine.

The extracted dataset consists of two tables. One table contains article features, thus information about each article, and the other contains user preferences, which is information about the read articles for each user. Table 1 contains article features and has one row for each article and tag; thus if one article has three tags it gets three rows. The first column contains the article id, the second contains the title of the article, the next column contains the news value for the article, the fourth contains a tag and the last column contains the type of that tag. This data was chosen because all articles have these fields and because they are informative, see further motivation for the selected data in chapter 5. The news value is an integer between 1 and 6, deciding how valuable and interesting an article is when it comes to news. The value is set by the author. The tags are strings specifying the content of the article, for example "Umeå", "Stefan Löfven" and "Björklöven". There are more than 11 000 tags and most tags are generated by a machine learning algorithm. Each tag has one type and there are ten different tag types to extract from VK's database. The tag types that were extracted to this dataset are "topic", "place", "category", "person", "organisation", "story" and "section". Some types are more general than others. These tag types were extracted to the dataset to be able to separate different types of tags from each other.

Table 2 contains user preferences and has one row for each user and article; hence if one user has read five articles it gets five rows. The first column contains the user id, the next column contains the article id and the last contains a reading value for the article. The reading value is a float representing how much the user interacted with the article. VK saves values for how much each user has interacted with different article bodies and headers. They also save a notification id if a user presses a push notification. The reading values were calculated from these values as shown in equation 3.1, to represent how much each user likes different articles. Thus, this data was extracted to represent what each user likes to read.

reading_value = body + 0.1 * header + 0.4 * push    (3.1)

where: body = value between 0 and 1 representing the read parts of the article body
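A sketch of equation 3.1 as Python, under the assumption (the remaining variable definitions are cut off above) that header is a 0-1 fraction of the header interacted with and push is 1 if the user opened the article via a push notification, otherwise 0:

def reading_value(body: float, header: float, push: int) -> float:
    # body: fraction of the article body read, between 0 and 1
    return body + 0.1 * header + 0.4 * push

print(reading_value(body=0.8, header=1.0, push=1))  # 1.3, out of a maximum of 1.5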

ArticleId     Title               NewsValue     TagName     TagType
ArticleId 1   Title for article   News value    Tag 1       Type for tag 1
ArticleId 1   Title for article   News value    Tag 2       Type for tag 2
ArticleId 2   Title for article   News value    Tag 1       Type for tag 1
ArticleId 2   Title for article   News value    Tag 2       Type for tag 2
ArticleId 2   Title for article   News value    Tag 3       Type for tag 3
ArticleId 3   Title for article   News value    Tag 1       Type for tag 1

Table 1: A Table explaining the input data for the program. This Table contains article features, which are titles, news values, tags and tag types.

UserId     ArticleId     ReadingValue
UserId 1   ArticleId 1   ReadingValue for article
UserId 1   ArticleId 2   ReadingValue for article
UserId 1   ArticleId 3   ReadingValue for article
UserId 2   ArticleId 1   ReadingValue for article
UserId 2   ArticleId 2   ReadingValue for article
UserId 3   ArticleId 1   ReadingValue for article

Table 2: A Table explaining the input data for the program. This Table contains user pref-erences, thus data about which articles each user has read and how much the user has interacted with the articles.
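As an illustration of how these two tables can be loaded for the implementation described in section 3.2 (the file names and the use of read_csv are assumptions; the thesis only states that the data was exported to CSV and that Pandas was used):

import pandas as pd

# Table 1: one row per (article, tag)
article_features = pd.read_csv(
    "article_features.csv",
    names=["ArticleId", "Title", "NewsValue", "TagName", "TagType"],
)

# Table 2: one row per (user, read article)
user_preferences = pd.read_csv(
    "user_preferences.csv",
    names=["UserId", "ArticleId", "ReadingValue"],
)

print(article_features.head())
print(user_preferences.head())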

3.2 Implementation

The implementation of the recommendation system was divided into building a content-based recommendation system and building a user-based collaborative filtering algorithm to complement and potentially improve the content-based system. The implemented recommendation system is a hybrid as it uses both user-based collaborative filtering and content-based filtering. The content-based recommendation system makes TF-IDF vectors of features from each read article for each user and calculates the Euclidean distance between these vectors and the vector of a new article. If some of the old articles are similar enough to the new one, the new article gets recommended. The collaborative filtering part calculates Pearson correlations between each user's read articles and tags and those of all other users to find the most similar users. Then it adds a proportionate amount of articles from the set of read articles of the similar users to the target user. The program was implemented in Python. JetBrains' PyCharm was used as the implementation environment. The Python libraries Pandas and Numpy were used during the implementation. The output of the program was tested with JupyterLab during the implementation.

3.2.1 Content-based Recommendation system

are calculated with Euclidean distance¹ for article features in TF-IDF vectors². The system should work as a filter for all notifications and users.

The comparison between the new article and each read article is conducted with TF-IDF vectorization of the title, subject tags and location tags. To use the title of each article, the title first needs to be preprocessed. This is done by first filtering out the most common words, called "stopwords"³. For example, some English stopwords are "it", "he", "they" and "today", which are filtered out because they are too common and therefore do not represent keywords for a document [34]. Then the remaining words of each title are transformed with a stemming tool⁴. Stemming cuts off the end or the beginning of each word, taking into account a list of prefixes and suffixes for the selected language [34]. The vectorizer is then used to transform the filtered titles into TF-IDF vectors and finally the pairwise Euclidean distances between the different vectors are calculated. The next step is to remove location tags and to put all other tags for each article on one row. These are then also vectorized and pairwise distances are calculated for the subject tags as well. The program finally does the same for the remaining location tags.
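A minimal sketch of these preprocessing and comparison steps (not the thesis code; the thesis does not name its stopword list or stemming tool, so NLTK's Swedish stopwords and Snowball stemmer are assumptions here, and the titles are invented):

# requires: pip install nltk scikit-learn, and nltk.download("stopwords") once
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import pairwise_distances

stemmer = SnowballStemmer("swedish")
swedish_stopwords = set(stopwords.words("swedish"))

def preprocess(title: str) -> str:
    # drop stopwords and stem the remaining words
    words = [w for w in title.lower().split() if w not in swedish_stopwords]
    return " ".join(stemmer.stem(w) for w in words)

read_titles = ["Björklöven vann matchen i Umeå", "Nytt rekord för festivalen i Umeå"]
new_title = "Björklöven förlorade kvällens match"

corpus = [preprocess(t) for t in read_titles + [new_title]]
vectors = TfidfVectorizer().fit_transform(corpus)

# Euclidean distance between the new article (last row) and each read article;
# on TF-IDF vectors the values fall between 0 and sqrt(2), lower = more similar
distances = pairwise_distances(vectors[-1], vectors[:-1], metric="euclidean")
print(distances)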

The resulting value for the pairwise Euclidean distance is between 0 and \sqrt{2}, or NaN if there is no similarity. The maximum distance is \sqrt{2} because it is calculated in a two-dimensional coordinate system where the axes are 1 (\sqrt{(1-0)^2 + (0-1)^2}). See equation 2.3 for the formula of the Euclidean distance. The output of each set of comparisons is a sorted list with the most similar article at the top, thus the already read article with the lowest Euclidean distance to the new article. All comparisons are later put together to find the most similar articles considering title, tags and location. Then the reading value is added to the calculation, to compensate for how much each user interacted with each read article. Finally, the news value of the new article is added to the calculation, to compensate for the importance of each new article and the trend aspect. See equation 3.2 for how these variables are put together in the program.

The sorted tables, with the most similar read articles at the top for each new article and user, are then used to decide whether to push the new article or not. See an example of such a table in Table 3. The result values in the Table are calculated with equation 3.2. If the new article is similar enough to some already read ones, it gets pushed. See algorithm 1 for how the program decides whether to push the new article or not, given a table formatted as Table 3. Here the program creates a threshold for how low the result values must be. The threshold is in this case based on how many articles the user has read the week before. This is done in order to give users who have read very few articles a greater chance of getting an article recommended based on their earlier readings, and vice versa. Users with more than 100 read articles (a few users) get their "trainlen" value set to 100 in the algorithm, to avoid getting unreasonably few recommendations. But the threshold can also be a fixed value, resulting in more recommendations for users with more read articles. The algorithm recommends pushing articles with a news value greater than 4, articles with at least one result value below the threshold, and articles with three result values just above the threshold.

^1 Scikit-learn. 2019. sklearn.metrics.pairwise_distances. [Online] Available: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html

^2 Scikit-learn. 2019. sklearn.feature_extraction.text.TfidfVectorizer. [Online] Available: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

^3 HTM System Solutions. Introduktion till NLP. [Online] Available: http://www.htms.se/blog/2017/11/27/introduktion-till-nlp/


$$similarity = \frac{title_{similarity} \cdot w_{title} + tag_{similarity} \cdot w_{tag} + loc_{similarity} \cdot w_{loc}}{w_{title} + w_{tag} + w_{loc}} + (0.9 - reading_{value}) \cdot w_{readingvalue} + (2.5 - news_{value}) \cdot w_{newsvalue} \tag{3.2}$$

where:
  title_similarity = similarity of the titles of the new and the already read article (between 0 and √2)
  tag_similarity = similarity of the tags of the new and the already read article (between 0 and √2)
  loc_similarity = similarity of the locations of the new and the already read article (between 0 and √2)
  reading_value = how much the user interacted with the already read article (between 0 and 1.5)
  news_value = the news value of the new article (0, 1, 2, 3, 4, 5 or 6)
  w_title = weight for title similarity (importance)
  w_tag = weight for tag similarity (importance)
  w_loc = weight for location similarity (importance)
  w_readingvalue = weight for reading value (importance)
  w_newsvalue = weight for news value (importance)
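As a concrete illustration, equation 3.2 can be expressed as a small scoring function. The title, tag and location weights below correspond to the test configurations in Table 7, while the reading-value and news-value weights are placeholders, since their values are not stated here:

import math

def similarity_score(title_dist, tag_dist, loc_dist, reading_value, news_value,
                     w_title=1.0, w_tag=0.6, w_loc=0.05,
                     w_reading=0.1, w_news=0.1):
    """Combine pairwise distances and user/article signals as in equation 3.2.
    Lower scores mean a better match; the w_reading and w_news defaults are assumptions."""
    weighted_dist = (title_dist * w_title + tag_dist * w_tag + loc_dist * w_loc) \
                    / (w_title + w_tag + w_loc)
    return (weighted_dist
            + (0.9 - reading_value) * w_reading
            + (2.5 - news_value) * w_news)

# Example: a fairly similar title, no location overlap, a well-read old article, news value 5.
score = similarity_score(title_dist=0.8, tag_dist=1.2, loc_dist=math.sqrt(2),
                         reading_value=1.4, news_value=5)
print(round(score, 4))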

ArticleId     EuclideanTitle  EuclideanTags  EuclideanLoc  ReadingValue  Result
ArticleId-1   1.3275          1.1632         0.7923        1.1           1.4315
ArticleId-2   1.4142          1.3843         1.4142        1.5           1.4633
ArticleId-3   1.4142          1.3925         1.3431        1.1           1.5842
ArticleId-4   1.4142          1.4142         1.3807        1.1           1.5932

Table 3: A resulting table after all Euclidean distances have been calculated, and added together, with reading values and news value. It displays article id, Euclidean distance for title, Euclidean distance for tags, Euclidean distance for location tags, reading value and a resulting value.

Algorithm 1 The algorithm for deciding whether to recommend (push) an article or not. The numbers (0.2, 1, 100 and 600) in the algorithm are arbitrary, and varying them gives different numbers of recommendations to the users.

if trainlen > 100 then
    trainlen ← 100
end if
threshold ← 1 − trainlen / 600
if newsValue > 4 then
    Recommend
else
    if resVal[0] < threshold then
        Recommend
    else if (resVal[0] + resVal[1] + resVal[2]) < (threshold ∗ 3 + 0.2) then
        Recommend
    end if
end if
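The same decision rule can be sketched in Python as below, assuming the result values for one user and one new article are already sorted in ascending order; the function and variable names are illustrative and not taken from the implemented system:

def should_push(res_vals, news_value, train_len):
    """Decide whether to push a new article to a user, following Algorithm 1.
    res_vals: result values from equation 3.2, sorted ascending (best match first).
    news_value: news value of the new article (0-6).
    train_len: number of articles the user read during the past week."""
    train_len = min(train_len, 100)           # cap heavy readers
    threshold = 1 - train_len / 600           # adaptive threshold
    if news_value > 4:                        # always push articles with high news value
        return True
    if res_vals and res_vals[0] < threshold:  # the best match is similar enough
        return True
    if len(res_vals) >= 3 and sum(res_vals[:3]) < threshold * 3 + 0.2:
        return True                           # three matches just above the threshold
    return False

print(should_push([1.43, 1.46, 1.58], news_value=3, train_len=40))  # False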


3.2.2 User-based Recommendation system

The principle for the user-based part is that if user A and user B like to read the same articles and about the same tags, and user B likes one more article, then user A may also like articles similar to that one. User A would then also get recommendations for articles similar to that article. The user-based part of the system first compares each user's interests with other users' interests to find similar users. When similar users are found, the system recommends articles that these similar users may like. The comparisons between users are made with article id:s and article tags. The articles representing the preferences of each user are taken from the past week. The format for the article features and the user preferences can be found in section 3.1. The similarities are calculated with Pearson similarity^5. This user-based part is a complementary program that runs before the content-based system in order to make it recommend new kinds of articles that similar users may like.

ArticleId     UserId 1  UserId 2  UserId 3  UserId 4  UserId 5
ArticleId 1   rv11      rv12      rv13      rv14      rv15
ArticleId 2   rv21      rv22      rv23      rv24      rv25
ArticleId 3   rv31      rv32      rv33      rv34      rv35

Table 4: The table format used to calculate correlations for articles between different users with the Pearson algorithm. Rv stands for reading value.

TagName   UserId 1  UserId 2  UserId 3  UserId 4  UserId 5
Tag 1     rv11      rv12      rv13      rv14      rv15
Tag 2     rv21      rv22      rv23      rv24      rv25
Tag 3     rv31      rv32      rv33      rv34      rv35

Table 5: The table format used to calculate correlations for tags between different users with the Pearson algorithm. Rv stands for reading value.

To find users similar to one user, the program compares that user's reading behavior with every other user's reading behavior. The reading behaviors consist of reading values for both articles and tags. The program uses the Pearson algorithm to calculate linear correlation, first for the set of articles and then for the set of tags. The Pearson algorithm returns a value between -1 and 1, where 1 means two users have read exactly the same articles/tags and have the same reading values. Tables 4 and 5 show the table formats used to calculate correlations for tags and articles between different users with the Pearson algorithm. The results from the two sets of comparisons are two tables with the highest correlations at the top. The correlations for articles and tags are then added together with weights as shown in equation 3.3, and the tables are sorted to get the most similar users, when considering both similarities, at the top. See Table 6 for an example of a resulting correlation table for a user, after all calculations are conducted.

^5 SciPy. 2019. scipy.stats.pearsonr. [Online] Available: https://docs.scipy.org/doc/scipy/


$$total_{corr} = corr_{article} \cdot w_{article} + corr_{tags} \cdot w_{tags} \tag{3.3}$$

where:
  corr_article = calculated correlation between articles, value between 0 and 1
  corr_tags = calculated correlation between tags, value between 0 and 1
  w_article = weight for article correlation (importance)
  w_tags = weight for tags correlation (importance)
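A minimal sketch of this comparison for one pair of users, assuming their reading values have been aligned over the same articles and tags (zero where an item was not read) and that SciPy's pearsonr is used as cited in the footnote; the weights are placeholders:

from scipy.stats import pearsonr

def user_correlation(articles_a, articles_b, tags_a, tags_b, w_article=0.5, w_tags=0.5):
    """Combine article and tag correlations as in equation 3.3.
    Each pair of vectors holds reading values aligned over the same articles/tags."""
    corr_article, _ = pearsonr(articles_a, articles_b)
    corr_tags, _ = pearsonr(tags_a, tags_b)
    return corr_article * w_article + corr_tags * w_tags

# Two users with partly overlapping reading behavior (illustrative reading values).
total = user_correlation(
    articles_a=[1.1, 0.0, 1.5, 0.9],
    articles_b=[1.0, 0.4, 1.2, 0.0],
    tags_a=[1.1, 1.5, 0.0],
    tags_b=[0.8, 1.1, 0.2],
)
print(round(total, 4))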

When the most similar users have been found, a proportionate number of articles from the three most similar users are added to the target user. That is, new rows are added to Table 2, containing the target user's id and the new article id with a neutral reading value (1). The articles with the highest reading values are the ones added. The articles are added to Table 2 because this table is then used in the content-based part of the system. These newly added articles may thus give the target user some new, and perhaps completely different, recommendations based on similar users' preferences. The number of articles added depends on how many articles the target user has already read, see equation 3.4. In this way, the added articles complement the content-based part and make it recommend new kinds of articles.

$$amount = \frac{articles}{10 \cdot i} \tag{3.4}$$

where:
  articles = the number of articles the target user has read
  i = the index of the similar user (1 is the most similar), from 1 to 3
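For example, the transfer of articles from the three most similar users could be sketched as below. Rounding down is an assumption, since equation 3.4 does not state how fractional amounts are handled:

def articles_to_add(read_articles, similar_rank):
    """Number of articles to borrow from the similar user at rank 1-3 (equation 3.4)."""
    return read_articles // (10 * similar_rank)

# A target user who read 45 articles last week borrows from the three most similar users.
print([articles_to_add(45, i) for i in (1, 2, 3)])  # [4, 2, 1]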

UserId     CorrArticle  CorrTag  CorrResult
UserId-1   0.5119       0.8733   0.6203
UserId-2   0.3811       0.8821   0.5314
UserId-3   0.4436       0.7303   0.5296
UserId-4   0.3656       0.9035   0.5269

Table 6: A resulting table after all Pearson similarities have been calculated and added together. It displays user id, Pearson similarity for article id:s, Pearson similarity for tags and a resulting value, calculated as shown in equation 3.3.

3.3 Evaluation methods


3.3.1 Evaluation of the Content-based System

The offline testing method was used to validate the recommendations from the content-based recommendation system. Thus, the system did not provide recommendations to users in reality. Instead, it used already published articles and made recommendations from them, pretending that they were new. It "recommended" old articles, and a recommendation was known to be relevant if the target user had actually read the article. The tests were conducted with read articles taken from one week, representing the preferences of each user, and a test set with articles published during the two days after that week, representing new articles. The users had two more weeks to read the articles published in the test set. For each user, the program decided for each new article in the test set whether to push the article or not. The recommended articles for each user were then compared to the articles that the user actually read, to find the recommendations that were relevant for the user. The numbers of tests conducted, recommendations, ensured relevant recommendations, earlier read articles and read articles in the test set were then counted.

The program also generated two other recommendation sets for comparison: one set with the same number of articles with the highest news values, and three sets with the same number of randomized articles. The program counted the number of ensured relevant recommendations from these sets as well, and a mean value over the three random sets was calculated. The users were also divided into three groups, as different users have read different numbers of articles. The users in the first group had read fewer than 20 articles during the past week, the users in the second group had read between 20 and 60 articles, and the users in the last group had read more than 60 articles during the week. The total number of users considered in the tests was 3 000: 633 from the group with fewer than 20 read articles, 1 585 from the group with between 20 and 60 read articles and 782 from the group with more read articles.
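A simplified sketch of this offline evaluation, assuming the recommended articles and the actually read test-set articles are available as sets of article id:s per user (the data layout and names are assumptions):

def evaluate_user(recommended_ids, read_test_ids):
    """Count recommendations and ensured relevant recommendations for one user.
    A recommendation counts as relevant if the user actually read the article."""
    relevant = recommended_ids & read_test_ids
    return len(recommended_ids), len(relevant)

# Recommendations and actual reads per user id (illustrative data).
recommended = {"user-1": {"a1", "a2", "a3"}, "user-2": {"a2", "a4"}}
actually_read = {"user-1": {"a2", "a5"}, "user-2": {"a4"}}

counts = [evaluate_user(recommended[u], actually_read[u]) for u in recommended]
pushed = sum(p for p, _ in counts)
correct = sum(c for _, c in counts)
print(pushed, correct)  # 5 2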

Finally, some metrics were calculated from the data generated by the tests. The metrics were calculated for all users tested in each group, and for all groups as a whole. The first of these metrics were Precision and Accuracy, see how they are calculated in section 2.12. The Accuracy was, however, calculated by dividing the number of ensured relevant recommendations by the number of recommendations. Then the ensured relevant recommendations from the recommendation engine were compared to the other recommendation sets, by dividing the number of correct recommendations from the system by the number of correct recommendations from the other sets. This shows how much better or worse the system performs than these recommendation methods. Finally, the mean number of recommendations per user was calculated.


          News value  Ad. threshold  Title weight  Tag weight  Location weight
Test 1    Yes         No             1             0.6         0.05
Test 2    Yes         No             1             0           0
Test 3    Yes         No             0             1           0.1
Test 4    Yes         Yes            1             0.6         0.05
Test 5    No          No             1             0.6         0.05

Table 7: A Table displaying the different weights and parameters that were used when testing the recommendation system.

3.3.2 Evaluation of the User-based part of the system

The user-based part of the system is more complicated to test offline, as its primary purpose is to complement the content-based part with novelty and to represent the interests of users without enough data. Its purpose is also to add an element of subjectivity to the recommended content, as users' interests may not be visible in the features extracted from the articles. Thus, this part of the recommendation system recommends content that users in many cases have not engaged with, partly in order to evoke new interests. As the users do not actually receive these recommendations in offline tests, this part of the system should therefore be tested online in order to get valid and fair results.


4 Results

This chapter presents the results from the evaluation described in section 3.3. The results, and the chapter, are divided into results from the content-based part and results from the user-based part of the recommendation system. The results from the content-based part, and some of the user-based results, are presented as values generated from the conducted tests, together with metrics calculated from those values. The remaining user-based results are presented as example evaluations. The results are discussed in chapter 5.

4.1 Results from the Content-based Evaluation


        Past read  Read    Pushed  Correct  Correct (nv)  Correct (r)  Users
Total   138 856    54 504  33 242  8 395    5 666         1 706        3 000
Short   7 969      5 733   3 617   627      433           77           633
Middle  60 824     24 848  15 377  3 564    2 117         579          1 585
Long    70 063     23 923  14 248  4 204    3 116         1 049        782

Table 8: A Table displaying the resulting data from an evaluation with news value considered and without an adapted threshold. The title weight was set to 1, the tag weight was set to 0.6 and the location weight to 0.05, see the weights in equation 3.2.

        Past read  Read    Pushed  Correct  Correct (nv)  Correct (r)  Users
Total   138 856    54 504  31 998  7 545    5 162         1 562        3 000
Short   7 969      5 733   3 608   618      437           72           633
Middle  60 824     24 848  15 485  3 302    2 061         559          1 585
Long    70 063     23 923  12 905  3 625    2 664         931          782

Table 9: A Table displaying the resulting data from an evaluation with news value considered and without an adapted threshold. The title weight was set to 1, the tag weight was set to 0 and the location weight to 0, see the weights in equation 3.2.

        Past read  Read    Pushed  Correct  Correct (nv)  Correct (r)  Users
Total   138 856    54 504  34 256  5 020    5 770         1 762        3 000
Short   7 969      5 733   3 427   422      410           73           633
Middle  60 824     24 848  15 347  2 074    2 081         564          1 585
Long    70 063     23 923  15 482  2 524    3 279         1 126        782

Table 10: A Table displaying the resulting data from an evaluation with news value considered and without an adapted threshold. The title weight was set to 0, the tag weight was set to 1 and the location weight to 0.1, see the weights in equation 3.2.

        Past read  Read    Pushed  Correct  Correct (nv)  Correct (r)  Users
Total   138 856    54 504  18 554  4 844    3 521         770          3 000
Short   7 969      5 733   3 284   573      426           65           633
Middle  60 824     24 848  10 066  2 434    1 673         340          1 585
Long    70 063     23 923  5 204   1 837    1 422         365          782

Table 11: A Table displaying the resulting data from an evaluation with news value considered and with an adapted threshold. The title weight was set to 1, the tag weight was set to 0.6 and the location weight to 0.05, see the weights in equation 3.2.


        Past read  Read    Pushed  Correct  Correct (nv)  Correct (r)  Users
Total   138 856    54 504  33 579  4 172    5 951         1 740        3 000
Short   7 969      5 733   3 566   392      424           64           633
Middle  60 824     24 848  14 678  1 748    2 108         533          1 585
Long    70 063     23 923  15 335  2 032    3 419         1 143        782

Table 12: A Table displaying the resulting data from an evaluation without news value considered and without an adapted threshold. The title weight was set to 1, the tag weight was set to 0.6 and the location weight to 0.05, see the weights in equation 3.2.

The test results generated by the recommendation system were then used to calculate a number of metrics. Each metric was calculated for each group and for all groups as a whole. The first metrics were comparisons between the correct recommendations from the system and the other recommendation sets. These metrics were calculated as follows:

News value comparison (NV) = (Number of recommended articles the users have interacted with) / (Number of news value articles the users have interacted with)

Randomized comparison (R) = (Number of recommended articles the users have interacted with) / (Number of randomized articles the users have interacted with)

Then, the metrics precision and accuracy were calculated like this:

Precision for all users (P) = (Number of recommended articles the users have interacted with) / ((Number of articles to recommend) × (Number of users))

Accuracy (A) = (Number of recommended articles the users have interacted with) / (Number of recommended articles)
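These metrics can be computed directly from the counts in the result tables, as in the sketch below. The number of articles to recommend (the size of the test set) is not stated in this section, so a placeholder is used, which only affects the precision value:

def metrics(correct, correct_nv, correct_random, pushed, articles_to_recommend, users):
    """Compute the NV/R comparisons, precision, accuracy and recommendations per user."""
    return {
        "nv_comparison": correct / correct_nv,
        "r_comparison": correct / correct_random,
        "precision": correct / (articles_to_recommend * users),
        "accuracy": correct / pushed,
        "rec_per_user": pushed / users,
    }

# Counts from the "Total" row of Table 8; 500 is a placeholder for the test-set size.
print(metrics(correct=8395, correct_nv=5666, correct_random=1706,
              pushed=33242, articles_to_recommend=500, users=3000))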


        NV Comp.  R Comp.  Precision  Accuracy  Rec / user
Total   1.482     4.921    0.0061     0.253     11.1
Short   1.448     8.143    0.0022     0.173     5.7
Middle  1.684     6.155    0.0049     0.232     9.7
Long    1.349     4.008    0.0117     0.295     18.2

Table 13: The Table shows the calculated comparisons with the news value and the randomized recommendation sets, the calculated precision, accuracy and recommendations per user, for the test results in Table 8.

        NV Comp.  R Comp.  Precision  Accuracy  Rec / user
Total   1.462     4.830    0.0055     0.236     10.7
Short   1.414     8.583    0.0021     0.171     5.7
Middle  1.602     5.907    0.0045     0.213     9.8
Long    1.361     3.894    0.0100     0.281     16.5

Table 14: The Table shows the calculated comparisons with the news value and the randomized recommendation sets, the calculated precision, accuracy and recommendations per user, for the test results in Table 9 (with no tags considered).

        NV Comp.  R Comp.  Precision  Accuracy  Rec / user
Total   0.870     2.849    0.0036     0.147     11.4
Short   1.029     5.781    0.0015     0.123     5.4
Middle  0.997     3.677    0.0029     0.135     9.7
Long    0.770     2.242    0.0070     0.163     19.8

Table 15: The Table shows the calculated comparisons with the news value and the randomized recommendation sets, the calculated precision, accuracy and recommendations per user, for the test results in Table 10 (with no titles considered).

        NV Comp.  R Comp.  Precision  Accuracy  Rec / user
Total   1.376     6.291    0.0035     0.261     6.2
Short   1.345     8.815    0.0020     0.174     5.2
Middle  1.455     7.159    0.0033     0.242     6.4
Long    1.292     5.033    0.0051     0.353     6.7

Table 16: The Table shows the calculated comparisons with the news value and the randomized recommendation sets, the calculated precision, accuracy and recommendations per user, for the test results in Table 11 (with an adapted threshold).

        NV Comp.  R Comp.  Precision  Accuracy  Rec / user
Total   0.701     2.398    0.0030     0.124     10.8
Short   0.925     6.125    0.0013     0.110     5.6
Middle  0.829     3.280    0.0024     0.119     9.3
Long    0.594     1.778    0.0057     0.133     19.6

Table 17: The Table shows the calculated comparisons with the news value and the randomized recommendation sets, the calculated precision, accuracy and recommendations per user, for the test results in Table 12 (without news value considered).

