Contextualizing music recommendations : A collaborative filtering approach using matrix factorization and implicit ratings


Linköpings universitet

Linköping University | Department of Computer and Information Science

Master’s thesis, 30 ECTS | Datateknik

2020 | LIU-IDA/LITH-EX-A--20/043--SE

Contextualizing music recommendations: A collaborative filtering approach using matrix factorization and implicit ratings

Kontextualisering av musikrekommendationer

Alexander Häger

Supervisor: Arne Jönsson
Examiner: Marco Kuhlmann


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

Recommender systems are helpful tools, widely employed in online applications to help users find what they want. This thesis re-purposes a collaborative filtering recommender, originally built to incorporate social media (hash)tags, as a context-aware recommender that uses time of day and activity as contextual factors. The recommender uses a matrix factorization approach for implicit feedback in a music streaming setting. Contextual data is collected from users’ mobile phones while they listen to music. An offline test shows that this approach improves recall compared to a recommender that does not account for the context the user was in. Future work should explore the qualities of this model further and investigate how its recommendations can be surfaced in an application.


List of Figures

3.1 Data processing overview
3.2 Sessionization example
4.1 Context attribution results

List of Tables

3.1 An overview of the logs used
3.2 Schema of the activity log
3.3 Schema of the music log
3.4 Time of day definition

1 Introduction

Asking for a recommendation can often be a shortcut through a lot of tedious searching, browsing and decision making. Ideally, a person recommending goods or services knows your preferences well enough to suggest something suitable. It also seems beneficial to mention the setting you are in, for example when having dinner: is this a dinner with family, or will it take place during a business trip? This contextual knowledge might influence the recommendation.

Within the recommender systems research area, work is being conducted to develop algorithms and systems that help end-users by providing recommendations based on a profile that the system has created of each end-user. Recommender systems have abundant applications in today’s online world: they can be found in e.g. e-commerce, entertainment, travel, social and news services. A service provider’s reasons for deploying a recommender system usually center on increasing profits, but this goal does not necessarily conflict with the end-user’s interests. A business might want to retain end-users by showing that it understands returning customers and by providing interesting and relevant recommendations, hoping for an overall increase in user satisfaction [1].

1.1 Aim

This thesis explores the field of context-aware recommender systems in a music streaming service scenario. The aim is to improve recommendation quality by using implicit feedback together with additional contextual information gathered from users’ mobile phones while they listen to music. The raw signals collected from the phones include local time and current activity (whether the user is stationary, walking, in a vehicle, etc.).

1.2 Research question

This thesis tries to answer the following research question: How is recommendation quality affected by incorporating contextual factors, based on time and activity, in track recommendation using collaborative filtering with implicit feedback?


1.3 Collaboration with Spotify

This thesis was written in collaboration with Spotify. As one of the most prominent music streaming services, Spotify provided domain expertise and data for this research.

1.4 Disposition

Chapter 2 explains recommender systems and the underlying theory of matrix factorization in collaborative filtering, and mentions related work on music and context-aware recommenders. The work of this thesis is described in Chapter 3, with the corresponding results shown in Chapter 4. The results are then discussed in Chapter 5, and the conclusions are summarized in Chapter 6.


2 Theory

This chapter explains the concepts, definitions and related work of context-aware music recommender systems in a top-down manner. Section 2.1 and Section 2.2 give an introduction to the research area and the underlying concepts of collaborative filtering-based recommender systems. Section 2.3 extends these concepts by incorporating context into the algorithms. Recommender system evaluation is covered in Section 2.4. Finally, related work is presented in Section 2.5.

2.1 Recommender systems

Recommender systems is a collective term for software, and for the research area, dealing with automatic, machine-generated recommendations that, in the majority of research, are personalized to each individual user. This is done by looking at historical interactions between the user and the items the system serves, and then trying to predict which items are most suitable to present. Item is the general term for what a recommender system recommends to a user, often in the form of lists ordered by the items’ relevance. What an item actually is varies greatly by domain. For instance, the streaming service Netflix¹ recommends movies to its users based on their previous viewing history. Amazon² is another example, from the e-commerce sector, that uses recommender systems to suggest e.g. books and many other products from its catalog.

Common techniques for music recommendation include collaborative filtering and content-based techniques. Content-based recommenders utilize the actual content (e.g. a track’s audio data) or metadata associated with an item to generate recommendations [2]. Collaborative filtering relies only on the interactions between users and items [3]. In practice, there does not seem to be a one-size-fits-all solution. One can therefore see research in hybrid techniques that combine individual techniques so that the strengths of one compensate for the weaknesses of another [4]. However, collaborative filtering seems to be the most widely adopted recommender technique in general [1], possibly due to its domain-agnostic approach and generally satisfactory results. Collaborative filtering is also the technique explored in this thesis and is discussed further in Section 2.2.

¹ https://www.netflix.com
² https://www.amazon.com


Central to most recommender systems is also the feedback mechanism used to learn a user’s preference for the items in the system. To return to the Netflix example: after watching a movie on Netflix, a user can rate it on a scale of one to five stars to show their preference for it. This is known as explicit feedback in the recommender system literature. Binary rating scales, also known as one-class feedback, can also be used to show whether a user likes or dislikes an item, or to incorporate e.g. social media (hash)tags [5]. While explicit feedback is a straightforward way of soliciting preferences from users, one must take care in how the feedback is interpreted. Some users are conservative with giving out one-star or five-star ratings, others might be more generous, and some might only give feedback at the extreme ends of the rating scale, essentially giving binary feedback even though there is a range of values to choose from. Consequently, a five-star rating from a conservative user should be seen as higher praise of a movie than one from a user who more commonly hands out five-star ratings. It has been shown that accounting for these biases in users’ feedback leads to better recommendations [3, 6].

A recommender system can also utilize implicit feedback, alongside or instead of explicit feedback [6, 7]. Instead of asking the user for a rating, the system implicitly infers preference from usage data, such as how long a user spends on a web page or how many times the user has streamed a track. Implicit feedback, however, only provides positive examples. The absence of interactions does not necessarily mean that the user dislikes an item; the user might simply not have discovered it yet. The (relatively) short consumption time of a track lends itself well to implicit feedback: a track may be re-consumed several times, even in the same session (in contrast to e.g. books or movies). It is also understandable that explicitly rating every track they listen to in everyday use can be a nuisance for users [4].

2.2 Collaborative filtering

Collaborative filtering relies on the assumption that a user who shares similar taste with a group of other users will likely appreciate items that the user has not yet seen but others in the group have. Traditional neighborhood-based collaborative filtering approaches mimic this assumption very closely. By looking at historical feedback data and defining a similarity measure as a function of two users’ item feedback vectors, one can compute a “neighborhood” of users for a particular user. To generate recommendations, one then predicts ratings for the items the user has not seen by combining the neighborhood’s feedback on those items, and serves the items with the highest predicted ratings. The choice of similarity measure and prediction function is paramount to this approach, along with deciding how many users to include in a neighborhood [3].
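The neighborhood-based procedure can be illustrated with a short Python sketch. It is not a method used in this thesis, and it makes several assumptions: the user-item matrix is a small dense NumPy array, cosine similarity serves as the similarity measure, and a similarity-weighted average serves as the prediction function. The helper names `cosine_similarity` and `predict_neighborhood` are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two users' item feedback vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def predict_neighborhood(S, u, k=2):
    """Predicted scores for user u from its k most similar users (unseen items only)."""
    k = min(k, S.shape[0] - 1)                        # never include u itself
    sims = np.array([cosine_similarity(S[u], S[v]) if v != u else -np.inf
                     for v in range(S.shape[0])])
    neighbors = np.argsort(sims)[::-1][:k]            # indices of the k most similar users
    weights = sims[neighbors]
    # Similarity-weighted average of the neighbors' feedback for every item
    scores = weights @ S[neighbors] / max(weights.sum(), 1e-12)
    scores[S[u] > 0] = -np.inf                        # only rank items u has not seen
    return scores
```

With `S = np.array([[3, 0, 1, 0], [2, 1, 1, 0], [0, 0, 0, 5]])`, user 1 is the most similar neighbor of user 0, and `predict_neighborhood(S, 0, k=1)` ranks item 1 highest among user 0’s unseen items.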

However, in more recent research another class of recommenders, sometimes denoted latent factor models [8], has gained more traction since these models performed well on the dataset provided through the Netflix Prize³. These recommenders show favorable results in terms of accuracy and scalability [8–10]. Latent factor models instead express users and items in a latent space that is dense compared to the sparse feedback data they are trained on. Finding items to recommend is then a matter of finding the items closest to the user in the latent space. This can be done through matrix factorization techniques such as singular value decomposition, popularized during the Netflix Prize competition [11]. The intuition behind these models is that the latent space can capture high-level concepts (e.g. scary, action or comedy movies) that are woven into the users’ feedback data, which in turn enables more interesting recommendations [10].
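As a rough illustration of the latent factor idea, the following Python sketch factorizes a small dense matrix with NumPy’s `numpy.linalg.svd` and keeps the `k` largest singular values. It is a simplification: plain SVD treats unobserved entries as zeros, whereas the Netflix Prize variants fit only the observed ratings, and splitting each singular value evenly between the user and item factors is just one common convention. The name `truncated_svd_factors` is hypothetical.

```python
import numpy as np


def truncated_svd_factors(S, k):
    """Rank-k factors X (users) and Y (items) such that S is approximated by X @ Y.T."""
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    root = np.sqrt(sigma[:k])               # split each singular value between both sides
    X = U[:, :k] * root                     # one k-dimensional row per user
    Y = Vt[:k].T * root                     # one k-dimensional row per item
    return X, Y
```

The dot product of a row of `X` and a row of `Y` then approximates the corresponding entry of `S`, so items can be ranked for a user by these dot products.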

The recommendation process can be formalized by introducing notation for the concepts discussed so far: users, items and feedback. Equation (2.1) and Equation (2.2) define the set of all users and all items of a system, respectively. Also, let $s_{u,i}$ denote the feedback observed from user $u$ for item $i$. Together with these definitions one can construct the user-item matrix $S$, shown in Equation (2.3). Each row of the user-item matrix contains all feedback observed from a user, while each column contains all feedback observed for an item. These matrices tend to be sparse since, in most cases, a user has interacted with only a comparatively small subset of the items in the system.

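$$U = \{u : u = 0, 1, \ldots, m\} \tag{2.1}$$

$$I = \{i : i = 0, 1, \ldots, n\} \tag{2.2}$$

$$S = \begin{pmatrix} s_{0,0} & \cdots & s_{0,n} \\ \vdots & \ddots & \vdots \\ s_{m,0} & \cdots & s_{m,n} \end{pmatrix}, \quad s_{u,i} \in \mathbb{N} \tag{2.3}$$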
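In practice, $S$ is usually stored sparsely since most of its entries are zero. The following Python sketch builds a dictionary-of-keys representation of $S$ from (user, track) stream events, counting streams as $s_{u,i}$ in the way this chapter later defines implicit feedback. The input format and the function name `build_user_item_counts` are hypothetical; only the standard library is used.

```python
from collections import Counter


def build_user_item_counts(streams):
    """Build a sparse S as {(u, i): s_ui} from (user_id, track_id) stream events.

    Also returns index maps that number users 0..m and items 0..n.
    """
    user_index, item_index = {}, {}
    S = Counter()
    for user_id, track_id in streams:
        u = user_index.setdefault(user_id, len(user_index))
        i = item_index.setdefault(track_id, len(item_index))
        S[(u, i)] += 1                       # s_ui = number of streams of track i by user u
    return S, user_index, item_index
```

Unobserved pairs are simply absent from the `Counter`, which returns 0 for them, matching $s_{u,i} = 0$.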

The core task of a recommender system can be defined as in Equation (2.4) [12]: for each user $u$, find the item $i'$ that maximizes the utility given by the recommendation function $r$, where $r$ conforms to Equation (2.5). Therefore, $r$ needs to model each user’s individual preferences in order to find recommendations, and this is accomplished using the data available in the user-item matrix $S$.

$$\forall u \in U, \quad i' = \arg\max_{i \in I} r(u, i) \tag{2.4}$$

$$r : U \times I \to \mathbb{R} \tag{2.5}$$

When an explicit feedback scale is used, the utility given by $r$ is often a prediction of the rating the user would have given the item, and the feedback stored in $S$ is on the same explicit rating scale. With implicit feedback, however, this value becomes a more general measure of utility or usefulness for the given user, depending on how $r$ is defined and what kind of implicit feedback is gathered. As this thesis deals with the music streaming use-case, the following sections assume the use of implicit feedback, as motivated in Section 2.1. In this work, we let $s_{u,i}$ be the number of times user $u$ has listened to (or streamed) track $i$ [5, 7].

Finding an appropriate function for $r$ is the subject of countless papers within recommender systems research [3, 5, 6, 11–15]. A common implicit feedback matrix factorization model in the literature, denoted here as the MF-model (Matrix Factorization), learns a factorization of $S$ by minimizing the error function shown in Equation (2.6) using the alternating least-squares (ALS) method [7]. The error function consists of a squared sum-of-errors term and a regularizing term. The sum-of-errors term separates the raw feedback $s_{u,i}$ into a confidence function $c$, defined in Equation (2.7), and a preference function $p$, defined in Equation (2.8). Modeling $s_{u,i}$ this way yields a base preference for tracks that have been interacted with, combined with a confidence that the user likes a track which grows the more the user listens to it. This modeling has been shown to increase the accuracy of the model compared to using raw observations in the error function.

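$$\min_{x_u, y_i} \sum_{u,i} c(\alpha, s_{u,i}) \left(p(s_{u,i}) - x_u^T y_i\right)^2 + \lambda \left(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\right) \tag{2.6}$$

$$c(\eta, x) = 1 + \eta \log(1 + x) \tag{2.7}$$

$$p(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \end{cases} \tag{2.8}$$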
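Equations (2.6)-(2.8) translate directly into NumPy. The sketch below is a minimal illustration that assumes $S$ is small enough to hold as a dense array; it evaluates the error function rather than optimizing it, and the function names are hypothetical. Note that the sum runs over all user-item pairs, including unobserved ones, which receive preference 0 and confidence 1.

```python
import numpy as np


def confidence(eta, x):
    """c(eta, x) = 1 + eta * log(1 + x), Equation (2.7)."""
    return 1.0 + eta * np.log1p(x)


def preference(x):
    """p(x) = 1 if x > 0 else 0, Equation (2.8)."""
    return (x > 0).astype(float)


def mf_loss(S, X, Y, alpha, lam):
    """Error function of Equation (2.6) for a dense (m x n) count matrix S."""
    residual = preference(S) - X @ Y.T            # p(s_ui) - x_u^T y_i for every (u, i)
    weighted = confidence(alpha, S) * residual ** 2
    reg = lam * (np.sum(X ** 2) + np.sum(Y ** 2))
    return float(weighted.sum() + reg)
```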


By training the MF-model one obtains the factorization of $S$ through the latent factor vectors $x_u$ and $y_i$, which represent user $u$ and item $i$ in latent space. Generating recommendations is then simply a matter of taking the dot product of $x_u$ and $y_i$ to obtain utility scores for previously unseen items, as shown in Equation (2.9). The items with the highest utility scores can then be presented to the user. The rank $k$ of $x_u$ and $y_i$, along with $\lambda$ and $\alpha$, are hyper-parameters that need to be determined with grid search on training data.
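The ALS method alternates between the user and item vectors. With the item vectors fixed, Equation (2.6) becomes a weighted, regularized least-squares problem for each $x_u$, with the closed-form solution $x_u = (Y^T C^u Y + \lambda I)^{-1} Y^T C^u p(u)$, where $C^u$ is the diagonal matrix of user $u$’s confidences [7]; each $y_i$ is solved for symmetrically. The sketch below implements this naively with dense NumPy arrays. It is an illustration only: it omits the speed-ups needed for large, sparse data, and its name, random initialization and default hyper-parameter values are assumptions rather than the settings used in this thesis.

```python
import numpy as np


def als_implicit(S, k=2, alpha=1.0, lam=0.1, iterations=10, seed=0):
    """Alternating least squares for the MF-model error function, Equation (2.6).

    S is a small dense (m x n) array of stream counts; returns user factors
    X (m x k) and item factors Y (n x k).
    """
    rng = np.random.default_rng(seed)
    m, n = S.shape
    C = 1.0 + alpha * np.log1p(S)          # confidences c(alpha, s_ui), Equation (2.7)
    P = (S > 0).astype(float)              # preferences p(s_ui), Equation (2.8)
    X = rng.normal(scale=0.1, size=(m, k))
    Y = rng.normal(scale=0.1, size=(n, k))
    reg = lam * np.eye(k)
    for _ in range(iterations):
        # Fix Y: solve (Y^T C^u Y + lam I) x_u = Y^T C^u p(u) for every user u
        for u in range(m):
            CuY = Y * C[u][:, None]        # rows of Y scaled by user u's confidences
            X[u] = np.linalg.solve(CuY.T @ Y + reg, CuY.T @ P[u])
        # Fix X: solve the symmetric problem for every item i
        for i in range(n):
            CiX = X * C[:, i][:, None]     # rows of X scaled by item i's confidences
            Y[i] = np.linalg.solve(CiX.T @ X + reg, CiX.T @ P[:, i])
    return X, Y
```

Because each update minimizes the error function exactly in one block of variables, the value of Equation (2.6) never increases from one iteration to the next, which is a convenient sanity check.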

$$r(u, i) = x_u^T y_i \tag{2.9}$$
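Given trained factors, Equations (2.4) and (2.9) amount to scoring every item and sorting. The following sketch assumes the factors are NumPy arrays with one row per user (`X`) and one row per item (`Y`), excludes items the user has already streamed, since the text recommends previously unseen items, and returns the indices of the top `n_items`. The function name is hypothetical.

```python
import numpy as np


def recommend(X, Y, S, u, n_items=10):
    """Top-n unseen items for user u, ranked by r(u, i) = x_u^T y_i (Equation (2.9))."""
    scores = Y @ X[u]                          # utility of every item for user u
    scores[S[u] > 0] = -np.inf                 # exclude tracks the user has already streamed
    top = np.argsort(-scores)[:n_items]        # highest utility first
    return [int(i) for i in top if np.isfinite(scores[i])]
```

For instance, with `X = np.array([[1.0, 0.0]])`, `Y = np.array([[0.9, 0.0], [0.1, 0.0], [0.5, 0.0], [0.7, 0.0]])` and `S = np.array([[1, 0, 0, 0]])`, `recommend(X, Y, S, 0, 2)` returns `[3, 2]`: item 0 scores highest but has already been streamed.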

One of the most commonly cited drawbacks of collaborative filtering is the so-called cold-start problem [3]. If either a user or an item does not have any feedback, the collaborative filtering algorithm has a hard time recommending content for the user, or using the item in any recommendations.

2.3 Incorporating context

A context-aware recommender system takes the user’s context into account when providing recommendations. The idea is that this should make for more nuanced recommendations that better fit the user’s current situation. One can imagine that a user’s choice of music differs when the user is at home or at work, in the morning or in the evening, or listening with friends or with family. Most research adheres to the representational view of context [16]: one constructs a context from factors that can be recorded as the observed event happens. For example, in a proposed movie recommender the context consisted of the current time, location and companion [17]. In the tourism domain, a proposed point-of-interest recommender used budget, weather and season as context [18]. Which information to include in a context depends on the domain in which the recommender is intended to be used, and there does not seem to be any obvious general way of choosing it.

Three paradigms for incorporating contextual knowledge into a recommender system have been proposed [17]: contextual pre-filtering, contextual post-filtering and contextual modeling.

Contextual pre- and post-filtering assume that the data used is already contextualized. Pre-filtering essentially limits and selects which items are relevant for recommendation, based on the user’s context; this subset of items is then used to make recommendations. Contextual post-filtering uses the whole dataset to make recommendations, but instead filters out recommended items that do not fit the user’s current context. The benefit of these approaches is that one can extend existing systems that do not themselves take context into account.
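The difference between the two filtering paradigms can be made concrete with a toy Python sketch. It assumes that streams are stored as hypothetical `(user, item, context)` triples and, for post-filtering, that an item "fits" a context if it has previously been consumed in that context. Both the data layout and this fitness rule are illustrative assumptions, not part of the cited work.

```python
def prefilter(streams, context):
    """Contextual pre-filtering: keep only (user, item) pairs consumed in `context`.

    The result can be fed to any context-unaware recommender, e.g. to build S.
    """
    return [(user, item) for user, item, ctx in streams if ctx == context]


def postfilter(ranked_items, item_contexts, context):
    """Contextual post-filtering: drop ranked items never consumed in `context`."""
    return [item for item in ranked_items if context in item_contexts.get(item, set())]
```

In both cases the underlying recommender stays unchanged; only its input (pre-filtering) or its output (post-filtering) is restricted to the current context.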

Contextual modeling instead incorporates contextual information directly into the prediction of ratings. This requires extending the techniques discussed in Section 2.2: instead of only considering users and items at prediction time, one also has to incorporate context. A multidimensional model has been proposed that outperforms classical neighborhood-based collaborative filtering recommenders when the chosen context is relevant to the recommendation task at hand [17]. However, the multidimensional approach is still based on neighborhood approaches. By fixing the contextual parameter, the rating prediction problem is reduced to two dimensions, so that any existing neighborhood-based rating function can be used, i.e. any function that conforms to Equation (2.5). This leads to recommendations that are not influenced by any ratings observed outside the fixed context parameter.

Matrix factorization techniques similar to the model presented in Section 2.2 have been proposed that use e.g. time as a parameter to the error function [9]. A direct extension of Equation (2.6) has also been proposed that incorporates one-class feedback sourced from users (hash)tagging tracks in social media: the WTMF-model (Weighted Tags Matrix Factorization) [5]. The idea is that these tags semantically describe the content. This is interesting because it provides a quite general way of not only relating users and items to each other in latent space, but also letting the tags affect the user and item representations through co-occurrence. A user might be particularly interested in items tagged “running”, and some items might inherently suit that same tag better than others.

Consider a system comprising a set of tags $T$, as defined in Equation (2.10). In a similar fashion to how the user-item matrix $S$ was constructed, one can now define a user-tag matrix $H$ and an item-tag matrix $V$, as shown in Equation (2.11) and Equation (2.12), where each column corresponds to a tag in $T$ and each row corresponds to either a user or an item.

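$$T = \{t : t = 0, 1, \ldots, p\} \tag{2.10}$$

$$H = \begin{pmatrix} h_{0,0} & \cdots & h_{0,p} \\ \vdots & \ddots & \vdots \\ h_{m,0} & \cdots & h_{m,p} \end{pmatrix}, \quad h_{u,t} \in \mathbb{N} \tag{2.11}$$

$$V = \begin{pmatrix} v_{0,0} & \cdots & v_{0,p} \\ \vdots & \ddots & \vdots \\ v_{n,0} & \cdots & v_{n,p} \end{pmatrix}, \quad v_{i,t} \in \mathbb{N} \tag{2.12}$$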
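Analogous to $S$, the tag matrices can be accumulated from tagged consumption events. The sketch below assumes that $h_{u,t}$ and $v_{i,t}$ count how often user $u$, respectively item $i$, co-occurs with tag $t$. This counting rule, the function name and the input format of integer `(u, i, t)` triples are assumptions made for illustration.

```python
from collections import Counter


def build_tag_matrices(tagged_streams):
    """Sparse user-tag matrix H and item-tag matrix V as {(row, tag): count}.

    `tagged_streams` holds (u, i, t) triples of integer user, item and tag indices.
    """
    H, V = Counter(), Counter()
    for u, i, t in tagged_streams:
        H[(u, t)] += 1   # h_ut: streams by user u under tag t (assumed counting rule)
        V[(i, t)] += 1   # v_it: streams of item i under tag t (assumed counting rule)
    return H, V
```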

The observations from the user-tag and item-tag matrices can now be incorporated into an error function. In Equation (2.13) two new error terms have been added, for reconstructing $H$ and $V$ respectively [5]. The regularizing term has also been extended with the new latent factor vector $z_t$, representing tag $t$ in latent space. Recommendations are still generated by taking the dot product of $x_u$ and $y_i$ and presenting the items with the highest utility to the user, but these vectors have now been influenced by $z_t$ during training, which has been shown to alleviate cold-start problems in the recommendations [19]. The rank $k$ of the latent factor vectors, along with $\mu_1$, $\mu_2$, $\alpha$, $\beta$, $\gamma$ and $\lambda$, are hyper-parameters that need to be determined with grid search on training data.

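$$\min_{x_u, y_i, z_t} \sum_{u,i} c(\alpha, s_{u,i}) \left(p(s_{u,i}) - x_u^T y_i\right)^2 + \mu_1 \sum_{u,t} c(\beta, h_{u,t}) \left(p(h_{u,t}) - x_u^T z_t\right)^2 + \mu_2 \sum_{i,t} c(\gamma, v_{i,t}) \left(p(v_{i,t}) - y_i^T z_t\right)^2 + \lambda \left(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 + \sum_t \lVert z_t \rVert^2\right) \tag{2.13}$$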
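To show how Equation (2.13) extends Equation (2.6), the sketch below evaluates the WTMF error function with dense NumPy arrays, reusing one helper for the three weighted squared-error terms. It is a minimal illustration with hypothetical function names, not the training code of [5]. Setting $\mu_1 = \mu_2 = 0$ and all $z_t = 0$ recovers Equation (2.6).

```python
import numpy as np


def _weighted_sq_error(M, A, B, weight):
    """Sum of c(weight, m_ab) * (p(m_ab) - a^T b)^2 over all entries of M."""
    conf = 1.0 + weight * np.log1p(M)          # confidence, Equation (2.7)
    pref = (M > 0).astype(float)               # preference, Equation (2.8)
    return float((conf * (pref - A @ B.T) ** 2).sum())


def wtmf_loss(S, H, V, X, Y, Z, alpha, beta, gamma, mu1, mu2, lam):
    """Error function of Equation (2.13) for dense S (m x n), H (m x p), V (n x p)."""
    loss = _weighted_sq_error(S, X, Y, alpha)            # users vs items
    loss += mu1 * _weighted_sq_error(H, X, Z, beta)      # users vs tags
    loss += mu2 * _weighted_sq_error(V, Y, Z, gamma)     # items vs tags
    loss += lam * (np.sum(X ** 2) + np.sum(Y ** 2) + np.sum(Z ** 2))
    return float(loss)
```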

As stated earlier, this error function not only relates users and items to each other, but also users to tags and items to tags. In this manner, any one-class feedback can be incorporated into a recommender. In the frame of context-aware recommender systems, this one-class feedback can describe e.g. the time of day, the day of week or any other context parameter under which an item has been consumed. One does not need users to generate the tags of Equation (2.10); instead, one can log contextual factors when users consume items and use those as tags in the system.

One drawback of this re-purposing of the WTMF-model is that there is no straightforward way to supply a tag (i.e. a context) at recommendation time, which can be a desired feature of a context-aware recommender (e.g. only recommending tracks for the running context of a particular user). This can be remedied in different ways, depending on how one wants to deploy the recommender within an application, and is therefore brought up again as future work in Chapter 6.


2.4 Evaluating recommender systems

The most realistic evaluation of recommenders is done through online tests, where users of a system perform real tasks that they would normally do [20]. By employing e.g. A/B-tests one can redirect a small percentage of users to different recommendation algorithms and subsequently measure the impact on relevant metrics. However, such tests require a fully deployed recommendation system, which is out of scope for this thesis.

Through offline tests on recorded datasets one can test certain properties of a recommender that, hopefully, translate to user satisfaction. Accuracy is a commonly measured property [6, 20], e.g. through the root-mean-squared error (RMSE). However, this measure is not entirely appropriate for implicit feedback datasets, as one does not have access to the user’s actual preference (only to signals that act as proxies for it). Recall-based metrics, such as the expected percentile rank, have therefore been employed instead [5, 7]. Let $\text{rank}_{u,i}$ denote the percentile rank at which item $i$ has been placed for user $u$ in an ordered list of recommended items, where the order is given by the utility scores learned by the recommender. The value is close to 0% if the item is at the top of the list, and close to 100% if the item is near the end of the list. The expected percentile rank of a recommender is then calculated as shown in Equation (2.14). Note that if items were ordered randomly, the expected percentile rank would approach 50%.

$$\overline{\text{rank}} = \frac{\sum_{u,i} s_{u,i}\, \text{rank}_{u,i}}{\sum_{u,i} s_{u,i}} \tag{2.14}$$

This metric penalizes giving important items, as measured by the stream count $s_{u,i}$, a high rank (i.e. placing the item far down the recommendation list), while unimportant items do not affect the metric as much. This fits the nature of implicit feedback better, as one only has positive examples and cannot reason much about items with little or no feedback.
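Equation (2.14) can be computed from a matrix of held-out stream counts and a matrix of predicted utilities. The NumPy sketch below assumes both are dense $m \times n$ arrays, ranks all items for each user, and maps list positions linearly onto [0, 1] so that the top item gets 0.0 and the last gets 1.0. Tie-breaking and the treatment of items already seen in training are left out here and would need to match the evaluation protocol of [5, 7]; the function name is hypothetical.

```python
import numpy as np


def expected_percentile_rank(S_test, scores):
    """Equation (2.14): stream-weighted mean percentile rank (lower is better).

    S_test holds held-out stream counts s_ui and scores the predicted utilities,
    both (m x n). rank_ui is 0.0 for user u's top-ranked item and 1.0 for the last.
    """
    m, n = scores.shape
    order = np.argsort(-scores, axis=1)                     # items by descending utility
    positions = np.empty_like(order)
    positions[np.arange(m)[:, None], order] = np.arange(n)  # list position of each item
    rank = positions / (n - 1)                              # percentile rank in [0, 1]
    return float((S_test * rank).sum() / S_test.sum())
```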

2.5 Related work

Recommender systems is a highly active research area, with contributions from both academia and industry. The application domains of recommender systems are surprisingly many, which leads to niches in various domains. As mentioned earlier, implicit ratings suit the music domain well, while other domains rely more heavily on explicit ratings. Music recommendation and context-aware approaches also seem to complement each other well, possibly because the flexible consumption patterns of music affect listeners’ content needs [4].

Numerous papers have proposed incorporating a temporal context. One approach divides users’ streaming data into “micro-profiles” for various time segments (time of day, season, etc.) in order to nuance artist recommendations [21]. Another recommender, from the movie domain, models temporal factors as “concept drift” [9] to capture long- and short-term trends in the users’ rating data. This approach has also been extended to encompass factors beyond temporal ones [22]. Basing recommendations on sessions created from users’ listening history is yet another way [23].

Published works also explore users’ activities as context. The usefulness of mobile phones for recording daily activities and incorporating them into recommenders has been investigated in several works, using probabilistic [24] and neighborhood-based [25] recommenders. InCarMusic is a recommender entirely purposed towards music listening while in a car, together with other passengers [18]. This work also explored the concept of context relevance, i.e. trying to understand which contextual factors matter for the situation and recommendation task at hand, which is often overlooked in works on context-aware recommenders.

Contextual knowledge of music can also be gained from social media, through users categorizing and tagging music for their own listening needs. This has e.g. been explored in the setting of predicting the next track to play based on recent listening, using latent Dirichlet allocation (LDA) to build a topic model and subsequently mining frequent sequence transitions between those topics in order to predict the next track [14]. Information about users’ locations has also been extracted from social media and incorporated into neighborhood-based artist recommendations [15].


3 Method

Within the work of this thesis, a recommender model that incorporates social media tags was re-purposed for context-aware recommendations. This model was used to train several recommenders on data with varying sets of contextual factors. The recommenders were evaluated and compared to each other to see how they fared at the different levels of contextual detail used in the training data. A baseline recommender with no notion of context was also trained.

This chapter covers how this work was carried out. Section 3.1 describes the dataset that was used. How the context was defined is described in Section 3.2; this definition is later used to create a contextualized music log in Section 3.3. The recommender models and how they were trained are detailed in Section 3.4. Figure 3.1 illustrates the data processing steps of this method.

3.1 The dataset

The dataset used in this thesis comprises two logs that were collected from consenting Spotify employees’ mobile phones during a two-year period. These logs, referred to as the activity log and the music log and summarized in Table 3.1, are presented in detail in the following sections. The logs were recorded independently of each other, but both were part of the regular logging routines in the Spotify app for Android and iPhone. Due to the sensitive nature of these logs, they cannot be released publicly.

Table 3.1: An overview of the logs used.

| Name | Entries (millions) | Users |
|---|---|---|
| Activity log | 12.0+ | 500+ |
| Music log | 1.3+ | 500+ |

Activity log

Table 3.2: Schema of the activity log.

| Field | Type | Description |
|---|---|---|
| user_id | string | Unique identifier for the user |
| device_time | double | Time of event on the phone in UTC |
| UTC_offset | integer | Time zone offset from UTC in seconds |
| stationary | boolean | True if the user was stationary |
| walking | boolean | True if the user was walking |
| running | boolean | True if the user was running |
| cycling | boolean | True if the user was cycling |
| automotive | boolean | True if the user was in a moving vehicle |

The activity log contains information about the user’s physical activity as detected through the sensors of their mobile phone, e.g. whether the user was walking, running or traveling in a vehicle. The user’s local time offset from UTC was also recorded. This log was the source of the contextual information used in the recommender. See Table 3.2 for details on each field of the log. As shown in Table 3.1, the activity log comprises 12.0+ million entries from 500+ users.

Android phones used the Activity Recognition API [26] to collect this data, while iPhones used CMMotionActivityManager [27]; both work in a very similar manner. For this thesis, the only notable difference between these frameworks is that the iPhone variant could not detect cycling as an activity. Both frameworks let you set a callback that is called at a regular interval with the phone’s interpretation of the owner’s current activity. The phone may be unable to detect or determine any of the possible activities, in which case all of the activity fields are false. Also, only one of the activities can be true at a time. This means that even when you are stationary in a moving car, it is only reported as automotive.

Activity logging was enabled when a user started listening to music, after which the phone would log the activity once per minute. This rate was not guaranteed, though, as the phone might need more time to determine which activity was currently taking place. A reading could also arrive earlier than desired if another app on the device had already requested an update [26].

Music log

The music log contains an entry for each track that the user has played for at least 30 seconds, together with the time at which that happened. These entries are also referred to as the user’s streams. This log was the basis of the implicit feedback used in the recommender. See Table 3.3 for details on each field of the log. The music log comprises 1.3+ million entries from the same 500+ users that make up the activity log, as shown in Table 3.1. The entries in turn refer to 355,000+ unique tracks that were played.

Table 3.3: Schema of the music log.

| Field | Type | Description |
|---|---|---|
| user_id | string | Unique identifier for the user |
| time | double | UTC time when track stopped playing |
| track_id | string | Unique identifier of the track that was played |

3.2 Context definition

In this thesis the context is modeled in accordance with the representational view of context. The information from the activity log was used to define two contextual factors: ToD (Time of Day) and activity.

The time of day parameter was formed from the device_time and UTC_offset fields of the activity log. The users were in different time zones, which was compensated for by the UTC_offset value. This offset always reflected the local offset from UTC as configured in the phone at the time the log message was created. Adding device_time and UTC_offset together therefore recreated the local time of the user. The local time was then categorized into the discrete values shown in Table 3.4, which leads to the definition of the time of day parameter in Equation (3.1).

Table 3.4: Time of day definition.

Start time  End time  ToD value
05:00       09:59     morning
10:00       13:59     day
14:00       16:59     afternoon
17:00       21:59     evening
22:00       04:59     night

$$\mathrm{ToD} \in \{\text{morning}, \text{day}, \text{afternoon}, \text{evening}, \text{night}\} \tag{3.1}$$

The activity parameter, defined by Equation (3.2), was formed from the stationary, walking, running, cycling and automotive fields in the activity log, taking the name of the field that was currently true.

$$\mathrm{activity} \in \{\text{stationary}, \text{walking}, \text{running}, \text{cycling}, \text{automotive}\} \tag{3.2}$$

Equations (3.1) and (3.2) let us define three different constellations of context, shown in Equation (3.3). One can use either ToD or activity as context in isolation, or combine them.

$$C_i = \begin{cases} \{\mathrm{ToD}\}, & i = 0 \\ \{\mathrm{activity}\}, & i = 1 \\ \{\mathrm{ToD}, \mathrm{activity}\}, & i = 2 \end{cases} \tag{3.3}$$
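The mapping from raw log fields to the contextual factors of Table 3.4 and Equations (3.1) to (3.3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: device_time is assumed to be a Unix timestamp in seconds and UTC_offset an offset in seconds, the activity flags are assumed to be available as a mapping from field name to boolean, and the combined tag format "tod_activity" used for $C_2$ is hypothetical.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional

# Start hours from Table 3.4; "night" (22:00-04:59) wraps past midnight.
TOD_STARTS = [(5, "morning"), (10, "day"), (14, "afternoon"), (17, "evening"), (22, "night")]
ACTIVITIES = ("stationary", "walking", "running", "cycling", "automotive")


def local_time(device_time: float, utc_offset: float) -> datetime:
    """Recreate the user's wall-clock time (assumes both values are in seconds)."""
    return datetime.fromtimestamp(device_time + utc_offset, tz=timezone.utc)


def time_of_day(t: datetime) -> str:
    """Bucket a local time into one of the ToD values of Equation (3.1)."""
    label = "night"  # hours 0-4 match no start hour and stay in the night bucket
    for start, name in TOD_STARTS:
        if t.hour >= start:
            label = name
    return label


def activity(flags: Mapping[str, bool]) -> Optional[str]:
    """Return the activity field that is true, or None if none was detected."""
    for name in ACTIVITIES:
        if flags.get(name, False):
            return name
    return None


def context_tag(i: int, tod: str, act: Optional[str]) -> Optional[str]:
    """Build the context value for constellation C_i of Equation (3.3)."""
    if i == 0:
        return tod
    if act is None:  # no detected activity: C_1 and C_2 cannot be formed
        return None
    if i == 1:
        return act
    if i == 2:
        return f"{tod}_{act}"  # hypothetical encoding of the combined context
    raise ValueError(f"unknown constellation C_{i}")
```

Returning None when no activity is detected mirrors the observation that entries with all activity fields false cannot contribute to the activity-based constellations, while ToD is always available.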

3.3 Sessionization and context attribution

In order to relate the context that a user was in to the music they listened to at that moment, the activity log was divided into sessions (durations of time) where the context stayed constant, i.e. it was sessionized. This reveals sessions where the user was, for example, walking and listening to music. The music that was listened to during such a session could subsequently be tagged with the recorded context. Figure 3.2 illustrates this process.

Figure 3.2: Sessionization example.

Sessionizing was done by chronologically grouping consecutive activity log entries that lie within a set time of each other and share the same context. The start and end time of a session were taken from its first and last log entry. If the context changed, or no further entry could be found within the timeout, the current session was ended and a new session was started from the next entry in the log. A session timeout of 15 minutes was used in this thesis.

Temporary changes of the context value were also taken into account when sessionizing. For example, if a user is walking to work (considering only activity as the context) and stops at an intersection, this will be logged as activity = walking up until the intersection. The logs will then show activity = stationary while the user is waiting to cross the street, and then show activity = walking again as the user continues to walk to work. To treat this as one walking session, sessionizing must debounce such occurrences in the logs. In this thesis, debouncing was implemented by looking ahead 15 minutes from the point where the context diverges to see whether the context changes back. If it does, sessionizing continues from there. If it does not, the current session is ended and a new one is started with the new context value.
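The sessionization and debouncing rules above can be sketched as a single pass over a user's log. This Python sketch illustrates the rules under assumptions and is not the thesis code: entries are assumed to be chronologically sorted (timestamp in seconds, context value) pairs for one user and one context constellation, with entries lacking a context (for example, all activity flags false) already filtered out; the Session type and the sessionize name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

TIMEOUT = 15 * 60    # session timeout in seconds (15 minutes in Section 3.3)
LOOKAHEAD = 15 * 60  # debounce lookahead in seconds (15 minutes in Section 3.3)


@dataclass
class Session:
    context: str
    start: float
    end: float


def sessionize(entries: Sequence[Tuple[float, str]]) -> List[Session]:
    """Group chronologically sorted (timestamp, context) entries into sessions."""
    sessions: List[Session] = []
    i, n = 0, len(entries)
    while i < n:
        start, ctx = entries[i]
        end = start
        j = i + 1
        while j < n:
            t, c = entries[j]
            if t - end > TIMEOUT:  # no entry within the timeout: end the session
                break
            if c == ctx:  # same context: extend the session
                end = t
                j += 1
                continue
            # Context diverged: look ahead for a return to ctx (debouncing).
            k = j + 1
            while k < n and entries[k][0] - t <= LOOKAHEAD and entries[k][1] != ctx:
                k += 1
            if k < n and entries[k][0] - t <= LOOKAHEAD:
                end = entries[k][0]  # context came back: absorb the temporary change
                j = k + 1
            else:
                break  # lasting change: a new session starts at entry j
        sessions.append(Session(ctx, start, end))
        i = j
    return sessions
```

In this sketch the debounced entries are simply skipped, so a short stationary blip inside a walking session ends up as part of the walking session. The timeout and lookahead are kept as separate constants because the text describes them as two parameters, even though both were set to 15 minutes.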

The outcome is a contextualized music log for each user's listening history. Table 3.5 shows the schema of the three resulting datasets, one for each context constellation $C_i$. Each entry corresponds to a stream of one track. From this log one can conclude which tracks were played, how many times they were played and in which context, per user. The start and end times of the sessions were used to look up the music log and tag the tracks that were listened to during each session with the corresponding context. For the special case where a session has the same start and end time, the attribution process looked five minutes back and ahead for any music log events. The result of the context attribution is presented in Section 4.1.

Table 3.5: Schema of the contextualized music log.

Field     Type    Description
user_id   string  Unique identifier of the user
track_id  string  Unique identifier of the track
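Attributing streams to sessions amounts to an interval lookup over each user's sorted stream times. The Python sketch below assumes, without this being stated in the thesis, that a stream's timestamp is the time it stopped playing (the time field of Table 3.3) and that sessions are given as (start, end, context) tuples; the attribute_streams name and the tuple layouts are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple

PADDING = 5 * 60  # five minutes, in seconds, for sessions with start == end


def attribute_streams(
    streams: Sequence[Tuple[float, str]],
    sessions: Sequence[Tuple[float, float, str]],
) -> List[Tuple[str, str]]:
    """Tag one user's streams with the context of the session they fall in.

    streams: (time, track_id) pairs sorted by time.
    sessions: (start, end, context) triples.
    Returns (track_id, context) pairs.
    """
    times = [t for t, _ in streams]
    tagged: List[Tuple[str, str]] = []
    for start, end, context in sessions:
        if start == end:  # special case: widen a zero-length session
            start, end = start - PADDING, end + PADDING
        lo = bisect_left(times, start)  # first stream at or after start
        hi = bisect_right(times, end)   # one past the last stream at or before end
        tagged.extend((track_id, context) for _, track_id in streams[lo:hi])
    return tagged
```

A stream that falls inside two padded sessions is attributed to both in this sketch; the text does not say how such overlaps were resolved.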


3.4 Recommender model

The WTMF-model, presented in Section 2.3 with its corresponding error function shown in Equation (2.13), was used to generate recommendations based on the contextualized music logs. The model was trained in the same manner as in the original paper [5]: the number of latent factors was fixed to 10 while the number of training iterations was varied among 5, 10, 20, 50 and 100, and vice versa, since freely varying both was not feasible due to long training times. The hyperparameters $\alpha$, $\beta$, $\gamma$, $\mu_1$, $\mu_2$ and $\lambda$ were then determined by grid search for each factor-iteration combination.

The model was trained with stratified 5-fold cross-validation, with approximately 80% of the data in the training set and 20% in the evaluation set. Expected percentile rank was used as the evaluation metric (see Section 2.4). The percentile rank $\mathrm{rank}_{u,i}$ is derived from the model's recommendations based on the training set, while the stream count $s_{u,i}$ is read from the evaluation set. The model configuration that yielded the lowest expected percentile rank is used in the results, which are shown in Section 4.2. This training and evaluation process was repeated for each contextualized music log to reveal whether the different context constellations $C_i$ yield different results.
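Assuming the metric follows the formulation of Hu et al. [7], the expected percentile rank is the stream-count-weighted mean of each evaluation item's percentile position in the user's ranked list, $\overline{\mathrm{rank}} = \sum_{u,i} s_{u,i}\,\mathrm{rank}_{u,i} / \sum_{u,i} s_{u,i}$, where 0% is the top of the list and 100% the bottom. A minimal Python sketch follows; the function names and the input layout (a full ranked item list per user, and per-user stream counts from the evaluation fold) are hypothetical.

```python
from typing import Dict, Hashable, List, Mapping


def percentile_ranks(ranked_items: List[Hashable]) -> Dict[Hashable, float]:
    """Map each item to its percentile position: 0.0 is the top, 1.0 the bottom."""
    n = len(ranked_items)
    if n == 1:
        return {ranked_items[0]: 0.0}
    return {item: pos / (n - 1) for pos, item in enumerate(ranked_items)}


def expected_percentile_rank(
    recommendations: Mapping[Hashable, List[Hashable]],
    test_counts: Mapping[Hashable, Mapping[Hashable, int]],
) -> float:
    """Stream-count-weighted mean percentile rank over the evaluation set.

    recommendations: user -> full ranked item list from the model trained on
        the training folds (hypothetical layout).
    test_counts: user -> {item: stream count s_ui} from the evaluation fold.
    Lower is better; a random ordering gives about 0.5.
    """
    weighted_sum = 0.0
    total_streams = 0.0
    for user, counts in test_counts.items():
        ranks = percentile_ranks(recommendations[user])
        for item, streams in counts.items():
            weighted_sum += streams * ranks[item]
            total_streams += streams
    return weighted_sum / total_streams
```

Because each percentile is weighted by $s_{u,i}$, tracks that a user streamed often in the evaluation fold dominate the score, which is why a low value indicates that the model places heavily streamed tracks near the top.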

The MF-model, presented in Section 2.2 with its corresponding error function shown in Equation (2.6), was used as the baseline. This model underwent the same parameter selection and evaluation process as described above. However, it was trained directly on the music log, as it has no notion of context.

Open source Python implementations of both the WTMF-model¹ and the MF-model² were used. The cross-validation logic made use of the Scikit-learn library [28].

¹ https://github.com/andreuvall/WeightedTags-MF
² https://github.com/benfred/implicit

4 Results

This chapter presents the results of the context attribution process in Section 4.1 and the evaluation results of the trained recommender models in Section 4.2.

4.1 Context attribution

The sessionization and context attribution process managed to map 33% of the streams in the music log to a context when using context constellation $C_1$ (activity) or $C_2$ (ToD and activity), and 34% when using context constellation $C_0$ (ToD). Figure 4.1 shows how the streams that could be attributed to a context were distributed over the possible context values for the three constellations. Each x-axis shows the context values within a constellation, and the y-axis shows how many streams occurred within each context.

4.2 Recommender model evaluation

Figure 4.2 shows the evaluation outcome of the two recommender models tested. The x-axis has one entry for the MF-model and three entries for the WTMF-model, where $C_i$ indicates which context constellation was used to create the contextualized music log that the model was trained on. The y-axis shows the expected percentile rank each model produced. A lower value is better, as it means that the items the user actually streamed were, on average, placed higher up in the recommendation list.

The WTMF-model outperformed the MF-model in terms of expected percentile rank regardless of which context constellation was used to train it. The MF-model yielded an expected percentile rank of 14.16% on this dataset. The WTMF-model trained on the contextualized music log using context constellation $C_2$, which combines the time of day and activity factors, had the best performance, with an expected percentile rank of 9.58%. The WTMF-model trained with context constellation $C_1$ (activity as context) yielded an expected percentile rank of 11.19%. This was somewhat better than using context constellation $C_0$ (time of day as context), which yielded an expected percentile rank of 12.07%, the worst of the WTMF-model runs.
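For reference, the absolute improvements over the MF baseline in percentage points (pp), and the corresponding relative reductions of the expected percentile rank, follow directly from these figures (relative values rounded to one decimal):

$$
\begin{aligned}
C_2&: 14.16\% - 9.58\% = 4.58 \text{ pp}, & 4.58 / 14.16 &\approx 32.3\% \\
C_1&: 14.16\% - 11.19\% = 2.97 \text{ pp}, & 2.97 / 14.16 &\approx 21.0\% \\
C_0&: 14.16\% - 12.07\% = 2.09 \text{ pp}, & 2.09 / 14.16 &\approx 14.8\%
\end{aligned}
$$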

Given these improvements in expected percentile rank and the definition of the metric, one can conclude that the approach taken in this thesis improved recommendation recall compared to the baseline MF-model.

Figure 4.1: (a) Using ToD as context. (b) Using activity as context.

5 Discussion

This chapter discusses the results in Section 5.1 and critiques the method in Section 5.2. Section 5.3 briefly discusses the societal impact and legal concerns of recommender systems.

5.1 Results

The results from the context attribution (Section 4.1) and model evaluation (Section 4.2) are analyzed separately in the corresponding subsection below.

Context attribution

The difference in the share of captured streams between constellations $C_1$ and $C_2$ on the one hand and $C_0$ on the other is due to the fact that when activity was used as context, some log entries had all possible activities reported as false, and no meaningful session could be created from those entries. Time of day, however, was always available, which made activity the limiting factor.

The seemingly low share of streams that could be attributed to a context can to some extent be explained by the fact that the music log contained streams from any client, whereas the activity log was only populated when a user was streaming from their iOS or Android device. How often a user listens on their phone, compared to any other client available to them (e.g. the desktop client), therefore caps how many sessions could be captured in the activity log with the setup used in this work. A more comprehensive logging mechanism that works across devices would likely be able to attribute more of the users' listening to its context. Nevertheless, streams that are not attributed to a context are still useful to the WTMF-model, as they still serve as implicit feedback that reveals the user's preferences. This contrasts with the approach of micro-profiling users employed in other work [21], which only used in-context ratings.

Figure 4.1 illustrates how streams were captured and attributed across the possible context values for each $C_i$. Figure 4.1a reveals that listening is spread throughout the day, with a fairly strong emphasis on the evening (between 17:00 and 21:59). The streams in Figure 4.1b are much more skewed, with the strongest emphasis on the stationary context. Neither result is particularly surprising, considering that everyone in this population had an office-type job. Naturally, most of a user's day is spent stationary at the office, with some commuting back and forth. Evenings are likely when one has the most personal time to listen to music undisturbed, although listening evidently also took place during the other parts of the day. Streams during the night might be explained by late-night partying or by users who habitually listen to music while trying to fall asleep.

Note also how sparse the data became in some cases when both activity and time of day were used as contextual factors, as seen in Figure 4.1c. All contexts contained at least 100 streams, but, for example, the stream counts in the cycling and running contexts are meager compared to most other contexts. Contexts with few streams are likely to be skewed towards the music liked by the few users who contributed to them. Evidently, even in this fairly large dataset, data points for the rarer situations in a user's day remain elusive. As an interesting side note, running seems to be predominantly a morning activity in this population: morning runs were the most prevalent type of run, while mornings were not as prominently featured for any other activity. Evenings were the most common time of day for the other activities (which is also when most streaming occurred overall).

Recommender model evaluation

The WTMF-model was developed with one-class feedback in the form of social media tags in mind [5], which would most likely yield sparse user-tag and item-tag matrices. In this application, where the tag values are a small set of predefined contexts, the cardinality of the tags is much lower than one would expect from social media. This also makes the user-tag and item-tag matrices considerably less sparse. For example, for the matrices constructed from the contextualized music log with context constellation $C_2$, the sparsity of the user-tag matrix is 27.5% (i.e. not sparse at all) and that of the item-tag matrix is 86.7%, compared with 99.6% for the user-item matrix. It has already been shown that the WTMF-model alleviates cold-start problems through tags that provide additional information when the usage of an item is low, which contributes to its success [19]. Denser user-tag and item-tag matrices should to some extent ease the cold-start problem for the recommender. Other recommenders that incorporate context have also been shown to help alleviate the cold-start problem [14, 24]. And while the WTMF-model has a fairly simple error function, it still relates users, items and context within the same latent space simultaneously, as opposed to other proposed context-aware matrix factorization approaches [22].
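Sparsity here is the share of empty cells in a matrix, i.e. one minus the number of non-zero entries divided by the number of rows times columns. A small Python sketch follows, assuming the matrix is given by its observed (row, column) pairs and that its dimensions are the distinct rows and columns occurring in those pairs; the sparsity name and the example tag strings are hypothetical.

```python
from typing import Iterable, Tuple


def sparsity(pairs: Iterable[Tuple[str, str]]) -> float:
    """Share of empty cells: 1 - non-zero cells / (rows * columns).

    Duplicate pairs (e.g. repeated streams) count as a single non-zero cell.
    """
    cells = set(pairs)
    if not cells:
        return 1.0  # an empty matrix has no filled cells by convention
    n_rows = len({row for row, _ in cells})
    n_cols = len({col for _, col in cells})
    return 1.0 - len(cells) / (n_rows * n_cols)
```

If users or tags that never co-occur should still count as rows or columns, the full dimensions should be passed in instead of being derived from the pairs, since that enlarges the denominator and raises the sparsity.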

It is also worth noting that the recommender seems to perform better with more specific context definitions than with more generic ones. Contextualizing with time of day alone yielded the smallest improvement, but also the densest user-tag and item-tag matrices. It seems likely that many different kinds of music can be played during an afternoon or evening; these contexts might simply be too broad conceptually to capture a specific subset of music that fits the occasion. The same can be said for the stationary activity. Combining the contextual factors, however, can reveal more nuance: an evening walk can be on the way to a party, while a morning walk is more likely to be a calm endeavor, two similar situations that call for different kinds of music. Whether this is an effect of the particular context constellations used in this thesis or whether the pattern holds more generally is a question that requires further research.

An interesting observation is also that although the share of streams that could be attributed to a context was seemingly low, it still led to an improvement of 4.58 percentage points in expected percentile rank in the best case. Further research will have to reveal whether this can be improved upon simply by increasing the coverage of context-attributed streams. However, clients usually differ in the hardware and APIs available, so full coverage might be hard to reach in practice.


5.2 Method

There are a few deficiencies worth noting in the method applied in this thesis. Consider first the dataset: it is made up of a fairly homogeneous population of tech-savvy users who all work in an office environment. These users know each other to a greater extent than a general population would, might listen to similar music and have similar world views. This is likely to introduce biases in the dataset, and the proposed approach might therefore not yield the same results on a more heterogeneous population of end-users.

The context definition and sessionization contain a handful of fixed definitions which decreased the complexity of this work, but which might instead manifest themselves as noise in the evaluation result. The 15-minute timeout and lookahead parameters of the sessionization process (see Section 3.3), along with the five-minute padding used in the special case, were chosen through a combination of intuition and trial and error. Ideally, these parameters should have been included in the grid search when deciding the rest of the experiment parameters, but due to time limitations this was not feasible.

The definition of time of day, presented in Table 3.4, is a subjective classification of the parts of a day by the author. This classification might not fit all lifestyles. For example, people working night shifts can have a daily routine that does not fit well into these time spans. The relevant time spans of a user’s day could likely be learned from their listening patterns, but this introduces complexities when trying to compare across several users (i.e. how would one formulate the tag used in the WTMF-model).

Finally, a crucial simplification to note in this work is the offline evaluation. An improved expected percentile rank should encourage further exploration in the direction of the work done in this thesis. However, it is easy to jump to the conclusion that improved recall yields better recommendations, which is not necessarily true [20]. As stated in Section 2.4, recall is but one property of a recommender system. This in turn becomes a question of the validity of the study: the end goal is to serve better recommendations, but the chosen evaluation method does not fully prove this. A more elaborate evaluation method, such as A/B testing, can be employed to gain this insight [20]. As this work essentially revolved around data processing in code, replicability and reliability should not be an issue if one were to redo the experiment with another evaluation method. The one caveat, however, is gaining access to a dataset of similar size and content, as such datasets can be rare in academia.

5.3 The work in a wider context

The ubiquitous use of personal data and personalization techniques on the web, including recommender systems, has sparked a public discussion on the possible harms that such systems can cause in a society [29]. The possibility of fracturing users into smaller, isolated virtual communities was already discussed at the inception of collaborative filtering [30], and has in recent years been associated with the term “filter bubbles” [31].

One study that tried to model this phenomenon concluded that it is possible that users who do not seek out opposing points of view reinforce their own biases, making them less likely to trust important decisions to those who do not share the same points of view [32]. However, more recent research using data from the music industry concluded that recommender systems seem to have a homogenizing effect rather than a polarizing one, as the recommender shifted consumption towards music that users had in common, while users also consumed more music overall [33]. This claim is also supported by a study of a collaborative filtering recommender in the movie domain [34]. It is not clear whether these findings translate to other domains, such as news and social media, and the author notes that there does not yet seem to be a scientific consensus on this issue. Evidently, industry and academia need to better understand the side effects and societal impacts of deploying recommender systems.


Recommender systems are also inherently bound to the collection of data about individual users. Personal data is subject to legal protection in, e.g., the European Union, and must therefore be collected and handled accordingly. Special care needs to be taken when handling contextual data, as such data can sometimes make it easier to compromise the anonymity of users (e.g. when using GPS coordinates).


6 Conclusion

This thesis posed the research question: How is recommendation quality affected by incorporating contextual factors, based on time and activity, in track recommendation using collaborative filtering with implicit feedback?

This thesis shows that the recall of a recommender can be improved by incorporating time and activity as contextual factors, which gives a partial answer to the research question. In fact, with the approach taken in this thesis, the context-aware recommender always outperformed the baseline model, regardless of which composition of time and activity was used to model the context. Figure 4.2 shows that using both factors in conjunction yielded the largest improvement in recall.

There are more nuances to a recommender's quality than recall alone, however. The positive offline evaluation of recall in this thesis should encourage further research with the WTMF-model. This research needs to explore whether the recommendations produced by this model are useful, novel and, perhaps most important of all, perceived by each user as fitting for the context.

The methodological issues presented in Section 5.2 are clear candidates for future work. Namely, finding better ways of deciding or learning the parameters used in the sessionization and context attribution. Testing the model on a dataset comprised of a more heterogeneous population would also be interesting. Also, as mentioned in Section 5.1, a more elaborate logging setup that enables better attribution of context for the streams should have beneficial effects.

There is more to learn about the qualities of the WTMF-model's recommendations. Follow-up research can investigate the novelty, diversity and/or serendipity of the produced recommendations [20]. Do actual users want to consume the recommendations? Do the recommended tracks fit the context they are tagged with, as perceived by the user? The results of this thesis seem to indicate that a more specific context yields better recommendations. To what extent is this true? Would it also be beneficial to add other temporal contexts, such as seasons?

One must also consider how to surface the recommendations to the user, i.e. research the possible interaction models. There is no straightforward way of supplying context to the WTMF-model at recommendation time. Therefore, if the model is not altered, one would have to implement post-filtering of the recommendations to show only recommendations related to, e.g., the running context. What implications does this have for the result quality? Alternatively, one can simply surface a top-N list of recommendations along with the contexts/tags each item is associated with. When exploring new music, the user can then listen, decide whether a track is worth saving and act accordingly (e.g. add it to the playlist they listen to while running). There are surely more interaction modes to explore as well.
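The post-filtering option mentioned above could look like the following Python sketch. It assumes, hypothetically, that a ranked recommendation list has already been produced and that each track's associated context tags are available as a set (for example, derived from the item-tag matrix); neither the function nor the data layout comes from the thesis.

```python
from typing import Dict, List, Sequence, Set


def post_filter(
    ranked: Sequence[str],
    item_tags: Dict[str, Set[str]],
    context: str,
    n: int,
) -> List[str]:
    """Return the top-n ranked tracks associated with the given context tag.

    The model's original ordering is preserved; untagged tracks are dropped.
    """
    return [track for track in ranked if context in item_tags.get(track, set())][:n]
```

Because filtering happens after ranking, a rare context such as running can leave few or no items near the top of the list, which is one concrete way the question of result quality above could manifest.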


Bibliography

[1] F. Ricci, L. Rokach, and B. Shapira, “Recommender systems: introduction and challenges”, in Recommender Systems Handbook, F. Ricci, B. Shapira, and L. Rokach, Eds., 2nd ed., Boston, MA: Springer US, 2015, ch. 1, pp. 1–34. DOI: 10.1007/978-1-4899-7637-6_1.

[2] M. de Gemmis, P. Lops, C. Musto, F. Narducci, and G. Semeraro, “Semantics-aware content-based recommender systems”, in Recommender Systems Handbook, F. Ricci, B. Shapira, and L. Rokach, Eds., 2nd ed., Boston, MA: Springer US, 2015, ch. 4, pp. 119–159. DOI: 10.1007/978-1-4899-7637-6_4.

[3] M. D. Ekstrand, “Collaborative filtering recommender systems”, Foundations and Trends® in Human–Computer Interaction, vol. 4, no. 2, pp. 81–173, Feb. 2010. DOI: 10.1561/1100000009.

[4] M. Schedl, P. Knees, B. McFee, D. Bogdanov, and M. Kaminskas, “Music recommender systems”, in Recommender Systems Handbook, F. Ricci, B. Shapira, and L. Rokach, Eds., 2nd ed., Boston, MA: Springer US, 2015, ch. 13, pp. 453–492. DOI: 10.1007/978-1-4899-7637-6_13.

[5] A. Vall, M. Skowron, P. Knees, and M. Schedl, “Improving music recommendations with a weighted factorization of the tagging activity”, in ISMIR 2015: Proceedings of the 16th International Society for Music Information Retrieval Conference, 2015, pp. 65–71. [Online]. Available: http://ismir2015.uma.es/articles/129_Paper.pdf.

[6] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems”, Computer, vol. 42, no. 8, pp. 30–37, Aug. 2009. DOI: 10.1109/MC.2009.263.

[7] Y. Hu, Y. Koren, and C. Volinsky, “Collaborative filtering for implicit feedback datasets”, in 2008 Eighth IEEE International Conference on Data Mining, IEEE, Dec. 2008, pp. 263–272. DOI: 10.1109/ICDM.2008.22.

[8] Y. Koren and R. Bell, “Advances in collaborative filtering”, in Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, Eds., 2nd ed., Boston, MA: Springer US, 2015, ch. 3, pp. 77–118. DOI: 10.1007/978-1-4899-7637-6_3.

[9] Y. Koren, “Collaborative filtering with temporal dynamics”, in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, New York, USA: ACM Press, 2009, pp. 447–455. DOI: 10.1145/1557019.1557072.


[10] X. Ning, C. Desrosiers, and G. Karypis, “A comprehensive survey of neighborhood-based recommendation methods”, in Recommender Systems Handbook, F. Ricci, B. Shapira, and L. Rokach, Eds., 2nd ed., Boston, MA: Springer US, 2015, ch. 2, pp. 37–76. DOI: 10.1007/978-1-4899-7637-6_2.

[11] S. Funk, Netflix update: try this at home, 2006. [Online]. Available: https://sifter.org/~simon/journal/20061211.html (visited on 05/03/2020).

[12] G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions”, IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, Jun. 2005. DOI: 10.1109/TKDE.2005.99.

[13] L. Baltrunas, M. Kaminskas, B. Ludwig, O. Moling, F. Ricci, A. Aydin, K. H. Lüke, and R. Schwaiger, “InCarMusic: context-aware music recommendations in a car”, Lecture Notes in Business Information Processing, vol. 85 LNBIP, pp. 89–100, 2011. DOI: 10.1007/978-3-642-23014-1_8.

[14] N. Hariri, B. Mobasher, and R. Burke, “Context-aware music recommendation based on latent topic sequential patterns”, in RecSys’12 - Proceedings of the 6th ACM Conference on Recommender Systems, New York, New York, USA: ACM Press, 2012, pp. 131–138. DOI: 10.1145/2365952.2365979.

[15] M. Schedl, A. Vall, and K. Farrahi, “User geospatial context for music recommendation in microblogs”, in SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, New York, USA: Association for Computing Machinery, 2014, pp. 987–990. DOI: 10.1145/2600428.2609491.

[16] P. Dourish, “What we talk about when we talk about context”, Personal and Ubiquitous Computing, vol. 8, no. 1, pp. 19–30, Feb. 2004. DOI: 10.1007/s00779-003-0253-8.

[17] G. Adomavicius, R. Sankaranarayanan, S. Sen, and A. Tuzhilin, “Incorporating contextual information in recommender systems using a multidimensional approach”, ACM Transactions on Information Systems, vol. 23, no. 1, pp. 103–145, Jan. 2005. DOI: 10.1145/1055709.1055714.

[18] L. Baltrunas, B. Ludwig, and F. Ricci, “Context relevance assessment for recommender systems”, in Proceedings of the 16th International Conference on Intelligent User Interfaces, ser. IUI ’11, New York, NY, USA: ACM, 2011, pp. 287–290. DOI: 10.1145/1943403.1943447.

[19] A. Vall, “Listener-inspired automated music playlist generation hybrid collaborative filtering for implicit”, RecSys ’15: Proceedings of the 9th ACM Conference on Recommender Systems, pp. 387–390, 2015. DOI: 10.1145/2792838.2796548.

[20] A. Gunawardana and G. Shani, “Evaluating recommender systems”, in Recommender Systems Handbook, F. Ricci, B. Shapira, and L. Rokach, Eds., 2nd ed., Boston, MA: Springer US, 2015, ch. 8, pp. 265–308. DOI: 10.1007/978-1-4899-7637-6_8.

[21] L. Baltrunas and X. Amatriain, “Towards time-dependant recommendation based on implicit feedback”, in Workshop on context-aware recommender systems (CARS 2009), New York, NY, USA, 2009.

[22] L. Baltrunas, B. Ludwig, and F. Ricci, “Matrix factorization techniques for context aware recommendation”, in RecSys’11 - Proceedings of the 5th ACM Conference on Recommender Systems, New York, New York, USA: ACM Press, 2011, pp. 301–304. DOI: 10.1145/

[23] R. Dias and M. J. Fonseca, “Improving music recommendation in session-based collaborative filtering by using temporal context”, in Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI, 2013, pp. 783–788. DOI: 10.1109/ICTAI.2013.120.

[24] X. Wang, D. Rosenblum, and Y. Wang, “Context-aware mobile music recommendation for daily activities”, in MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia, New York, New York, USA: ACM Press, 2012, pp. 99–108. DOI: 10.1145/2393347.2393368.

[25] T. De Pessemier, S. Dooms, and L. Martens, “Context-aware recommendations through context and activity recognition in a mobile environment”, Multimedia Tools and Applications, vol. 72, no. 3, pp. 2925–2948, 2014. DOI: 10.1007/s11042-013-1582-x.

[26] Google, ActivityRecognitionApi. [Online]. Available: https://developers.google.com/android/reference/com/google/android/gms/location/ActivityRecognitionApi (visited on 05/03/2020).

[27] Apple, CMMotionActivityManager. [Online]. Available: https://developer.apple.com/reference/coremotion/cmmotionactivitymanager#/apple_ref/occ/cl/CMMotionActivityManager (visited on 05/03/2020).

[28] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: machine learning in Python”, Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[29] C. Cadwalladr and E. Graham-Harrison, Revealed: 50 million Facebook profiles harvested for Cambridge Analytica in major data breach, Mar. 2018. [Online]. Available: https://www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election.

[30] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, “GroupLens: an open architecture for collaborative filtering of netnews”, in Proceedings of the 1994 ACM conference on Computer supported cooperative work - CSCW ’94, New York, New York, USA: ACM Press, 1994, pp. 175–186. DOI: 10.1145/192844.192905.

[31] E. Pariser, Beware online "filter bubbles", 2011. [Online]. Available: https://www.ted.com/talks/eli_pariser_beware_online_filter_bubbles (visited on 05/03/2020).

[32] M. Van Alstyne and E. Brynjolfsson, “Global village or cyber-balkans? Modeling and measuring the integration of electronic communities”, Management Science, vol. 51, no. 6, pp. 851–868, Jun. 2005. DOI: 10.1287/mnsc.1050.0363.

[33] K. Hosanagar, D. Fleder, D. Lee, and A. Buja, “Will the global village fracture into tribes? Recommender systems and their effects on consumer fragmentation”, Management Science, vol. 60, no. 4, pp. 805–823, Nov. 2014. DOI: 10.1287/mnsc.2013.1808.

[34] T. T. Nguyen, P. M. Hui, F. M. Harper, L. Terveen, and J. A. Konstan, “Exploring the filter bubble: the effect of using recommender systems on content diversity”, in WWW 2014 - Proceedings of the 23rd International Conference on World Wide Web, New York, New York, USA: Association for Computing Machinery, Inc, Apr. 2014, pp. 677–686. DOI: 10.1145/2566486.2568012.
