
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2018

An Evaluation of the Indian Buffet Process as Part of a Recommendation System

HELENA ALINDER
JOSEFIN NILSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY


Bachelor Degree in Computer Science
Date: June 3, 2018
Supervisor: Jens Lagergren
Examiner: Örjan Ekeberg

Swedish title: En utvärdering av Indian Buffet Process som en del av ett rekommendationssystem


Abstract

This report investigates if it is possible to use the Indian Buffet Process (IBP), a stochastic process that defines a probability distribution, as part of a recommendation system. The report focuses on recommendation systems where one type of object, for instance movies, is recommended to another type of object, for instance users.

A concept of performing link prediction with IBP is presented, along with a method for performing inference. Three papers that are related to the subject are presented and their results are analyzed together with additional experiments on an implementation of the IBP.

The report arrives at the conclusion that it is possible to use the IBP in a recommendation system when recommending one type of object to another. In order to use IBP priors in a recommendation system that includes real-life datasets, the paper suggests the use of a coupled version of the IBP model and, if possible, performing inference with parallel Gibbs sampling.


Sammanfattning

Denna rapport undersöker om det är möjligt att använda Indian Buffet Process (IBP), en stokastisk process som definierar en sannolikhetsfördelning, som en del av ett rekommendationssystem. Rapporten fokuserar på rekommendationssystem där en sorts objekt, exempelvis filmer, rekommenderas till en annan sorts objekt, exempelvis användare.

Ett sätt att förutse länkar, link prediction, mellan olika objekt med hjälp av IBP presenteras tillsammans med en metod för att dra statistiska slutsatser, inference. Tre rapporter som är relaterade till ämnet presenteras och deras resultat analyseras tillsammans med ytterligare experiment på en implementation av IBP.

Rapporten drar slutsatsen att det är möjligt att använda IBP i ett rekommendationssystem då systemet rekommenderar ett objekt till ett annat objekt. Rapporten föreslår en kopplad version av IBP för att kunna använda IBP i ett rekommendationssystem som arbetar på riktig data, samt att inference ska utföras med en parallell Gibbs sampling.


Contents

1 Introduction
  1.1 Purpose
  1.2 Problem Statement
  1.3 Outline

2 Background
  2.1 Recommendation System
    2.1.1 Memory Based Technique
    2.1.2 Model Based Technique
  2.2 Link Prediction
    2.2.1 Class-based Approach
    2.2.2 Feature-based Approach
  2.3 Model
    2.3.1 Bayesian Nonparametric Model
    2.3.2 Nonparametric Latent Feature Relational Model
  2.4 The Indian Buffet Process
  2.5 Inference with Gibbs Sampling
    2.5.1 Bayesian Inference
    2.5.2 Gibbs Sampling
    2.5.3 Gibbs Sampling for IBP
  2.6 Related Work
    2.6.1 A Coupled Indian Buffet Process Model for Collaborative Filtering
    2.6.2 Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process
    2.6.3 Parallel Markov Chain Monte Carlo for the Indian Buffet Process

3 Methods
  3.1 Literature Study
  3.2 Implementation of IBP
  3.3 Experiments

4 Results
  4.1 Varying Parameters
    4.1.1 Varying N
    4.1.2 Varying α
  4.2 Related Experiments

5 Discussion
  5.1 Future work

6 Conclusion


Chapter 1

Introduction

The use of recommendation systems with the purpose of guiding behaviour has been, and still is, rapidly increasing in marketing and the delivery of products (Konstan & Riedl 2012). As a result of the expansion of the Internet and web technologies, customers have a wide range of products to choose between and are constantly exposed to a large amount of information. It is therefore crucial for companies to present recommendations that fit the user's needs and interests, to prevent them from browsing too long and losing interest. There are different approaches to recommending products to users; for example, it is possible to base recommendations on a user's interests or on what products people have given a high rating (Cao & Li 2007).

One method for recommending products to users is called collaborative filtering, and it is based on the idea that people often take other people's opinions into account when making decisions (Ricci et al. 2011). Furthermore, link prediction is a concept within collaborative filtering and is used to predict possible future links or unobserved links in a network (Ghahramani 2017). Link prediction that is based on an object's features can be divided into two approaches, class-based and feature-based (Miller et al. 2009). This paper presents a feature-based model that uses a prior named the Indian Buffet Process.

The Indian Buffet Process (IBP) is a stochastic process which defines a probability distribution where a finite number of objects can be described with an infinite number of features. This is represented as a binary matrix where the objects are rows and the columns are features. The general concept of the IBP can be explained by an example of customers at an Indian buffet with what would seem like an infinite number of dishes. Since the customers do not know how many and what kind of dishes there are, they base their choices on how popular each dish is (Griffiths & Ghahramani 2011). This paper evaluates whether the IBP is a suitable prior for a model in a recommendation system.

1.1 Purpose

With the growth of recommendation systems and the impact they have on society, it is interesting to analyze how they may be constructed. The IBP is particularly interesting because of its capability to handle an infinite number of latent features. This report will therefore present the IBP and how to perform inference on it with Gibbs sampling, and investigate if the IBP and Gibbs sampling can manage the size of a real-life dataset, as well as analyze their efficiency.

1.2 Problem Statement

This paper evaluates IBP and investigates if it is possible to use it as part of a model when recommending one type of object to another, for example movies to users. The research question is therefore:

Is it possible to use IBP as part of a model in a recommendation system when handling two different types of objects?

1.3 Outline

The report is divided into six chapters. Chapter 1, Introduction, introduces the topic of the report as well as its purpose and problem statement. Chapter 2, Background, presents theoretical background important for discussing the problem statement, as well as previous work related to the topic. Chapter 3, Methods, describes the method that was used to conduct the study. Chapter 4 presents the results from the experiments described in chapter 3, along with results from related experiments. Chapter 5, Discussion, discusses the results, and finally an answer to the problem statement is presented in chapter 6, Conclusion.


Chapter 2

Background

2.1 Recommendation System

A recommendation system (RS) is a technique for recommending the products best suited for a user, where products can include everything from movies to articles. The Internet today offers a wide range of information and it can be difficult for a user to navigate and find the correct information; thus it is crucial for a website that contains a large amount of data to have a well implemented RS (Ricci et al. 2011).

Collaborative filtering (CF) is a method used in many RS and is based upon the idea that a person often takes other people's opinions into account when making daily decisions (Ricci et al. 2011). For example, if a person watches three different movies and gives them a high rating, it is likely that another person who watches two of those movies, and rates them highly as well, would also enjoy the third movie. CF can be divided into two techniques, memory based and model based.

2.1.1 Memory Based Technique

The memory based technique handles a matrix over objects (rows) and features (columns) and bases recommendations on an object's earlier choices. This technique has many advantages since it is easy to understand and easy to implement. However, if the matrix contains a large amount of data and the objects have few connections to the features, making the matrix sparse, operations on the matrix can become costly and the results unreliable (Pham et al. 2011).
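The memory based technique can be illustrated with a small sketch (a toy example, not the thesis's implementation; the ratings matrix and function names are invented): ratings are stored in a user-item matrix and a missing rating is predicted from the ratings of similar users.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: movies); 0 = unrated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

def predict(R, user, item):
    """Predict a rating as a similarity-weighted average of other users' ratings."""
    num = den = 0.0
    for other in range(R.shape[0]):
        if other != user and R[other, item] > 0:
            s = cosine_sim(R[user], R[other])
            num += s * R[other, item]
            den += abs(s)
    return num / den if den else 0.0

print(predict(R, user=0, item=2))  # user 0 resembles user 1, who rated item 2 low
```

Note how the sparsity problem described above shows up directly: the prediction for an unrated cell depends only on the few users who happen to have rated that item.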


2.1.2 Model Based Technique

To solve the problems that arise when handling CF with large matrices, research has been done on model based clustering techniques. Clustering a matrix allows an implementation of CF to handle only the parts of the matrix that actually have connections and ignore the sparse parts. It has been shown that combining clustering with CF yields the same quality of recommendations as regular CF, while also greatly improving the efficiency and performance of the RS on larger datasets (Pham et al. 2011).

2.2 Link Prediction

One way of doing collaborative filtering recommendations is by doing link prediction. The link prediction problem can abstractly be stated as: 'Given a snapshot of a network, can one predict the next most likely links to form in the network?' (Ghahramani 2017). In other terms, given a set of links between pairs of objects in a network, predict the unobserved links (Miller et al. 2009).

Link prediction can either be used to predict possible future links in the network or to predict missing links in the case of incomplete data (Ghahramani 2017). Link prediction based on features of individual objects can be classified into two approaches: class-based and feature-based (Miller et al. 2009).

2.2.1 Class-based Approach

In the class-based approach, objects belong to different classes that describe them, and the probability of there being a link between two objects is determined solely by the classes of the objects (Miller et al. 2009). The classes are latent, meaning that they are not known beforehand. These types of models, where each object is assumed to belong to one of a fixed number of latent classes and the objects are clustered based on their class, are called basic stochastic blockmodels (Williamson 2016b).

One significant problem with this approach is deciding how many latent classes the objects can be grouped into and how to group them (Miller et al. 2009). Revisiting the viewers and movies example, two classes could be 'American sci-fi' and 'American movie with hero figure'. These classes are quite similar, but with the class-based model, the options are either to merge the classes into one, losing valuable information, or to duplicate common aspects of them (Miller et al. 2009).

Adding new classes to cover combinations, like having one class 'American sci-fi', a second 'American movie with hero figure' and a third 'American sci-fi movie with hero figure' combining the two, will quickly increase the number of classes, leading to an overflow of data.

2.2.2 Feature-based Approach

Instead of grouping objects into classes, they can be described with features. For example, a movie object that belongs to the class 'American 90s sci-fi movie with female hero figure' can be given the features American, 90s, sci-fi, female hero figure. A convenient way of representing the objects is as a vector, where the values of the vector describe the latent features the object possesses.

Instead of doubling the total number of classes when describing objects with a new feature, for example 'award-winning', a new value is added to the vectors instead. This reduces the size of the data considerably. Determining whether there is a link between two objects can now depend on the distance, inner product or weighted combination of the objects' vectors (Miller et al. 2009).
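As a minimal sketch of this representation (the feature names and weights are invented for illustration), each object becomes a binary vector, a new feature adds one entry instead of doubling the class count, and a link score can be computed as a weighted inner product:

```python
import numpy as np

# Binary feature vectors over hypothetical features:
# [American, 90s, sci-fi, female hero figure]
movie = np.array([1, 1, 1, 1])
user = np.array([1, 0, 1, 1])   # features the user tends to like

# Adding a new feature ('award-winning') appends one entry to every vector
# instead of doubling the number of classes.
movie = np.append(movie, 1)
user = np.append(user, 0)

# A simple link score: weighted inner product of the two vectors.
weights = np.array([0.2, 0.1, 0.9, 0.7, 0.3])  # illustrative weights
score = float(movie @ (weights * user))
print(score)  # ≈ 1.8 (0.2 + 0.9 + 0.7)
```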

Like determining classes, determining the number of features will be very important. Section 2.3 will present a model consisting of a binary matrix where the rows are objects and the columns are features.

2.3 Model

2.3.1 Bayesian Nonparametric Model

In section 2.3.2 a Bayesian nonparametric model with latent features is presented. A Bayesian nonparametric model is based on the assumption that data cannot be described with a finite set of parameters; thus, as the data grows it is possible for the interpreted information or features to grow as well, making it a flexible model (Ghahramani 2009a). The model can be combined with the feature-based approach presented in section 2.2 to create a feature-based model (Miller et al. 2009).


2.3.2 Nonparametric Latent Feature Relational Model

In this model, each object is described by binary features. These features are not known beforehand and the goal is therefore to conduct inference in order to find them (Miller et al. 2009).

In order to make link predictions based on the features, the model needs three matrices which can be described as follows. Let N be the number of objects and K be the number of features, then:

• Y is the N × N binary matrix where y_ij = 1 if there is a link between object i and j and y_ij = 0 if there is no link. Unobserved links are left unfilled.

• Z is the N × K feature matrix, where z_nk = 1 if object n has feature k.

• W is the K × K weight matrix, where the weight w_kk′ affects the probability of there being a link between object i and object j if i has feature k and j has feature k′.

The model will use the IBP, explained in section 2.4, to generate a prior distribution on Z, and use Bayes' theorem to obtain the posterior distribution over Y.

Z ∼ IBP(α)
w_kk′ ∼ N(0, σ_w²)
y_ij ∼ σ(Z_i W Z_j^⊤)    (2.1)

The last distribution is important for the understanding of the model. It reveals that only the objects i and j themselves and their features affect the probability of there being a link between them. Here σ is a function that transforms the inner product to a number between 0 and 1. For simplicity it is possible to rewrite the expression:

σ(Z_i W Z_j^⊤) = σ( Σ_{k,k′} z_ik z_jk′ w_kk′ )    (2.2)
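A minimal sketch of equations 2.1 and 2.2, assuming σ is the logistic sigmoid (the text only requires a function mapping to a number between 0 and 1) and using a random binary Z as a stand-in for a draw from the IBP:

```python
import numpy as np

def sigmoid(x):
    """Logistic function: maps the inner product to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

N, K = 4, 3                       # 4 objects, 3 latent features
Z = rng.integers(0, 2, (N, K))    # stand-in for a draw Z ~ IBP(alpha)
W = rng.normal(0.0, 1.0, (K, K))  # w_kk' ~ N(0, sigma_w^2)

# P(y_ij = 1) = sigmoid(Z_i W Z_j^T), computed for all pairs at once.
P = sigmoid(Z @ W @ Z.T)
print(np.round(P, 2))             # N x N matrix of link probabilities
```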


2.4 The Indian Buffet Process

The idea behind the Indian Buffet Process (IBP), and the origin of its name, is that 'many Indian restaurants offer lunchtime buffets with an apparently infinite number of dishes' (Griffiths & Ghahramani 2011). Since the customers do not know how many dishes exist and what kind of dishes there are, they base their choices on how popular a specific dish is. The concept of the IBP can be described as follows:

• Let α be the average number of dishes a customer serves herself.

• Let m_k be the number of customers, not including the current customer, that have served themselves dish k.

The first customer cannot base her choices upon other people's choices, so she will take a serving of Poisson(α) dishes. The following customers will base their choices on the popularity of the dishes, so the nth customer will serve herself dish k with probability m_k/n and try Poisson(α/n) new dishes. This scenario results in a binary feature matrix Z where the rows are customers and the columns are dishes, and z_ij is equal to 1 if customer i has served herself dish j (Ghahramani 2009b).

Figure 2.1: Sample from the IBP with N = 25 and α = 15. The figure shows that the number of new features taken decreases as n increases; this is because the rate in the Poisson(α/n) distribution gets smaller for larger n.


2.5 Inference with Gibbs Sampling

2.5.1 Bayesian Inference

Inference is a method that is performed in order to make link predictions based on the links that are already known or the links that are assumed to exist. With Bayesian inference the probability distribution of the data is deduced using Bayes' theorem.

Let D denote the observed data and θ denote model parameters and missing data. Formal inference requires setting up a joint probability distribution P(D, θ) over all random quantities. The joint distribution consists of two parts, the prior distribution P(θ) and a likelihood P(D | θ), and the full probability model is given by

P(D, θ) = P(θ) P(D | θ)    (2.3)

After having observed the data D, Bayes' theorem is used to determine the distribution of θ conditional on D according to:

P(θ | D) = P(θ) P(D | θ) / ∫ P(θ) P(D | θ) dθ    (2.4)

(Spiegelhalter 1996)
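A small numerical illustration of equation 2.4 with a discrete parameter θ (the prior and likelihood values are invented):

```python
# Discrete illustration of Bayes' theorem: theta is whether a user
# likes sci-fi; D is the observation that she rated a sci-fi movie 5/5.
prior = {"likes": 0.3, "dislikes": 0.7}        # P(theta)
likelihood = {"likes": 0.8, "dislikes": 0.1}   # P(D | theta)

# The integral in equation 2.4 becomes a sum over the two values of theta.
evidence = sum(prior[t] * likelihood[t] for t in prior)
posterior = {t: prior[t] * likelihood[t] / evidence for t in prior}
print(posterior)  # "likes": 0.24/0.31 ≈ 0.774
```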

2.5.2 Gibbs Sampling

Gibbs Sampling is a Markov chain Monte Carlo technique used for inference with Bayesian models (Hardisty 2010). With Gibbs Sampling, variables are sampled from their distributions conditioned on the current values of all other variables (Griffiths & Ghahramani 2011).

Gibbs sampling is useful in situations where the feature matrix Z has at least two dimensions, i.e. columns z = z_1, …, z_K where K > 1 (Hardisty 2010). With the IBP, this means that there are at least two customers at the buffet.

The idea behind Gibbs sampling is that the current state of z_k depends on the states of z_1, …, z_{k−1}. Assume a walk between different states z, and let P(z) be the likelihood of visiting the state z and P(z^(t+1) | z^(t)) the probability of visiting the state z^(t+1) based on the previous state z^(t).


With the prerequisites presented above, the following Gibbs sampler can be constructed:

z_1^(j) ∼ P(z_1 | z_2^(j−1), …, z_K^(j−1), Y)
z_2^(j) ∼ P(z_2 | z_1^(j), z_3^(j−1), …, z_K^(j−1), Y)
…
z_k^(j) ∼ P(z_k | z_1^(j), …, z_{k−1}^(j), z_{k+1}^(j−1), …, z_K^(j−1), Y)
…
z_K^(j) ∼ P(z_K | z_1^(j), …, z_{K−1}^(j), Y)    (2.5)

(Niemi 2018)

The distributions in equation 2.5 are called full conditional distributions, meaning that the distribution of z_k is conditioned on the current values of all other parameters (Hardisty 2010).
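The alternation in equation 2.5 can be illustrated on a toy target where the full conditionals are known in closed form, a bivariate normal with correlation ρ (a generic illustration, not the IBP sampler itself):

```python
import numpy as np

# Gibbs sampling from a bivariate normal with correlation rho, where each
# full conditional is itself normal: x | y ~ N(rho * y, 1 - rho^2).
rng = np.random.default_rng(1)
rho, n_iter = 0.8, 20000
x = y = 0.0
samples = np.empty((n_iter, 2))
for t in range(n_iter):
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))  # sample x | y
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))  # sample y | x
    samples[t] = x, y

# After a burn-in, the empirical correlation approaches rho.
print(np.corrcoef(samples[2000:].T)[0, 1])  # ≈ 0.8
```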

2.5.3 Gibbs Sampling for IBP

The IBP is an exchangeable process. In terms of customers and dishes, this means that it does not matter in which order the customers enter the buffet; the probability of a dish being taken will still depend only on the number of customers who have tried that dish before. Since each dish is taken conditioned on the current values of the other dishes, it is possible to construct a Gibbs sampler to do posterior inference on the feature matrix Z which is generated by the IBP (Miller et al. 2009).

For the latent feature model described in section 2.3, posterior inference must be done on both matrix Z and W. Resampling Z given W is done by resampling each row in Z individually. Denoting by Z_i the current row, z_ik feature k of object i, and Z_−ik the matrix Z without the current entry z_ik, then

P(z_ik = 1 | Z_−ik, W, Y) ∝ (m_k / N) · P(Y | z_ik = 1, Z_−ik, W)
P(z_ik = 0 | Z_−ik, W, Y) ∝ ((N − m_k) / N) · P(Y | z_ik = 0, Z_−ik, W)    (2.6)

This is done for each entry z_ik where m_k > 0; otherwise the column is deleted.
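A sketch of the resampling step in equation 2.6 for a single entry z_ik, with `log_lik` standing in as a placeholder for log P(Y | Z, W) (the function names are invented; singleton columns are left to the later Metropolis-Hastings step):

```python
import numpy as np

def resample_zik(Z, i, k, N, log_lik):
    """Resample one entry z_ik from its full conditional (equation 2.6).

    log_lik(Z) is a placeholder for log P(Y | Z, W); m_k counts the other
    objects possessing feature k.
    """
    m_k = Z[:, k].sum() - Z[i, k]
    if m_k == 0:
        return None                # singleton column: handled elsewhere
    Z[i, k] = 1
    log_p1 = np.log(m_k / N) + log_lik(Z)
    Z[i, k] = 0
    log_p0 = np.log((N - m_k) / N) + log_lik(Z)
    p1 = 1.0 / (1.0 + np.exp(log_p0 - log_p1))  # normalize the two cases
    Z[i, k] = int(np.random.default_rng().random() < p1)
    return Z[i, k]
```

With a constant likelihood the conditional reduces to the IBP prior term, so z_ik = 1 is drawn with probability m_k/N.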


After updating the existing features, new features for Z_i are proposed using a Metropolis-Hastings step:

• Let k_new ∼ Poisson(α/N).

• Let k_old be the number of features possessed only by object i.

• Let Z_new be the Z matrix with the k_new features added to object i.

• If k_new > k_old, create a proposal weight matrix W_new with k_new − k_old new features.

Accept the proposal with the probability:

min( 1, P(Z_i | Z_new, W_new) / P(Z_i | Z, W) )    (2.7)

(Williamson 2016a)

If the proposal is accepted, W is resampled. All weights that correspond to removed features are dropped, weights for new features are sampled from a normal distribution, and already existing values are sampled from a normal distribution centred around the old value (Miller et al. 2009).

2.6 Related Work

2.6.1 A Coupled Indian Buffet Process Model for Collaborative Filtering

In the paper (Chatzis 2012), a collaborative filtering method for recommending items to users using Indian Buffet Process priors is presented. The model is based on the concept that a user rates an item based on their own interests and the genres that can describe the item. Each user can have several interests and each item can belong to multiple genres. Experiments for the model are then conducted with a large dataset containing real ratings data.

The model is constructed using principles from Bayesian nonparametrics, where a set of latent features represents user interests and item genres. To assign these features, two coupled Indian Buffet Process priors are imposed over the variables assigning them to interests. The model assumes that observed item ratings are influenced by a weighted combination of the user interests and the item genres, where the weights represent the probability of there being a connection between the user and the item if they possess the respective features. This allows for inference in order to attain the number of existing latent features.

The variational Bayesian inference for the model is formulated on the basis of a truncated stick-breaking representation presented in the paper (Doshi et al. 2009).

2.6.2 Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process

The paper (Doshi-Velez et al. 2009) discusses the problem of scaling Bayesian inference to large datasets, specifically for nonparametric models like the IBP. For example, the report addresses the problem where a ratings database may contain millions of users and thousands of movies, but an individual user has only rated a few movies, which results in a large but sparse dataset.

To speed up inference on those kinds of datasets, the paper introduces a way of performing parallel inference for IBP-based models. The IBP introduces specific challenges to parallelisation since the number of latent features is unbounded and changes during inference. As the paper describes, this is done with a Markov chain Monte Carlo sampler that divides a large dataset between different processors, which compute the global likelihood and posterior in parallel while passing messages between each other.

The parallel inference is performed by alternating between three steps:

• Message passing: processors communicate to compute P(W | Y, Z).

• Gibbs sampling: processors sample a new set of Z_p in parallel, where Z_p is the latent features corresponding to the partition of observations Y assigned to processor p.

• Hyperparameter sampling: a root processor resamples global hyperparameters.


2.6.3 Parallel Markov Chain Monte Carlo for the Indian Buffet Process

As in the paper presented in section 2.6.2, the paper (Zhang et al. 2017) presents an algorithm for performing parallel inference for Indian Buffet Process models. The authors discuss the need for a parallel inference algorithm in 'big data' situations where one single machine cannot hold all the data. Finally, the paper addresses the importance of making algorithms that scale well in big data situations, and argues that this should be a primary concern when developing machine learning tools, since the amount of data has grown to a massive size.


Chapter 3

Methods

One initial goal of this paper was to implement Gibbs sampling for the presented model, but due to lack of time it was not possible to get the sampler accurate enough to rely on its results. Thus, experiments were conducted on an implementation of the IBP and the results were analyzed. Combining the experiments' results and the literature study, it was possible to discuss the possibilities of using the IBP as part of a model for recommendation systems.

3.1 Literature Study

The subjects of this paper, the Indian Buffet Process and models in recommendation systems, are advanced, and it has therefore been important to conduct a thorough literature study to gain the necessary knowledge. The background is in total based on several reports and lectures; however, the presented model originates from the paper (Miller et al. 2009). To understand the model and strengthen its correctness, several other reports have been used to confirm statements and approach.

The paper (Miller et al. 2009) is written by three professors at the University of California, Berkeley. All three of them have conducted thorough research on the Indian Buffet Process, which makes their paper a reliable source with valid and correct information. The related work is written by professors at technical universities with expertise in computer science and machine learning.


3.2 Implementation of IBP

The following code implements the IBP, where N is the number of customers and α is the average number of dishes that a customer takes. The version used for conducting the experiments was implemented in Python.

import numpy as np

def ibp(N, alpha, rng=None):
    """Sample a binary matrix Z (customers x dishes) from IBP(alpha)."""
    rng = np.random.default_rng(rng)
    K = rng.poisson(alpha)            # dishes tried by the first customer
    Z = np.zeros((N, K), dtype=int)
    Z[0, :] = 1
    for i in range(1, N):
        m = Z[:i].sum(axis=0)         # portions taken from each dish so far
        # serve from existing dish k with probability m_k / (i + 1)
        Z[i] = rng.random(Z.shape[1]) < m / (i + 1)
        # try Poisson(alpha / (i + 1)) new dishes
        new_dishes = rng.poisson(alpha / (i + 1))
        if new_dishes > 0:
            cols = np.zeros((N, new_dishes), dtype=int)
            cols[i, :] = 1
            Z = np.hstack([Z, cols])
    return Z


3.3 Experiments

In order to see how the IBP behaves as a prior when the number of objects N and the average number of features α vary, experiments were conducted. The aim of the experiments was to see how the number of clusters varied as the number of objects or α increased, a cluster here being defined as a unique set of features. This is interesting because the clusters reflect how precisely the objects can be described.

To get reliable results, the number of clusters at each data point (N or α) was calculated 1000 times and the average of those results was chosen.
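Assuming an IBP sampler like the one described in section 2.4, the cluster-counting experiment can be sketched as follows (the function names are invented; the sampler is a compact restatement of the process):

```python
import numpy as np

def ibp(N, alpha, rng):
    """Draw a binary feature matrix from the IBP (compact restatement)."""
    K = rng.poisson(alpha)
    Z = np.zeros((N, max(K, 1)), dtype=int)
    Z[0, :K] = 1
    for i in range(1, N):
        m = Z[:i].sum(axis=0)                    # popularity of each dish
        Z[i] = rng.random(Z.shape[1]) < m / (i + 1)
        k_new = rng.poisson(alpha / (i + 1))
        if k_new:
            cols = np.zeros((N, k_new), dtype=int)
            cols[i] = 1
            Z = np.hstack([Z, cols])
    return Z

def average_clusters(N, alpha, runs=1000, seed=0):
    """Average number of clusters (unique feature sets) over repeated draws."""
    rng = np.random.default_rng(seed)
    return np.mean([len(np.unique(ibp(N, alpha, rng), axis=0))
                    for _ in range(runs)])

print(average_clusters(N=25, alpha=3, runs=100))
```

Here `np.unique(Z, axis=0)` counts the distinct rows of Z, i.e. the unique feature sets that define the clusters.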


Chapter 4

Results

4.1 Varying Parameters

The following results show that when N is incremented the number of clusters increases linearly. The results also show that when α is incremented the number of clusters rapidly increases until it becomes equal to the number of objects.

4.1.1 Varying N

Figure 4.1: The average number of clusters as the number of objects ranges from 1 to 100; α is set to 3.


Figure 4.2: Samples from the IBP with N equal to 10, 25, 50 and 100 respectively; α is set to 3. The Y-axis represents the objects and the X-axis the features; white indicates a feature being possessed by an object.

4.1.2 Varying α

Figure 4.3: The average number of clusters as α ranges from 1 to 50; N is set to 100.


Figure 4.4: Samples from the IBP with α equal to 5, 10, 25 and 75 respectively; N is set to 100. The Y-axis represents the objects and the X-axis the features; white indicates a feature being possessed by an object.

4.2 Related Experiments

The paper (Chatzis 2012) in section 2.6.1 presents a Coupled Indian Buffet Process model. Experiments were performed on a large sample of a ratings dataset consisting of 10 million ratings for 10,677 movies made by 69,878 users of the MovieLens website. The dataset consists of randomly selected users that have rated at least 10 movies each. The model was compared to four baselines, of which two are 'extremely competitive modern methodologies'. Results show that the Coupled Indian Buffet Process model outperforms the other alternatives. The paper discusses the need to train with extremely sparse datasets, which is often the case in real-life systems. Even in this case the algorithm outperforms the other four. The paper finally shows that it is possible to scale the presented model for use in real-life systems.

The conclusion of the paper (Doshi et al. 2009) described in section 2.6.2 is that, as datasets grow, parallelisation is an 'increasingly attractive and important feature for doing inference'. The algorithm presented allows for a compact representation of the feature posterior that approximately decorrelates the data stored on each processor, limiting the communication bandwidth between processors. The approximate sampler presented is compared against an exact parallel Metropolis sampler; the approximate sampler is about five times faster and achieves comparable or better predictive likelihoods than the exact sampler.

The paper (Zhang et al. 2017) shows that directly implementing the IBP with earlier inference algorithms in 'big data' situations will undoubtedly lead to poor results or inefficient computing. However, the model and the parallel inference method presented in section 2.6.3 avoid these problems, and the method is guaranteed to produce results as exact as those of a non-parallel inference method.

All the above results indicate that it is possible to use IBP priors in a model for recommending products to users.


Chapter 5

Discussion

The literature study shows that the Nonparametric Latent Feature Relational Model presented in section 2.3.2 is a suitable model for link prediction between objects based on their features, when the different features and the number of features describing an object are not known beforehand. The IBP allows for an infinite number of features, since it does not require any prior knowledge of them. Instead, the different features are inferred through sampling. The study shows that Gibbs sampling is an appropriate method for inferring the priors of the model.

The basic concept of the IBP is largely that a customer at a buffet is more likely to pick a popular dish than a dish that only a few customers have tried. The result of the experiment in section 4.1.1 aligns with this concept, since it indicates that when the number of objects increases, it is possible to describe an object from a larger set of features. This is because the probability of a feature being possessed by an object is more precise when it can be based on a larger dataset. When varying α in section 4.1.2 the results seem peculiar, since each user gets its own cluster. However, in the experiment, the difference between α and N is much smaller than in a real-life situation, making the results non-representative of most real-life situations and therefore less interesting.

The literature study also shows that performing Gibbs sampling on an IBP prior is inefficient in a real-life situation where the dataset is large. However, the papers (Doshi et al. 2009) and (Zhang et al. 2017) presented in section 4.2 both present a method for performing Gibbs sampling on multiple processors in parallel. This method proves to be efficient when working with large real-life datasets, solving the initial performance problem of Gibbs sampling.

The Nonparametric Latent Feature Relational Model is suitable for link prediction between objects of the same type, for example users. In the case of recommending products to users, i.e. two different types of objects, it is desirable to base the recommendations upon a user's features and a product's features. The presented model cannot handle two different types of objects, but the paper (Chatzis 2012) presents a Coupled IBP model which can link two different types of objects in the desired way. The paper also shows that this coupled model outperforms other modern methodologies.

The Coupled Indian Buffet Process model makes it possible to use IBP priors as part of a model for a recommendation system. Combining the Coupled Indian Buffet Process model with parallel Gibbs sampling for inference could make the coupled model more efficient and suitable for large real-life datasets.

5.1 Future work

To further investigate the possibilities of using IBP in a real-life recommendation system, it is suggested to implement the coupled IBP model with parallel Gibbs sampling and conduct experiments on a real-life dataset.

If the implementation performs well, a comparison with other methods would be in order, to determine whether it would be beneficial to use in a real-life recommendation system.

It is also possible to extend this IBP evaluation by comparing IBP with other probability distributions and to conduct more experiments, for example to measure its time efficiency.


Chapter 6

Conclusion

This paper presents the Indian Buffet Process and Gibbs sampling and examines whether it is possible to use an IBP prior in a recommendation system. The conclusion is that it is possible to use IBP in a recommendation system when recommending one object to another, for example movies to users. A model for doing that requires two IBP priors, one for each object type. This model is called the coupled IBP model. One presented problem is that it is inefficient to infer IBP priors with Gibbs sampling; this problem can be solved by implementing parallel Gibbs sampling. In order to use IBP priors in a recommendation system which includes real-life datasets, the paper suggests the use of a coupled version of the IBP model and, if possible, performing inference with parallel Gibbs sampling.


Bibliography

Cao, Y. & Li, Y. (2007), 'An intelligent fuzzy-based recommendation system for consumer electronic products', Expert Systems with Applications 33(1), 230–240.
URL: http://www.sciencedirect.com/science/article/pii/S0957417406001369

Chatzis, S. P. (2012), 'A coupled indian buffet process model for collaborative filtering'.

Doshi, F., Miller, K., Gael, J. V. & Teh, Y. W. (2009), Variational inference for the indian buffet process, in D. van Dyk & M. Welling, eds, 'Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics', Vol. 5 of Proceedings of Machine Learning Research, PMLR, Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA, pp. 137–144.
URL: http://proceedings.mlr.press/v5/doshi09a.html

Doshi-Velez, F., Mohamed, S., Ghahramani, Z. & Knowles, D. A. (2009), Large scale nonparametric bayesian inference: Data parallelisation in the indian buffet process, in Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams & A. Culotta, eds, 'Advances in Neural Information Processing Systems 22', Curran Associates, Inc., pp. 1294–1302.
URL: http://papers.nips.cc/paper/3669-large-scale-nonparametric-bayesian-inference-data-parallelisation-in-the-indian-buffet-process.pdf

Ghahramani, Z. (2009a), 'A brief overview of nonparametric bayesian models'. Downloaded: 2018-04-25.
URL: http://mlg.eng.cam.ac.uk/zoubin/talks/turin09.pdf

Ghahramani, Z. (2009b), 'The indian buffet process and extensions'. Downloaded: 2018-02-13.

Ghahramani, Z. (2017), 'Introduction to link prediction'. Downloaded: 2018-03-26.

Gilks, W. R., Richardson, S. & Spiegelhalter, D. J. (1996), Markov Chain Monte Carlo in Practice, Chapman & Hall.

Griffiths, T. L. & Ghahramani, Z. (2011), 'The indian buffet process: An introduction and review', J. Mach. Learn. Res. 12, 1185–1224.
URL: http://dl.acm.org/citation.cfm?id=1953048.2021039

Konstan, J. A. & Riedl, J. (2012), 'Recommender systems: from algorithms to user experience', User Modeling and User-Adapted Interaction 22(1), 101–123.
URL: https://doi.org/10.1007/s11257-011-9112-x

Miller, K., Jordan, M. I. & Griffiths, T. L. (2009), Nonparametric latent feature models for link prediction, in Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams & A. Culotta, eds, 'Advances in Neural Information Processing Systems 22', Curran Associates, Inc., pp. 1276–1284.

Niemi, J. (2018), 'Gibbs sampling'. Downloaded: 2018-04-11.
URL: http://www.jarad.me/courses/stat544/slides/Ch11/Ch11b.pdf

Pham, M. C., Cao, Y., Klamma, R. & Jarke, M. (2011), 'A clustering approach for collaborative filtering recommendation using social network analysis', J. UCS 17(4), 583–604.

Resnik, P. & Hardisty, E. (2010), 'Gibbs sampling for the uninitiated'. University of Maryland.

Ricci, F., Rokach, L. & Shapira, B. (2011), Introduction to Recommender Systems Handbook, Springer US, Boston, MA, pp. 1–35.
URL: https://doi.org/10.1007/978-0-387-85820-31

Williamson, S. (2016a), 'An introduction to bayesian nonparametrics'. Downloaded: 2018-04-18.
URL: http://www.ucsp.edu.pe/ciet/mlss16/file/BNP3.pdf

Williamson, S. A. (2016b), 'Nonparametric network models for link prediction'.

Zhang, M. M., Dubey, A. & Williamson, S. A. (2017), 'Parallel Markov Chain Monte Carlo for the Indian Buffet Process', ArXiv e-prints.
