Recommender System for Gym Customers

Academic year: 2021


Linköping University | Department of Computer and Information Science

Master’s thesis, 30 ECTS | Statistics and Machine Learning

2020 | LIU-IDA/STAT-A--20/024–SE

Recommender System for Gym Customers

Roshni Sundaramurthy

Supervisor: Anders Nordgaard
Examiner: Annika Tillander


Upphovsrätt

Detta dokument hålls tillgängligt på Internet - eller dess framtida ersättare - under 25 år från publiceringsdatum under förutsättning att inga extraordinära omständigheter uppstår.

Tillgång till dokumentet innebär tillstånd för var och en att läsa, ladda ner, skriva ut enstaka kopior för enskilt bruk och att använda det oförändrat för ickekommersiell forskning och för undervisning. Överföring av upphovsrätten vid en senare tidpunkt kan inte upphäva detta tillstånd. All annan användning av dokumentet kräver upphovsmannens medgivande. För att garantera äktheten, säkerheten och tillgängligheten finns lösningar av teknisk och administrativ art.

Upphovsmannens ideella rätt innefattar rätt att bli nämnd som upphovsman i den omfattning som god sed kräver vid användning av dokumentet på ovan beskrivna sätt samt skydd mot att dokumentet ändras eller presenteras i sådan form eller i sådant sammanhang som är kränkande för upphovsmannens litterära eller konstnärliga anseende eller egenart.

För ytterligare information om Linköping University Electronic Press se förlagets hemsida http://www.ep.liu.se/.

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

Recommender systems provide new opportunities for retrieving personalized information on the Internet. Due to the availability of big data, the fitness industry is now focusing on building efficient recommender systems for its end-users. This thesis investigates the possibilities of building an efficient recommender system for gym users. BRP Systems AB has provided the gym data for evaluation, consisting of approximately 896,000 customer interactions with 8 features. Four matrix factorization methods based on implicit feedback are applied to the given data: latent semantic analysis using singular value decomposition, alternating least squares, Bayesian personalized ranking, and logistic matrix factorization. These methods decompose the implicit data matrix of user-gym group activity interactions into the product of two lower-dimensional matrices. The factors are used to calculate the similarities between user and activity interactions, and based on the scores, the top-k recommendations are provided. The methods are evaluated with ranking metrics such as Precision@k, mean average precision (MAP)@k, area under the curve (AUC) score, and normalized discounted cumulative gain (NDCG)@k. A qualitative analysis is also performed to evaluate the recommendations. For this specific dataset, the optimal method is found to be alternating least squares, which achieved around 90% AUC for the overall system and managed to give personalized recommendations to the users.

Keywords: Recommender system, collaborative filtering, matrix factorization, sparse matrix, latent semantic analysis, singular value decomposition, alternating least square, Bayesian personalized ranking, logistic matrix factorization, stochastic gradient descent, AUC metric, mean average precision, normalized discounted cumulative gain


Acknowledgements

First, I would like to pay my special regards to my academic supervisor, Anders Nordgaard, for always providing valuable feedback and suggestions throughout the thesis project, which helped me to complete everything on time. Each discussion with him was a learning experience.

I would like to express my appreciation to Sara Johansson, who helped me secure this opportunity, and especially Johan Falkenjack and Gustav Hagström for believing in me, giving me this opportunity in the first place, and allowing me to explore this exciting problem at BRP Systems AB. Additionally, I would like to express my deepest gratitude to my company supervisors, who contributed to the thesis with their generous time, deep knowledge, and sound advice. Also, I wish to extend my thanks to everyone who helped me throughout the data collection and evaluation phase at BRP Systems.

Then, I would like to thank my opponent Brian Kiprono Masinde, who reviewed my thesis and provided valuable feedback.

Last but not least, I want to thank my husband Arun Kumar, and my family for their patience and ultimate support, which is extremely important for me.


Contents

Abstract iii

Acknowledgments iv

Contents v

List of Figures vii

List of Tables viii

1 Introduction 1
1.1 Commissioner . . . 1
1.2 Motivation . . . 1
1.3 Aim . . . 2
1.4 Ethical Consideration . . . 3
2 Data 4
2.1 Data source . . . 4
2.2 Data description . . . 4
2.3 Data exploration . . . 5
2.4 Data preprocessing . . . 7
2.4.1 Feature selection . . . 7
2.4.2 Data format . . . 8
3 Related Research 10
3.1 Recommender systems in fitness industries . . . 10

3.2 Approaches in Recommender Systems . . . 10

3.2.1 Collaborative filtering RS . . . 10

3.2.1.1 Memory based collaborative filtering RS . . . 11

3.2.1.2 Model based collaborative filtering RS . . . 11

3.2.2 Content-based filtering RS . . . 12
3.2.3 Hybrid RS . . . 12
3.3 Hyperparameter Optimization . . . 13
3.3.1 Grid search . . . 13
3.3.2 Random search . . . 13
3.4 Evaluation metrics . . . 13
4 Method 14
4.1 Approaches used . . . 14
4.1.1 Matrix Factorization . . . 14

4.1.2 Latent Semantic Analysis . . . 14

4.1.3 Alternating Least Squares . . . 16


4.1.5 Logistic Matrix Factorization . . . 20

4.2 Evaluation . . . 21

4.2.1 Quantitative analysis . . . 21

4.2.1.1 Precision . . . 21

4.2.1.2 Mean average precision . . . 21

4.2.1.3 Normalized discounted cumulative gain . . . 22

4.2.1.4 Area under the curve (AUC) . . . 22

4.2.2 Qualitative analysis . . . 22

4.3 Experimental setup . . . 23

5 Results 24
5.1 Alternating least square method . . . 24

5.2 Bayesian personalized ranking method . . . 26

5.3 Logistic matrix factorization method . . . 27

5.4 Latent Semantic Analysis (using SVD) . . . 27

5.5 Comparison of results . . . 28

5.6 Recommendation of activities . . . 29

5.6.1 Similar activities recommendation to a specific activity . . . 30

5.6.2 Recommendations based on a user's previous visiting history . . . 31

5.7 Recommendation speed . . . 33

6 Discussion 34
6.1 Method and Results . . . 34

6.2 Limitations . . . 35

6.2.1 Implicit feedback . . . 35

6.2.2 Cold start problem . . . 36

6.3 Future research . . . 36

7 Conclusion 37

Bibliography 38


List of Figures

2.1 Popularity of the available gym activities . . . 5

2.2 Gender distribution based on count of visits . . . 6

2.3 User distribution by age groups . . . 7

2.4 Data distribution . . . 8

2.5 Example Compressed Sparse Row matrix representation . . . 9

3.1 Approaches in recommender system1 . . . 11

4.1 Matrix factorization [bravo2009clustering] . . . . 15

5.1 Performance of ALS for different latent factors . . . 24

5.2 ROC curve for a user (ALS) . . . 25

5.3 Performance of BPR for different latent factors . . . 25

5.4 ROC curve for a user (BPR) . . . 26

5.5 Performance of LMF for different latent factors . . . 26

5.6 ROC curve for a user (LMF) . . . 27

5.7 ROC curve for a user (LSA) . . . 27

5.8 Comparison based on Precision@10 and MAP@10 . . . 28

5.9 Comparison based on NDCG@10 and AUC@10 . . . 29

5.10 Runtime of different algorithms . . . 33


List of Tables

2.1 Information about the products (available in gym) . . . 4

2.2 Information about the users who booked and attended the gym activities . . . 5

5.1 Final hyperparameters chosen for all the methods . . . 28

5.2 Mean AUC score for all the methods . . . 29

5.3 Top most recommendations for activities similar to GRIT Strength activity using ALS and BPR . . . 30

5.4 Top most recommendations for activities similar to GRIT Strength activity using LMF and LSA . . . 30

5.5 Activities previously visited by an active user . . . 31

5.6 Top most recommendations for an active user (based on this user's previous visiting history) using ALS and BPR . . . 32

5.7 Top most recommendations for an active user (based on this user's previous visiting history) using LMF and LSA . . . 32

A.1 Example data for the gym users . . . 42

A.2 Example data for the gym products . . . 42

A.3 Hyperparameter values tested during the grid search for ALS, BPR, and LMF methods . . . 42

A.4 Hyperparameter values tested for LSA method . . . 42

A.5 Top most recommendations for activities similar to BODYJAM 60 activity using ALS and BPR . . . 43

A.6 Top most recommendations for activities similar to GRIT Strength activity using LMF and LSA . . . 43


1 Introduction

1.1 Commissioner

The commissioner of this thesis work is BRP Systems AB¹, which offers a booking and resource planning system with all the necessary functionality used by large and small facilities, municipalities, and chains in the leisure industry. It provides market-leading ERP (Enterprise Resource Planning) software and services for membership handling, booking, and payment solutions to fitness facility operators (in the Nordics and Europe).

1.2 Motivation

Recommender systems generally recommend items such as music, news, movies, jobs, products, and many other things to users. These systems, powered by Artificial Intelligence (AI), are widely used in commercial applications, particularly in e-commerce, social media, and content-based services. They enable users to receive personalized suggestions based on the information the system has about the users' activity and to take corrective actions during their online transactions. They help retain customers and improve their shopping experience.

These systems are used to boost sales, increase revenue, and improve customer experience, and they can be suited to business needs all around the world. Using machine learning in particular, they are built strategically around significant features that reflect customers' interests.

Nowadays, major participants in e-commerce such as Amazon² (the world's largest online retailer and a prominent cloud services provider) and Netflix³ (an online streaming service which offers popular movies, TV shows, and documentaries that users can watch over the internet) are focusing heavily on building efficient recommendation systems to satisfy their customers and catalyze their business development. It is interesting to know that Amazon credits 35 percent of its revenue to its recommendation system. Amazon shows the user items which are purchased along with the items that the user has in their cart, products similar to the ones they already viewed, etc. Also, Netflix has conducted an open competition (the Netflix Prize (NP)) for the best collaborative filtering based model to predict ratings for movies and provide recommendations [34]. This led to a vast increase in its business revenue. BRP Systems AB is now focusing on improving gym customer retention and satisfaction. By combining the users' database (streaming data) and artificial intelligence, the company desires to explore the possibilities offered by machine learning and to develop a recommender system for gym customers that increases gym group activity bookings and user attendance.

1 https://brpsystems.se/
2 https://www.amazon.com/
3 https://www.netflix.com/

The likelihood that the user will book a particular group activity among the available gym activities is estimated based on several factors, including:

• user interactions with the service, such as their booking history and unique user ID,
• other members with similar interests and preferences on the service, and
• information about group activities, such as their unique ID, description, and categories.

These pieces of information are fed to the sophisticated machine learning algorithm as inputs. The system figures out what should be weighed and looks for similarities between users in order to group them. These suggestions are based on each customer's browsing and purchase behavior.

Various approaches have been used to build such systems. There exist different types of recommender systems, such as content-based, collaborative filtering, hybrid, demographic, and keyword-based recommender systems [7]. The content-based approach is based on product information (such as group activity titles and text descriptions) and user information such as demographic details. Since the content-based approach needs to collect more external information, it is complex. The collaborative filtering approach, on the other hand, requires only the user's previous purchase history. This information can be either explicit (ratings) or implicit (such as the number of visits for each group activity in the gym) [32]. This is further described in Chapter 3.

1.3 Aim

The underlying purpose of this thesis project is to explore the possibilities of creating an efficient recommendation system for gym customers using the available big data. The highest efficiency gain would be achieved if the algorithm could automatically predict users' interests and recommend gym activities that are quite likely to interest them, in particular by suggesting new, useful, and closely related gym group activities (personalized recommendations) based on the users' previous purchase history and group activity features. The objective is therefore to answer the following research questions:

1. Which recommender system methods are most suitable for this fitness-based problem?

2. Which implicit feedback based method or methods are best suited for producing personalized recommendations for gym activities?


1.4 Ethical Consideration

BRP Systems is, in general, careful when accessing client data to improve its services. The fitness company that provided the data gave approval to use it in this thesis as long as no individual user is identified. Personal information about the users, such as names, birth dates, and personal numbers, is not explicitly used in this thesis. Only a few demographic features are used; for example, the birth date (YYYY-MM-DD) is used to calculate the age, whereas the actual personal number or name is not utilized. This ensures that no individual can be identified.


2 Data

2.1 Data source

The data for this thesis is provided by BRP Systems AB and is obtained from an anonymous fitness company, one of BRP Systems' clients. Permission has been granted to access a copy of this company's database for the project. My supervisor at the company suggested some features that could be interesting to include in the model, and the final features are selected based on the requirements of the different models.

2.2 Data description

The data used in this thesis is obtained from a database containing over 350 tables for the fitness company. This database covers all historical members and every gym class they booked and attended, every visit, and every purchase (gym membership, personal trainer, etc.).

The required information about the users and gym activities is extracted from the tables using unique keys such as person_id and product_id. Some of the features in the user (Table 2.2) and product (Table 2.1) tables are displayed.

Table 2.1: Information about the products (available in the gym)

Feature         Description
product_id      Unique ID for a product
name            Name of a product
producttype_id  Unique number for product types such as food and drinks, group activities, subscriptions, present cards, etc.
deleted         Indicates if a product is active/inactive
numberInStock   Indicates the count of available products
deletedTime     Time indicating the removal of the product
description     Text describing the product features


Table 2.2: Information about the users who booked and attended the gym activities

Feature        Description
person_id      Unique ID for a user
birthDate      Used to find the 'age' of a user
sex            Describes the gender of a user
staffMember    Indicates if a user is a staff member of the gym
arrived        Date and time of arrival at the group activity
orderitem      Unique ID indicating the ordered product
groupactivity  ID used to connect to the orderitem table
deletedUser    Indicates if a user has an active/inactive subscription

The final queried dataset consists of 896,853 user interactions during the period between 2007-12-18 and 2020-02-14, with 8 features. The company's current focus is more on gym group activities than on any other product in the given data. Hence, in this project, only gym group activities are chosen as products for recommendations.

2.3 Data exploration

To get a glimpse of the popular gym activities in the given data, a word cloud (see figure 2.1) is created. The activity BODYPUMP® 60⁴ seems to be the most popular, with 9313 visits. This activity is an ideal workout for anyone looking to get fit fast, and it is highly intense. This indicates the importance of recommending activities similar to BODYPUMP® 60, since users may be interested in trying such activities rather than visiting the same activity forever. Also, providing similar recommendations for other activities such as Cycle 60, BODYCOMBAT® 60⁵, and Underground 45⁶ (tough strength and fitness training with the help of tools) is important, since they have the highest numbers of visits after BODYPUMP® 60.

Figure 2.1: Popularity of the available gym activities

4 https://www.lesmills.com/workouts/fitness-classes/bodypump/
5 https://www.lesmills.com/workouts/fitness-classes/bodycombat/
6 https://www.campushallen.se/traning/grupptraning/underground


Globally, females are less likely than males to get enough exercise⁷. This is due to various reasons such as weather, lack of time, lack of confidence, cost of gym memberships, lycraphobia (the feeling of not looking good in gym wear), etc.⁸ Hence, it is interesting to look at the user demographic information in the given data. The gender distribution of the gym users is plotted (see figure 2.2), and surprisingly, the female users are more active in group workouts than the male users.

Figure 2.2: Gender distribution based on count of visits

Figure 2.3 shows the distribution of gym users by age groups. The average age of a gym user is 32 years. The users in the 18-35 age group (young adults) are more active in the group activities than other users. It is quite obvious that young adults are more concerned about their fitness, and the activities help them control weight, build lean muscle, and reduce fat. In the data, these users have been participating in activities such as BODYPUMP® (highly intense), Cycle 60, and also meditation-related group activities. The next most active age group is the 36-55-year-old users (middle-aged adults). This user group needs to maintain cardiovascular fitness⁹, and they have been participating in activities almost similar to those of the young adults. The users in the 56+ age group are approaching their retirement age. These users show very little interest in group activities and have been participating in activities such as Gymintro för Seniorer, Stark Senior 45 (activities for seniors), and yoga-based activities.

7 https://www.everydayhealth.com/fitness/new-study-reports-young-women-minorities-less-likely-exercise-than-male-peers/
8 https://www.better.org.uk/content_pages/top-gym-excuses


Figure 2.3: User distribution by age groups

2.4 Data preprocessing

The feature producttype_id in the product table has a unique id for several products. Since only group activities are considered for recommendation, only data with a producttype_id indicating that the product is a group activity is used. Duplicated rows that have identical values on the arrived feature for each user are removed before analysis.

There exist some group activities which are not currently used (inactive) by the gym in certain years. Hence, these inactive group activities are removed, resulting in a reduced dataset consisting of 616,959 user interactions. Recommender systems have an issue called user cold-start, meaning that it is hard to provide personalized recommendations for users with a very small number of interactions with activities, due to the unavailability of information to model their preferences. Furthermore, including the entire data is computationally expensive. Hence, only active users with more interactions are considered for analysis, i.e., users who booked and attended the group activities more than 5 times are considered active [10]. Mavridis, in his paper [10], considered users who have rated more than 5 items in a shopping search engine, as they are assumed to show preferences for the items.
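This filtering step can be sketched with pandas (column names and values here are illustrative assumptions, not the thesis's exact schema):

```python
import pandas as pd

# Hypothetical booking log: one row per booked-and-attended activity
bookings = pd.DataFrame({
    "person_id":  [1]*7 + [2]*3 + [3]*6,
    "product_id": [10, 11, 10, 12, 10, 11, 13,
                   10, 11, 12,
                   10, 10, 11, 12, 13, 14],
})

# keep only "active" users: more than 5 interactions overall
counts = bookings["person_id"].value_counts()
active_users = counts[counts > 5].index
active = bookings[bookings["person_id"].isin(active_users)]

print(sorted(active["person_id"].unique()))  # [1, 3]
```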

2.4.1 Feature selection

Most collaborative filtering based recommender systems are built with user preferences, which come as explicit or implicit feedback data. Explicit data is direct preference data provided by the users, such as ratings and likes, whereas implicit data is non-direct preference data, such as the number of views of a customer, the number of times a customer listened to a song, or the number of times a customer purchased a particular type of product. In our data, there are no such preferences explicitly provided by the user.


Figure 2.4: Data distribution

Hence, a new feature named 'visits' is created for the analysis. It indicates the number of times each user booked and attended each activity based on the booking history, and it is considered as implicit data for our model. It is created by aggregating the user and the associated activity and applying a count function. The histogram (see figure 2.4) shows how the number of visits is distributed. The x-axis represents the occurrences of the individual number of visits per user, and the y-axis shows the number of users associated with those occurrences. The histogram shows a very asymmetrical frequency distribution of data points. Most users in the given data tend to visit the gym group activities no more than 20 times. It is obvious that the count of user visits does not follow a normal distribution; it is more peaked and has thinner tails than a normal distribution. Moreover, it is highly skewed towards the right (positively skewed), with skewness = 2.61. Therefore, a log transformation is applied to the feature 'visits' to smooth the distribution. For the matrix factorization methods, only the user and activity IDs together with the number of visits to the group activities are required. There exist no missing values in these features.
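The construction of the 'visits' feature and its log transform can be sketched as follows. The column names and the choice of np.log1p are illustrative assumptions; the thesis does not state the exact transform used:

```python
import numpy as np
import pandas as pd

# Hypothetical booking log: one row per booked-and-attended activity
bookings = pd.DataFrame({
    "person_id":  [1, 1, 1, 2, 2, 3],
    "product_id": [10, 10, 11, 10, 12, 11],
})

# 'visits': number of times each user attended each activity
visits = (bookings
          .groupby(["person_id", "product_id"])
          .size()
          .rename("visits")
          .reset_index())

# log transform to smooth the right-skewed distribution
visits["log_visits"] = np.log1p(visits["visits"])

print(visits["visits"].tolist())  # [2, 1, 1, 1, 1]
```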

2.4.2 Data format

The data with the chosen features cannot directly be used in the methods applied in this project, since these methods require data to be represented in the form of a sparse matrix. Therefore, the data is transformed into a matrix with each unique user in the rows and each unique gym activity in the columns. The elements of this matrix are the total number of visits for each activity by each user. Activities with a larger number of visits by a user carry more weight in our matrix. Most of the elements in this matrix will be zero, because only a few users have attended every group activity available in the gym. The sparse matrix has the following advantages:

• Using sparse matrices to store the data saves a significant amount of memory and speeds up the processing of that data since only values of activities that are non-zero are saved [1].

• They also have significant advantages in terms of computational efficiency. Like other matrices, low-level arithmetic operations such as zero-adds (x+0 is always x) will not be done by these sparse matrices. The resulting efficiencies can lead to enormous improvements in execution time for programs working with an immense amount of sparse data by traversing only the non-zero elements [1].

The sparse matrix is represented using the Compressed Sparse Row (CSR) format (see figure 2.5) by calling the csr_matrix() function of the SciPy [38] library.

Figure 2.5: Example Compressed Sparse Row matrix representation

This CSR representation is required to squeeze out the zeroes in the matrix and consider only the non-zero values. Hence, the row indices are compressed and read before the column indices. The CSR format stores the sparse users × activities matrix in row form using three one-dimensional arrays that respectively contain:

• Values
• Row pointers
• Column indices

The vector 'Values' contains the non-zero values in the sparse matrix. 'Column indices' indicates the column of each non-zero value. 'Row pointers' indicate the indices (counting entries row-wise in the matrix) of the non-zero values that begin each row in the sparse matrix. In the example shown in figure 2.5, 13, 2, and 3 are the first non-zero values of each row, and the indices of these values are the row pointers. So the index of value 13 is 0, for value 2 the index is 2, and the index for value 3 is 5.
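The three CSR arrays can be inspected directly with SciPy. The toy matrix below is an assumption chosen to be consistent with the description above (first non-zero values 13, 2, and 3 at data indices 0, 2, and 5); it is not the thesis's actual data:

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[13, 0, 8, 0],
                  [2,  5, 0, 7],
                  [0,  3, 0, 0]])
m = csr_matrix(dense)

print(m.data.tolist())     # values:         [13, 8, 2, 5, 7, 3]
print(m.indices.tolist())  # column indices: [0, 2, 0, 1, 3, 1]
print(m.indptr.tolist())   # row pointers:   [0, 2, 5, 6]
```

Note that the row-pointer array has one entry per row plus a final entry equal to the total number of non-zeros.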

The sparsity of the matrix is calculated using the formula given below:

Sparsity = 1 - (number of interacted activities, i.e. non-zero elements) / (matrix size, i.e. total number of elements)

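With SciPy's nnz attribute, the sparsity formula amounts to the following (toy matrix, for illustration only):

```python
import numpy as np
from scipy.sparse import csr_matrix

visits = csr_matrix(np.array([[13, 0, 8, 0],
                              [2,  5, 0, 7],
                              [0,  3, 0, 0]]))

# sparsity = 1 - (non-zero elements) / (total number of elements)
sparsity = 1 - visits.nnz / (visits.shape[0] * visits.shape[1])
print(sparsity)  # 0.5 (6 non-zeros out of 12 cells)
```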

3 Related Research

3.1 Recommender systems in fitness industries

To the best of the author's knowledge, there exist very few studies about recommender systems (RS) in the fitness industries. Sánchez Bocanegra et al. [8] built HealthRecSys, a content-based RS that associates health consumers with well-known health educational websites from MedlinePlus for a given health video from YouTube¹⁰ (a video sharing service where users can watch and upload their own videos). The text is extracted from metadata in the relevant YouTube videos to develop the educational website links. Ercan Ezin, Kim, and I. Palomares have presented 'Fitness that Fits', a platform for workout video recommendations predicated on the users' preferences and their recent viewing behavior [4]. In their work, a hybrid approach is presented by analyzing both the video features and user-user similarity measures.

But there have been many popular studies related to the concepts of RS, which are widely used in many different fields, and there exist several approaches to build RS based on various user and item features. Most recommender systems make use of collaborative and/or content-based filtering methods. For the former, implicit or explicit feedback given by the users is considered. This includes feedback like ratings, likes, the number of times a user visited, the number of times a user purchased the product, etc. For the latter, product features such as title, description, etc. are mostly considered. RS provide users with recommendations of items using varied information and also balance factors like accuracy, novelty, diversity, and stability within the top recommendations.

3.2 Approaches in Recommender Systems

3.2.1 Collaborative filtering RS

Collaborative Filtering (CF) methods play an important role in recommendation, although they are often used along with other filtering techniques like content-based, knowledge-based, or social ones. Johnson, in his research work [18], has stated that "Since most data on the web comes in the form of implicit feedback data, there is an increasing demand for collaborative filtering methods that are designed for the implicit case".

Figure 3.1: Approaches in recommender system

3.2.1.1 Memory based collaborative filtering RS

In memory-based collaborative filtering RS, the correlation among different users is discovered. Similarly, the products of potential interest (products rated by other users but not viewed by the current user) are discovered, and the rating is predicted for the current user. Both the advantages and disadvantages of alternative approaches for making recommendations are discussed in [37]. The most common research studies are concentrated on movie, product, and document RS.
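A minimal sketch of the user-user correlation step using cosine similarity (toy data and an illustrative similarity choice; not the thesis's implementation, which uses model-based methods instead):

```python
import numpy as np

# toy user x item interaction matrix (rows: users, columns: items)
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [0., 0., 5., 4.]])

# cosine similarity between all pairs of users
unit = R / np.linalg.norm(R, axis=1, keepdims=True)
sim = unit @ unit.T

# user 1's tastes are closer to user 0's than to user 2's
print(sim[1, 0] > sim[1, 2])  # True
```

In a memory-based recommender, the ratings of the most similar users would then be weighted by these similarity scores to predict the current user's rating for an unseen item.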

However, memory-based collaborative filtering techniques have limitations: the similarity values are based on common items and are therefore unreliable when data is sparse and the common items are few. Also, these techniques are slow for high-sparsity data. Hence, better prediction performance can be achieved using model-based CF approaches. Model-based CF techniques use the pure explicit/implicit rating data to estimate or learn a model to make predictions [30].

3.2.1.2 Model based collaborative filtering RS

Model-based recommendation systems involve constructing a model in a low-rank dimensional space based on the explicit/implicit ratings provided by the users, without ever having to use the full dataset. The time needed to query the model (as opposed to the whole dataset) is much smaller than querying the full dataset. Hence, this method offers both speed and scalability benefits (for large datasets).

There are various approaches available to build this model, although the basic idea behind this RS is the same. A few common approaches are latent models, matrix factorization, and clustering [26]. Most of the best realizations of latent factor models are based on matrix factorization [29]. The matrix factorization method characterizes both items and users by identifying the hidden (latent) feature vectors of factors inferred from item rating patterns. By combining good scalability with predictive accuracy, these methods have become popular in recent years. Additionally, they offer much flexibility for modeling various real-life situations. As input data, they rely on a matrix with one dimension representing users and the other dimension representing items of interest. For example, Netflix collects star ratings for movies (explicit user feedback). When explicit feedback is not available, recommender systems can infer user preferences using implicit feedback such as the number of views, purchases, etc. [29]. The implicit feedback represents the presence or absence of an event, and hence the matrix will be sparse, since no user will purchase/view all the available products.

Several methods for matrix factorization are available. One such method is latent semantic analysis (LSA), which applies singular value decomposition (SVD), using the singular values of the initial matrix to factorize it [13]. Another popular approach for implicit feedback data is the alternating least squares (ALS) method provided by Yifan Hu. In his research work, the unique properties of implicit feedback datasets are discerned: the data is treated as positive and negative preference associated with varying confidence levels, and recommendations for television shows are provided using factor models [32]. Rendle et al. [21] have presented a generic optimization criterion, BPR-Opt (the maximum posterior estimator derived from a Bayesian analysis of the problem), for personalized ranking. The learning algorithm is based on stochastic gradient descent with bootstrap sampling, and movie recommendations (Netflix) are provided for users. A probabilistic latent factor model called logistic matrix factorization is presented by Johnson [18] for song recommendations.
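The LSA-style factorization via truncated SVD can be sketched with SciPy. The toy matrix and the rank k=2 are illustrative choices, not the thesis's actual data or hyperparameters:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# toy user x activity visit matrix (svds requires float data)
R = csr_matrix(np.array([[5., 3., 0., 1.],
                         [4., 0., 0., 1.],
                         [1., 1., 0., 5.],
                         [0., 0., 5., 4.]]))

U, s, Vt = svds(R, k=2)          # rank-2 truncated SVD
scores = U @ np.diag(s) @ Vt     # low-rank preference scores
ranked = np.argsort(-scores[0])  # activities ranked for user 0
print(scores.shape)              # (4, 4)
```

The rows of U and the columns of Vt play the role of the user and activity latent factor vectors, and the reconstructed scores are used to rank unseen activities for each user.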

3.2.2 Content-based filtering RS

This method is also referred to as cognitive filtering; it suggests items based on a comparison between the item content and a user profile. An item's content is defined as a series of descriptors or terms, usually the words that appear in a text. The user profile is defined in the same context and is created by examining the meaning of objects encountered by the user. Ozsoy applied Word2Vec techniques (a neural language algorithm) to the Foursquare check-in dataset to list the top-k venues/locations (e.g. restaurant, cafe) that the target user will visit/check in to in the future [12]. In demographic-based RS, demographic information such as age, gender, and employment status is used to find the users who like a specific item [37].

3.2.3 Hybrid RS

Hybrid RS evolves when two or more of the above techniques, such as collaborative and content-based RS, are combined so that the resulting RS can improve performance. This method is usually adopted to deal with the cold-start problem. Also, two different content-based RS could work together, and several projects have used this type of hybrid. For example, naive Bayes and kNN classifiers are combined for providing news recommendations [36].

3.3 Hyperparameter Optimization

A hyperparameter is a model-specific parameter whose value must be set explicitly before the model starts learning. Some common techniques for optimizing the hyperparameters, i.e. finding the set of values that results in the most skillful predictions, are described in the following sections.

3.3.1 Grid search

Grid search is a straightforward search technique that evaluates the function over a limited parameter space. This is a recommended approach for selecting the regularization parameter λ in model-based collaborative RS. Although it is easily parallelized, it suffers from the curse of dimensionality and becomes slow when used to optimize multiple parameters [14].

3.3.2 Random search

The grid search method is exhaustive and expensive, whereas random search with a fixed budget of samples has been shown to be more effective in high-dimensional spaces. It is easily parallelized but still lacks proper guidance [14].
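The two search strategies can be sketched as follows. This is a minimal illustration, not the thesis implementation: the `evaluate` function, the grid values, and the location of the optimum are all invented for the example.

```python
import itertools
import random

def grid_search(evaluate, grid):
    """Exhaustively evaluate every combination in the parameter grid."""
    best_params, best_score = None, float("-inf")
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def random_search(evaluate, grid, n_samples=10, seed=0):
    """Evaluate a fixed number of randomly drawn combinations."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_samples):
        params = {k: rng.choice(v) for k, v in grid.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical evaluation function: pretend the validation score
# peaks at factors=10, regularization=0.05.
def evaluate(params):
    return -abs(params["factors"] - 10) - 100 * abs(params["regularization"] - 0.05)

grid = {"factors": [5, 10, 15, 20, 30], "regularization": [0.01, 0.05, 0.1]}
best, _ = grid_search(evaluate, grid)
```

Grid search visits all 15 combinations here; random search with a budget of 10 would visit only a sample of them, which is the trade-off described above.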

3.4 Evaluation metrics

Recommender systems are evaluated to see how well the system can generate results for unseen data (data that has not been used for training the model). Different metrics exist depending on the type of algorithm used. These metrics are broadly classified as follows [25]:

• predictive accuracy metrics, which indicate the closeness of the ratings estimated by a recommender system to the true user ratings (root mean squared error (RMSE), mean absolute error (MAE))

• classification accuracy metrics, which measure the number of relevant or irrelevant items returned by the recommender system (precision, area under the curve (AUC))

• rank accuracy metrics, which estimate the correct ordering of items with respect to the user's preference, also referred to as the measurement of rank correlation in statistics (mean average precision (MAP), Kendall's tau, Spearman's rho) [30]

Pilászy used RMSE as an evaluation measure for the Netflix prize dataset [27]. Rendle utilized the AUC score as an evaluation strategy for a movie recommender system [21]. Other metrics, such as normalized discounted cumulative gain (NDCG) and Precision@k, were used by Akimchuk to evaluate the top-k music recommendations [2].


4 Method

In this chapter, the methods used to recommend group activities to the users are described in depth, along with an explanation of how the results are evaluated.

4.1 Approaches used

Although various approaches exist, the content-based recommendation method could not be used in this project, due to the non-availability of descriptions of all gym group activities in the available data. Hence, model-based collaborative filtering methods are applied in this project, since these models are efficient and highly scalable to large-scale datasets [16].

4.1.1 Matrix Factorization

To deal with sparsity and scalability, matrix factorization methods are used. Matrix factorization is a process where the input data matrix is decomposed into the product of two lower-dimensional matrices (see figure 4.1). These two matrices represent the user and activity latent feature matrices; multiplying them back yields an approximation of the original matrix with predicted values [3]. This method aims to fill the user-activity matrix with the likelihood of a user visiting an activity. It provides information on how well a user is aligned with a set of latent features and how much an activity fits into this set of latent features.
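As a toy illustration of this decomposition (all numbers invented, not the thesis data), a 3-user × 4-activity matrix can be approximated by the product of two 2-factor matrices:

```python
import numpy as np

# Hypothetical latent feature matrices with 2 latent factors.
U = np.array([[1.0, 0.2],
              [0.1, 1.0],
              [0.9, 0.3]])        # 3 users x 2 factors
V = np.array([[1.0, 0.0],
              [0.8, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])        # 4 activities x 2 factors

# Multiplying back fills every user-activity cell with a predicted affinity,
# including cells that were missing in the original observation matrix.
R_hat = U @ V.T                   # 3 x 4 matrix of predicted values
```

Each entry of `R_hat` is the inner product of one user's latent vector with one activity's latent vector, which is exactly the alignment score described above.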

4.1.2 Latent Semantic Analysis

Furnas, Deerwester et al. [13] proposed the latent semantic analysis (LSA) method to deal with the high dimensionality of the document-term matrix. This method is often used in natural language processing, in which a matrix containing word counts per document (rows represent unique words and columns represent each document) is retrieved from a large corpus, and the singular value decomposition (SVD) technique is applied to capture latent associations between terms and documents. In collaborative filtering, this method can be used to form users' trends from individual preferences, by finding latent relationships between users and activities. Also, this approach is used as a strong baseline in the top-k recommendation problem [6].

Figure 4.1: Matrix factorization [28]

SVD uses a simple imputation technique, replacing the missing entries of matrix R with zeroes [6]; the incomplete matrix R is thus transformed into a sparse matrix. Hence, a higher-level representation of the original user-activity matrix is produced, which contains the main trends of users' preferences with reduced noise, favoring scalability [13]. The activities and users are transferred to the same latent factor space, thus making them directly comparable [33]. If m users like activity i and they also like activity j, then SVD will group them together in the same latent factor space to form an agglomerative activity or feature [9]. Hence, two users are compared by evaluating their visits to different activities. Mathematically, SVD decomposes matrix R into two unitary matrices and a diagonal matrix:

R = U Σ A^T   (4.1)

R - m × n observation matrix

U - m × k orthogonal left singular matrix, which represents the relationship between users and latent factors

Σ - k × k diagonal matrix of singular values (non-negative elements), which describes the strength of each latent factor

A - n × k orthogonal right singular matrix, which indicates the similarity between activities and latent factors

The diagonal matrix is expressed in a general form [20]:

Σ = diag(σ_1, σ_2, ..., σ_k), a k × k diagonal matrix   (4.2)

The given latent factors are characteristics of the activities, for example, the strength and mobility of the activity. The dimension of the observation matrix R is reduced by extracting its latent factors, mapping each user and each activity into a k-dimensional latent space. This mapping facilitates a clear representation of the relationships between users and activities. A higher number of latent factors increases the model's performance, but makes the recommendation algorithm slower [32], and more training data and iterations are required [18].


The value of k should be chosen such that the product matrix (with predicted values) can capture most of the variance within the observation matrix R (an approximation of R). The difference between the original (observation) and the final matrices is the error that is expected to be minimized. To get the lower-rank approximation, we take these matrices and keep only the top k features, which we think of as the k most important underlying taste and preference vectors.

The product equation 4.1 can be expanded as,

R = Σ_{i=1}^{l} u_i σ_i a_i^T   (4.3)

where l = min(m, n), and u_i and a_i are the i-th columns of U and A, respectively. The sum only goes to min(m, n) since we know that the remaining columns of U or A will be zeroed out by Σ.

This method generates a low-rank approximation of the input matrix, and that low-rank representation can be used to generate insights. This method is used as a baseline and the performance is compared with other methods based on the evaluation of the results.
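A minimal sketch of this truncated-SVD baseline using SciPy's sparse SVD routine; the visit counts below are invented toy data, and `k = 2` is chosen only for the example:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy user-activity visit counts; missing entries are stored as zeros.
R = csr_matrix(np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 0.0, 4.0],
    [0.0, 1.0, 5.0, 4.0],
]))

k = 2                           # number of singular values, 1 <= k < min(R.shape)
U, sigma, At = svds(R, k=k)     # R ~ U @ diag(sigma) @ At

# Rank-k approximation: predicted affinity of every user for every activity,
# including the previously zero (unobserved) cells.
R_hat = U @ np.diag(sigma) @ At
```

The rows of `R_hat` can then be sorted to produce a top-k recommendation list per user.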

4.1.3 Alternating Least Squares

Alternating Least Squares (ALS) is an iterative optimization process where the factorization of two matrices arrives closer to the original user-activity matrix for every iteration.

The pseudocode [5] for ALS can be seen below.

Algorithm 1: ALS Algorithm
1: procedure ALS(U, A)    ▷ matrices representing the user and activity feature matrices
2:     Initialize matrix A with random values
3:     repeat
4:         Fix A, solve U by minimizing the objective function (the sum of squared errors)
5:         Fix U, solve A by minimizing the objective function similarly
6:     until stop criterion is satisfied (convergence)
7:     return U, A
8: end procedure

We have an original matrix R with dimension u × a, where u denotes the number of users and a denotes the number of activities. The observations are the numbers of visits (implicit feedback), denoted by r_{u,a}. The original matrix is decomposed into a matrix U of users with hidden latent user-factor vectors x_u and another matrix A of activities with hidden latent item-factor vectors y_a. In both U and A, the weights that relate each user/activity to each factor are distributed. The product of these matrices approximates the original matrix as closely as possible, written R ≈ U · A^T.

Random values are assigned to the U and A matrices, and the weights that yield the best approximation of the original matrix R are obtained iteratively using least squares. The alternating least squares method uses the same approach as least squares, but iteratively: the activity matrix is fixed while the user matrix is optimized, and vice versa. For each iteration, the weights come closer to the product of the U and A matrices by minimizing the


least square cost function [32]:

min_{x*, y*} Σ_{u=1}^{M} Σ_{a=1}^{N} c_{u,a} (p_{u,a} − x_u^T y_a)² + λ ( Σ_{u=1}^{M} ||x_u||² + Σ_{a=1}^{N} ||y_a||² )   (4.4)

where c_{u,a} is the confidence that the user likes the activity, and p_{u,a} is the preference, a binary value specifying whether the user participated in an activity or not.

The preference denotes whether the user has visited the activity at least once. The preference values are obtained by binarizing the r_{u,a} values: if r_{u,a} > 0 then p_{u,a} = 1, and if r_{u,a} = 0 then p_{u,a} = 0. Sometimes a user might not participate further in an activity for various reasons; for example, the user might like/dislike the specific instructor who provides training for a particular activity. Hence, I decided to retain users even if they attended a specific activity only once. This is handled by the confidence term in the model [32]. The confidence value c can be calculated as:

c_{u,a} = 1 + α · r_{u,a}   (4.5)

where α is a linear scaling factor. The α value is set such that the observed and unobserved activities have roughly equal weight, since all the zeros in the matrix (unobserved instances) get a default confidence of 1.

The more time a user spends attending a group activity, the larger the confidence becomes, given the α value. Hence, some confidence is also assigned to users with r_{u,a} = 1.

The regularization term of the cost function is given as:

λ ( Σ_{u=1}^{M} ||x_u||² + Σ_{a=1}^{N} ||y_a||² )   (4.6)

where λ is an L2 regularizer used to prevent overfitting.

The cost function becomes quadratic when either the user or the activity latent factors are fixed, so its global minimum can be readily computed. This results in an alternating least squares optimization process, where the user factors and activity factors are recomputed alternately.

The user factor x_u that minimizes the cost function is given as,

x_u = (Y^T C^u Y + λI)^{-1} Y^T C^u p(u)   (4.7)

The user factor computation is followed by the recomputation of all item factors in a parallel way. The item factor is given as,

y_a = (X^T C^a X + λI)^{-1} X^T C^a p(a)   (4.8)

After computing these hidden factors, we recommend to user u the top available activities with the largest value of the predicted preference p̂_{u,a} = x_u^T y_a of user u for activity a.
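The alternating updates of equations 4.7 and 4.8 can be sketched directly with dense NumPy arrays. This is an illustrative toy implementation under the simplifying assumption of small dense data (in practice the thesis uses a sparse library implementation); all default values are examples, not the tuned hyperparameters.

```python
import numpy as np

def implicit_als(R, factors=10, regularization=0.05, alpha=40.0,
                 iterations=20, seed=0):
    """Minimal dense sketch of implicit-feedback ALS (equations 4.4-4.8).

    R: user x activity visit counts. Returns latent factor matrices X, Y
    such that X @ Y.T approximates the binary preference matrix P.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.01, size=(n_users, factors))
    Y = rng.normal(scale=0.01, size=(n_items, factors))
    P = (R > 0).astype(float)        # binary preference p_ua
    C = 1.0 + alpha * R              # confidence c_ua (equation 4.5)
    I = np.eye(factors)
    for _ in range(iterations):
        # Fix Y, solve each x_u (equation 4.7)
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + regularization * I,
                                   Y.T @ Cu @ P[u])
        # Fix X, solve each y_a (equation 4.8)
        for a in range(n_items):
            Ca = np.diag(C[:, a])
            Y[a] = np.linalg.solve(X.T @ Ca @ X + regularization * I,
                                   X.T @ Ca @ P[:, a])
    return X, Y
```

The per-user and per-activity solves are independent, which is why the recomputation can be done in parallel as noted above.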


4.1.4 Bayesian Personalized Ranking

Bayesian personalized ranking (BPR) is a pairwise ranking optimization algorithm that uses pairs of activities to provide users with more personalized activity recommendations. "Rendle et al. [21] presented a generic optimization criterion BPR-Opt for personalized ranking that is the maximum posterior estimator derived from a Bayesian analysis of the problem". This method learns personalized rankings, i.e. one individual ranking per user, and provides each user with a personalized ranked list of items that the user might want to buy.

Let U and A be the sets of all users and activities available in the whole data S. The task of the recommender system is now to provide the user with a personalized total ranking >_u ⊂ A² of all activities, where >_u denotes the latent preference structure for user u, deciding which activity is preferred over another. The training data consists of activity pairs, and the model is optimized for accurately ranking activities. The scoring is based on pairs instead of single items (as in ALS).

If an activity a has been viewed by user u (i.e. (u, a) ∈ S), then it is assumed that the user prefers this activity over all other non-observed activities. For two activities that have both been visited by a user, the preference cannot be inferred; the same holds for two activities that a user has not visited yet. To formalize this, the training data D_S : U × A × A is created by:

D_S = { (u, a1, a2) | a1 ∈ A_u^+ ∧ a2 ∈ A \ A_u^+ }   (4.9)

In 4.9, A_u^+ := { a ∈ A : (u, a) ∈ S }, and (u, a1, a2) ∈ D_S denotes that user u is assumed to prefer activity a1 over a2.

As stated earlier, this method consists of the general optimization criterion for personalized ranking, BPR-Opt. The Bayesian formulation of finding the correct personalized ranking for all activities a ∈ A is to maximize the posterior probability 4.10, where Θ represents the parameter vector of the model that determines the personalized ranking. BPR-Opt is derived by a Bayesian analysis of the problem using the likelihood function p(a1 >_u a2 | Θ) and the prior probability (a normal distribution with zero mean and variance-covariance matrix Σ_Θ) for the model parameters p(Θ).

p(Θ | >_u) ∝ p(>_u | Θ) p(Θ)   (4.10)

where >_u is the desired but latent preference structure for user u,
p(>_u | Θ) - the likelihood that captures the individual probability that a user really prefers activity a1 over activity a2,
p(Θ) ~ N(0, Σ_Θ) (prior probability)

Let r̂_{u a1 a2} be the estimator that captures the relationship between user u, activity a1 and activity a2, which can be further decomposed into:

r̂_{u a1 a2} = r̂_{u a1} − r̂_{u a2}   (4.11)

Equation 4.11 expresses the difference between the predicted interactions with the positive (preferred) activity a1 and the negative (not preferred) activity a2 (classifying the difference of two predictions). The maximum posterior estimator is now formulated as follows to derive the generic optimization criterion for personalized ranking, BPR-Opt [21].

BPR-Opt := ln p(Θ | >_u)
         = ln p(>_u | Θ) p(Θ)
         = ln Π_{(u,a1,a2) ∈ D_S} σ(r̂_{u a1 a2}) p(Θ)
         = Σ_{(u,a1,a2) ∈ D_S} ln σ(r̂_{u a1 a2}) + ln p(Θ)
         = Σ_{(u,a1,a2) ∈ D_S} ln σ(r̂_{u a1} − r̂_{u a2}) − λ_Θ ||Θ||²
         = Σ_{(u,a1,a2) ∈ D_S} ln σ(x_u y_{a1}^T − x_u y_{a2}^T) − (λ_Θ/2) ||x_u||² − (λ_Θ/2) ||y_{a1}||² − (λ_Θ/2) ||y_{a2}||²   (4.12)

The σ used here is the logistic sigmoid:

σ(x) = 1 / (1 + e^{−x})   (4.13)

and λ_Θ is the model-specific regularization parameter.

The logarithm is used for numerical stability. For huge datasets, there is a possibility of getting very small probabilities that are difficult for the system to represent; by taking the log, the product becomes a sum of log-probabilities of the individual points, which is efficient to store in computer memory.

The optimization criterion for personalized ranking is differentiable, so gradient-descent-based algorithms can be utilized for maximization. Hence, for maximization, LearnBPR, a stochastic gradient descent algorithm based on bootstrap sampling of the training triples (equation 4.9), is proposed by Rendle [21].

The gradients, which are the partial derivatives of BPR-Opt (equation 4.12) with respect to the model parameters, can be written as,

∂/∂x_u = 1 / (1 + e^{x_u y_{a1}^T − x_u y_{a2}^T}) · (y_{a2} − y_{a1}) + λ x_u   (4.14)

∂/∂y_{a1} = 1 / (1 + e^{x_u y_{a1}^T − x_u y_{a2}^T}) · (−x_u) + λ y_{a1}   (4.15)

∂/∂y_{a2} = 1 / (1 + e^{x_u y_{a1}^T − x_u y_{a2}^T}) · x_u + λ y_{a2}   (4.16)

For each parameter, an update (equation 4.17) is performed.

Θ ← Θ + α ( e^{−(x_u y_{a1}^T − x_u y_{a2}^T)} / (1 + e^{−(x_u y_{a1}^T − x_u y_{a2}^T)}) · ∂(x_u y_{a1}^T − x_u y_{a2}^T)/∂Θ + λ_Θ Θ )   (4.17)

A stochastic gradient descent algorithm is used such that it chooses the triples randomly (uniformly distributed). With this approach, the chance of picking the same user-activity combination in consecutive update steps is small. A bootstrap sampling method with replacement is used because stopping can be done at any stage.
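One LearnBPR-style epoch can be sketched with dense NumPy arrays. This is a toy illustration (not the library implementation used in the thesis): the sampling scheme follows the bootstrap-with-replacement idea, the updates follow the gradient expressions above, and all sizes and rates are invented.

```python
import numpy as np

def learn_bpr_epoch(X, Y, S, lr=0.01, reg=0.05, seed=0):
    """One epoch of LearnBPR-style SGD updates (sketch).

    X: user factors, Y: activity factors, S: binary user-activity matrix.
    Triples (u, a1, a2) are bootstrap-sampled with replacement.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = S.shape
    positives = [np.flatnonzero(S[u]) for u in range(n_users)]
    for _ in range(int(S.sum())):
        u = rng.integers(n_users)
        pos = positives[u]
        if len(pos) == 0 or len(pos) == n_items:
            continue                      # no preference can be inferred
        a1 = rng.choice(pos)              # observed (preferred) activity
        a2 = rng.integers(n_items)        # sample a non-observed activity
        while S[u, a2]:
            a2 = rng.integers(n_items)
        r_hat = X[u] @ (Y[a1] - Y[a2])    # estimator r̂_{u a1 a2}
        g = 1.0 / (1.0 + np.exp(r_hat))   # = 1 - sigmoid(r̂), gradient weight
        xu = X[u].copy()
        X[u]  += lr * (g * (Y[a1] - Y[a2]) - reg * X[u])
        Y[a1] += lr * (g * xu - reg * Y[a1])
        Y[a2] += lr * (-g * xu - reg * Y[a2])
    return X, Y
```

Running several such epochs pushes the score of each user's observed activities above the scores of unobserved ones, which is exactly the pairwise objective of equation 4.12.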

4.1.5 Logistic Matrix Factorization

Johnson [18] presented the probabilistic model logistic matrix factorization (LMF) to build a song recommendation system for music listening behavior at the streaming music company Spotify®. This method is a probabilistic framework for the implicit case, in which the probability of a user choosing an activity is modeled by a logistic function. Like ALS, it factorizes the original observation matrix into two lower-dimensional matrices, but with a probabilistic approach.

Consider the event that user u has chosen to visit activity a (u prefers a); the probability of this event occurring is distributed according to a logistic function, parametrized by the sum of the inner product of the user and activity latent factor vectors and the user and activity biases [18].

p(l_{u,a} | x_u, y_a, β_u, β_a) = exp(x_u y_a^T + β_u + β_a) / (1 + exp(x_u y_a^T + β_u + β_a))   (4.18)

The β_u and β_a terms represent user and activity biases, which account for variation in behavior across both users and activities. Some users will tend to visit a diverse assortment of activities, while others will only interact with a small subset. Similarly, some activities will be very popular and so will have a high expectation of being visited by a broad audience, while other activities will be less popular and only apply to a niche group. The bias terms are latent factors associated with each user u ∈ U and activity a ∈ A that are meant to offset these behavior and popularity biases.

This method also incorporates a 'confidence' term for the observation matrix R such that c = α r_{u,a}, where α is a tuning parameter given to the model. Hence, the confidence will be high for positive observations in the observation matrix. In this project, we use the log scaling function c = 1 + α log(1 + r_{u,a}) to remove the power-user bias that comes from the data, where a small set of users contributes the majority of the weight. With the assumption that the observations in R are independent, the likelihood is derived as [18]:

L(R | X, Y, β) = Π_{u,a} p(l_{u,a} | x_u, y_a, β_u, β_a)^{α r_{u,a}} (1 − p(l_{u,a} | x_u, y_a, β_u, β_a))   (4.19)

For regularization, priors placed on the user and activity latent factor vectors are set to zero-centered spherical Gaussian.


We arrive at the following equation by taking the log of the posterior and replacing constant terms with a scaling parameter λ.

log p(X, Y, β | R) = Σ_{u,a} [ α r_{u,a} (x_u y_a^T + β_u + β_a) − (1 + α r_{u,a}) log(1 + exp(x_u y_a^T + β_u + β_a)) ] − (λ/2) ||x_u||² − (λ/2) ||y_a||²   (4.21)

Assuming independence of all elements in the observation matrix R, the latent factors are optimized by maximizing the log posterior with an alternating gradient ascent procedure [11].

arg max_{X,Y,β} log p(X, Y, β | R)   (4.22)

In each iteration, we first fix the user vectors X and biases β and take a step towards the gradient of the activity vectors Y and biases β. Next, the activity vectors Y and biases β are fixed and a step towards the gradient of the user vectors X and biases β is taken. The number of iterations required for convergence could be decreased dramatically by choosing the gradient step sizes adaptively via AdaGrad (Adaptive Gradient Algorithm) [23].
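Equations 4.18 and 4.21 can be expressed compactly as follows. This is a minimal sketch with invented inputs; the alternating AdaGrad ascent itself is omitted.

```python
import numpy as np

def lmf_probability(x_u, y_a, beta_u, beta_a):
    """P(l_ua = 1 | x_u, y_a, beta_u, beta_a), as in equation 4.18."""
    z = x_u @ y_a + beta_u + beta_a
    return 1.0 / (1.0 + np.exp(-z))

def lmf_log_posterior(R, X, Y, beta_u, beta_a, alpha=1.0, lam=0.05):
    """Log posterior of equation 4.21 (up to constants), dense sketch."""
    # z_ua = x_u y_a^T + beta_u + beta_a, for all user-activity pairs at once
    Z = X @ Y.T + beta_u[:, None] + beta_a[None, :]
    ll = np.sum(alpha * R * Z - (1.0 + alpha * R) * np.log1p(np.exp(Z)))
    reg = 0.5 * lam * (np.sum(X ** 2) + np.sum(Y ** 2))
    return ll - reg
```

The alternating procedure described above would fix `X` (and its biases), take a gradient step on `Y`, and then do the reverse, each step increasing this log posterior.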

4.2 Evaluation

4.2.1 Quantitative analysis

In order to measure the closeness of the predictions made by the algorithm to users' real preferences, a numerical representation is needed [31]. The following metrics are used to evaluate our recommendation algorithms. These ranking-based metrics aim to capture the quality of a specific ranking, taking into account that the user has expressed a preference towards some activities, typically those in the test set predicted above a certain threshold.

4.2.1.1 Precision

Precision measures how well an information retrieval system retrieves the relevant activities requested by a user. It is defined as follows,

Precision = (Total number of retrieved activities that are relevant) / (Total number of activities that are retrieved)

Precision@k measures the results at a certain position k to find how many results are relevant up to that position. This metric measures the accuracy of only the top k ranked (highest-scoring) activities for a given query [2].

4.2.1.2 Mean average precision

"Mean average precision (MAP) is a popular metric for search engines. It takes each relevant item and calculates the precision of the recommendation set with the size that corresponds to the rank of the relevant item. Then the arithmetic mean of all these precisions is formed [25]".

The arithmetic mean of the average precisions of all users in a test set is calculated to get the final mean average precision:

MAP = ( Σ_{u=1}^{M} Average Precision_u ) / M

This metric informs us how correct a model's ranked predictions are, on average, over a whole test set.
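These two metrics can be sketched as plain Python functions; a toy illustration, assuming the recommendations are an ordered list and the relevant activities a set (both invented for the example):

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended activities that are relevant."""
    top_k = recommended[:k]
    return sum(1 for a in top_k if a in relevant) / k

def average_precision_at_k(recommended, relevant, k):
    """Mean of precision@i over the ranks i where a relevant item appears."""
    hits, score = 0, 0.0
    for i, a in enumerate(recommended[:k], start=1):
        if a in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(all_recommended, all_relevant, k):
    """MAP@k: arithmetic mean of the per-user average precision."""
    aps = [average_precision_at_k(r, rel, k)
           for r, rel in zip(all_recommended, all_relevant)]
    return sum(aps) / len(aps)
```

For example, with recommendations `["a", "b", "c", "d"]` and relevant set `{"a", "c"}`, precision@2 is 0.5 and the average precision at 4 is (1/1 + 2/3)/2 ≈ 0.83.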


4.2.1.3 Normalized discounted cumulative gain

Normalized discounted cumulative gain (NDCG) is a widely used metric for a ranked list. NDCG@k is defined as [22],

NDCG@k = (1 / IDCG) Σ_{i=1}^{k} (2^{r_i} − 1) / log₂(i + 1)   (4.23)

IDCG is the maximum possible discounted cumulative gain through position k. If the activity at position i is a hit activity, then r_i is 1, otherwise it is 0. IDCG is picked such that the NDCG value for the perfect ranking is 1.

4.2.1.4 Area under the curve (AUC)

AUC is a decision-support metric that cares only about whether gym users like a gym group activity or not. It is the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate (TPR) against the false positive rate (FPR) as the recommendation set size increases.

TPR = (number of gym activities that the user likes and that are in the recommendation list) / (number of gym activities that the user likes)

FPR = (number of gym activities that the user does not like and that are in the recommendation list) / (number of gym activities that the user does not like)

A larger area under the ROC curve indicates that we are recommending activities that end up being participated in near the top of the list of recommended activities. The AUC score is calculated for each user in the test set who had at least one masked item, and the mean AUC is then calculated. A higher value of the AUC indicates a better quality [21].

Averaged AUC = (1 / |U|) Σ_{u ∈ U} AUC(u)

The AUC@k measure is particularly suited for top-k recommendations, which are used in many e-commerce applications. This limited area under the curve measure combines classification and ranking accuracy to create a better measure for recommender systems [25]. "This measure focuses on ordering the true activities in the top-k ranks [19]".

This measure returns one if all relevant activities are retrieved within the top-k list. It will be zero if all irrelevant activities are retrieved first and fit within the top-k list. A top-k list that contains more relevant activities will have a higher score than a list with fewer relevant activities, except if the length of the list is close to the total number of activities. The order of relevant and irrelevant activities within the recommendation list has a strong influence on the overall score: if a relevant activity moves towards the top of the list, the measure increases [25].
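The averaged AUC above can be sketched with scikit-learn's `roc_auc_score`; the matrices below are invented toy data, with held-out (masked) interactions as the positive class:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mean_auc(test_matrix, score_matrix):
    """Average per-user AUC over users with at least one held-out activity.

    test_matrix: binary user x activity matrix of held-out interactions.
    score_matrix: predicted preference scores from a trained model.
    """
    aucs = []
    for u in range(test_matrix.shape[0]):
        y_true = test_matrix[u]
        if 0 < y_true.sum() < len(y_true):   # AUC needs both classes present
            aucs.append(roc_auc_score(y_true, score_matrix[u]))
    return float(np.mean(aucs))
```

Users whose test row is all zeros (or all ones) are skipped, matching the restriction to users with at least one masked item.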

4.2.2 Qualitative analysis

This method focuses on getting information by communicating with the users to find out what they think about the recommendation results. This gives an opportunity to understand the users better and to further improve the solutions based on their answers. The main goal is not just to discover the opinion of the customer but also to understand their motivation.

The personal interview method is used for the qualitative analysis, in which a gym member who also happens to be a BRP staff member participated and volunteered to be used anonymously for illustrative purposes. Also, two other employees of BRP participated and shared their feedback regarding the quality of the recommendation results. The feedback from the interviews can be found in the Appendix (see table A.7).

4.3 Experimental setup

To perform the analysis, the language used is Python. The implementations are done using the SciPy [38], scikit-learn [24], and implicit libraries.

No single method consistently outperforms the others, so it is important to test several methodologies that utilize different optimization algorithms to find the most appropriate one for the specific problem. Hence, all the above-described methods are tested and evaluated to find the method that performs best for this specific dataset.

The methods are evaluated using the holdout validation method [17] to assess the performance of the activity recommendations. This method, also known as true validation, considers a pseudo-randomly chosen subset of the initial sample and uses it as the testing set. The remaining observations are kept as the training data. For our application, it can be done in two ways,

1. Using data from a time series, where all data prior to a certain point in time is used as the training set, while data after that point is used for evaluation.

2. For each user, a certain percentage of the existing user-activity interactions is held out (i.e. the values are replaced by 0, as if the user had never interacted with those activities) and considered as the test set. The remaining interactions are used for training the model.

Sometimes users' preferences vary between seasons, and some users do not have gym subscriptions for all years, so all of the user-activity interactions are required to find a proper matrix factorization. Therefore, the second approach is used for the train/test data split. Hence, 20% of the user-activity interactions of all users are held out for evaluating the recommended activities, and each data point is assigned to its respective set randomly, with seed 0 used for reproducibility.
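The second splitting strategy can be sketched as follows. This is a toy dense sketch (the actual interaction matrices are sparse); held-out entries are zeroed in the training copy and moved to the test copy.

```python
import numpy as np

def train_test_split_interactions(R, test_fraction=0.2, seed=0):
    """Hold out a fraction of the observed user-activity interactions.

    Held-out entries are set to zero in the training matrix (as if the
    user never interacted with them) and kept in the test matrix.
    """
    rng = np.random.default_rng(seed)
    train = R.copy().astype(float)
    test = np.zeros_like(train)
    users, items = np.nonzero(R)                 # observed interactions
    n_test = int(round(test_fraction * len(users)))
    idx = rng.choice(len(users), size=n_test, replace=False)
    train[users[idx], items[idx]] = 0.0          # mask in training data
    test[users[idx], items[idx]] = R[users[idx], items[idx]]
    return train, test
```

By construction, the train and test matrices are disjoint and sum back to the original observation matrix.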


5 Results

In this chapter, the results and the analysis obtained for this project are presented.

5.1 Alternating least square method

Once the sparse data is prepared, the cost function of equation 4.4 is minimized to compute the user and activity factors via equations 4.7 and 4.8. The following hyperparameters are used to discover the best model parameters,

• factors - latent factors for the user and activity vectors
• iterations - number of iterations to use while training the model
• regularization - regularization constant to be used in the cost function


Figure 5.2: ROC curve for a user (ALS)

All the hyperparameters are optimized on a test set via grid search (see tables A.3 and A.4). The hyperparameter factors is given different values, and the best value is found where AUC@k and MAP@k are highest. The largest value tried is around 30, since there is no improvement in performance beyond 30 for this dataset. From figure 5.1, it can be seen that latent factor = 10 is better to use than any other value, since AUC@k and MAP@k are highest at 10.

Along with the hyperparameter latent factor, the values iterations = 20 and regularization = 0.05 are chosen as best to train the model. The results obtained from the model have been evaluated with the ranking metrics Precision@k, MAP@k, NDCG@k, and AUC@k on the test data. The AUC score is calculated for each user in the model. The ROC is plotted in figure 5.2, and we can infer that the recommendations for this specific user are relevant, since the area under the ROC curve is large, with AUROC = 0.99.


5.2 Bayesian personalized ranking method

The hyperparameters for the BPR method are the same as those chosen in the ALS method. From figure 5.3, a latent factors value of 15 seems best based on the AUC@k and MAP@k metrics. The values iterations = 20 and regularization = 0.05 are chosen to train the model. Additionally, the hyperparameter learning_rate, which indicates the learning rate α applied in the SGD (stochastic gradient descent) updates during training, is used in BPR and set to 0.01. The results obtained from the model have been evaluated with the ranking metrics.

Figure 5.4: ROC curve for a user (BPR)

From figure 5.4, we can infer that the recommendations for this specific user might not be fully relevant, since the area under the ROC curve is not large enough, with AUROC = 0.79. So, this specific user might not be satisfied with the recommendations provided to him/her.


5.3 Logistic matrix factorization method

The hyperparameters used for the LMF method are latent factors = 10, iterations = 20, regularization = 0.05, and learning_rate = 0.01, based on the evaluation metrics given in figure 5.5.

From figure 5.6, it is obvious that the recommendations will not be very relevant, since the AUROC = 0.44, which is less than 0.5. This method clearly works worse for this specific data, which is also evident from the evaluation metric scores in figure 5.8.

Figure 5.6: ROC curve for a user (LMF)

5.4 Latent Semantic Analysis (using SVD)

Figure 5.7: ROC curve for a user (LSA)

The hyperparameter k (the number of singular values and vectors to compute) for this method must satisfy 1 ≤ k < min(dimensions of R), where R is the sparse observation matrix on which the SVD is computed. The value chosen for k via grid search is 10.

In figure 5.7, the AUROC is 0.56, which is slightly higher than 0.5. Still, this is not a good score, and the recommendations for this user will not be very relevant. Table 5.1 shows the final hyperparameters that are chosen for all the above methods.

Table 5.1: Final hyperparameters chosen for all the methods

Method  Parameter            Value
ALS     factors              10
        iterations           20
        regularization       0.05
BPR     factors              15
        iterations           20
        regularization       0.05
        learning_rate        0.01
LMF     factors              10
        iterations           20
        regularization       0.05
        learning_rate        0.01
LSA     k (singular values)  10

5.5 Comparison of results

Figure 5.8 presents the Precision@10 and MAP@10 scores for all four methods. The Precision@10 score is per query: it denotes that 57% of the "top-10" selected activities are relevant for each query in the ALS method. The other methods have lower Precision@10 values, 39% for BPR, 25% for LSA, and 14% for LMF, indicating that these methods fail to capture the relevant activities within the top 10 scores. Hence, we can consider that the ALS method works better by pushing more relevant activities into the top-10 list. These activities could be preferred by users.


Table 5.2: Mean AUC score for all the methods

ALS BPR LSA LMF

0.9 0.82 0.61 0.53

Precision@10 does not consider the order of the recommended activities; mean average precision at position 10 (MAP@10) serves that purpose. The ALS and BPR methods have close values of 25% and 22%, indicating that both provide relevant activities for all the users in the data, while LSA performs better than LMF with a value of 17%.

Figure 5.9 presents a comparison of all the methods based on the evaluation metrics NDCG@k and AUC@k, evaluating the top 10 activities recommended to the users. NDCG@10 aims to put highly relevant activities before moderately relevant activities, which in turn should come before non-relevant activities. Since it involves a logarithmic reduction factor, relevance is discounted in proportion to its position within the top 10 results. The AUC score indicates the quality of the recommendations. Based on these metrics, the ALS method appears to be the best of the four, indicating the retrieval of highly relevant activities in the top-10 list, and the BPR method follows with the next best score. Again, the other two methods (LSA and LMF) fail to capture these relevant results in the top 10 activities. Table 5.2 presents the mean AUC score calculated for each user in the test set, confirming that the ALS and BPR methods work best by retrieving more relevant activities.

Figure 5.9: Comparison based on NDCG@10 and AUC@10
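The two order-aware metrics can be sketched as follows, assuming binary relevance; MAP@10 is then the mean of AP@10 over all users. Function names and the toy data are illustrative:

```python
import math

def average_precision_at_k(recommended, relevant, k=10):
    """AP@k: precision at each rank where a relevant item appears,
    averaged over min(k, number of relevant items)."""
    relevant = set(relevant)
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    denom = min(k, len(relevant))
    return score / denom if denom else 0.0

def ndcg_at_k(recommended, relevant, k=10):
    """NDCG@k with binary relevance: each hit at rank r contributes
    1/log2(r+1), normalized by the best possible ordering."""
    relevant = set(relevant)
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(recommended[:k], start=1)
              if item in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0

recs = ["a", "b", "c", "d"]
rel = {"a", "c"}
# 'a' at rank 1 and 'c' at rank 3 both count, but 'c' is discounted.
print(round(average_precision_at_k(recs, rel, k=4), 3))  # -> 0.833
print(round(ndcg_at_k(recs, rel, k=4), 3))               # -> 0.92
```

The logarithmic discount in NDCG is what makes a relevant activity at rank 1 worth more than the same activity at rank 10, matching the behaviour described above.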

5.6 Recommendation of activities

Once the model is trained, gym group activities are recommended to the users based on the values predicted by the model. Recommendations are provided for various use cases, as follows.


5.6.1 Similar activities recommendation to a specific activity

Tables 5.3 and 5.4 list, for each of the four methods, the top-10 activities that are most similar to the activity named 'GRIT® Strength'. GRIT® Strength [13] is mainly suitable for users who want to build fitness and strength; it uses bodyweight exercises to maximize muscle strength.

Table 5.3: Top recommendations for activities similar to the GRIT® Strength activity using ALS and BPR

ALS:
- GRIT® Cardio [14]
- Underground 45 in English
- Utepass - Gratis för alla!
- Underground 60 in English
- Underground 60 - Två salar
- Grundprogram i gymmet
- BODYPUMP® Intro 60 [4]
- Core Intensive 30
- TRX Bas 45

BPR:
- GRIT® Cardio
- Superlördag - BODYPUMP®
- Underground 60
- Underground Intro 60
- Underground 45
- Superlördag - GRIT Strength
- GRIT® Strength Team
- Utepass - Grit® cardio 30
- Core Intensive 45

Table 5.4: Top recommendations for activities similar to the GRIT® Strength activity using LMF and LSA

LMF:
- Badminton Bas 55
- SH'BAM® 45 [15]
- Yoga 60 in English
- Underground 60 Teampass
- Kundaliniyoga 60
- Watt-test
- Step Basic 55
- P.E. One 45
- Yoga 45

LSA:
- Utepass - Core dynamic 45
- Utepass - Soma® Move 45
- SOMA® MOVE 45
- P.E. One 45
- Kettlebells Bas 45
- Temapass - BODYPUMP®
- Underground Bodyweight 45
- Superlördag - GRIT® Strength
- Underground 45

The similarity between 'GRIT® Strength' and all other activities is calculated by taking the dot product between the matrix of all activity vectors and the transpose of the GRIT® Strength activity vector, which gives the similarity score:

score = A · A_a^T    (5.1)

The indices of the top 10 scores are obtained, and the activity names corresponding to these indices are retrieved and displayed as the recommendations to the users.
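Assuming the trained model exposes its activity-factor matrix as a NumPy array (the matrix, function, and index names below are illustrative), the dot-product retrieval of equation (5.1) can be sketched as:

```python
import numpy as np

def top_similar(item_factors, query_idx, n=10):
    """Score every activity against the query activity by the dot
    product of their latent vectors (score = A . A_a^T), then return
    the indices of the n best matches, excluding the query itself."""
    scores = item_factors @ item_factors[query_idx]
    order = np.argsort(scores)[::-1]          # highest score first
    return [int(i) for i in order if i != query_idx][:n]

rng = np.random.default_rng(1)
item_factors = rng.normal(size=(50, 10))      # 50 activities, 10 latent factors
similar = top_similar(item_factors, query_idx=3)
print(len(similar))  # -> 10
```

The returned indices are then mapped back to activity names for display, as described above.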

From the results given in table 5.3, the ALS and BPR methods produce very similar recommendations. This is because the activities retrieved by ALS, such as 'GRIT® Cardio', 'Underground 45 in English', 'Core Intensive 30', and 'TRX Bas 45', are strength-based, and the activities retrieved by BPR, such as 'GRIT® Cardio', 'Underground 60', 'Underground 45', 'Core Intensive 45', and 'GRIT® Strength', are also strength-based. Hence, these two methods captured strength-based and high-intensity activities for the search query 'GRIT® Strength'.

[4] https://www.lesmills.com/workouts/fitness-classes/bodypump/
[13] https://www.lesmills.com/workouts/high-intensity-interval-training/les-mills-grit-strength/
[14] https://www.lesmills.com/workouts/high-intensity-interval-training/les-mills-grit-cardio/
[15] https://www.lesmills.com/workouts/fitness-classes/shbam/
