DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2021

Recommend Songs With Data From Spotify Using Spectral Clustering

DANIEL BARREIRA

NAZAR MAKSYMCHUK NETTERSTRÖM


Abstract

Spotify, which is one of the world's biggest music services, published a data set and an open-ended challenge for music recommendation research. This study's goal is to recommend songs to playlists using Spectral clustering on the given data set from Spotify. While the given data set contains 1 000 000 playlists, Spectral clustering was performed on a subset of 16 000 playlists due to the lack of computational resources. With four different weighting methods describing the connection between playlists, the study shows reasonable clusters, in which similar categories of playlists were grouped together, although most of the results also contained one very large cluster in which many different sorts of playlists were mixed. The conclusion drawn from the results was that the data became overly connected as an effect of our weighting methods. While the results show the possibility of recommending songs to a limited number of playlists, hierarchical clustering could possibly help with recommending songs to a larger number of playlists, but that is left to future research to conclude.


Sammanfattning

Spotify, which is one of the world's biggest music services, published data and an open challenge for research in music recommendation. The goal of this study is to recommend songs to a playlist from the given Spotify data using cluster analysis. Although the published data set contained 1 000 000 playlists, the cluster analysis was performed on 16 000 playlists due to a lack of computational capacity.

With four different weightings on the graph of playlists, the study shows reasonable clusters in which playlists of similar categories were clustered together. However, in most cases the result also contained one very large cluster in which many different types of playlists were grouped. The conclusion was that the data used was overly connected as an effect of the chosen weightings. Even though the results show that it is possible to recommend songs to a limited number of playlists, hierarchical clustering could possibly help in recommending songs to a larger number of playlists.


Acknowledgement

We would like to express our sincere gratitude to our supervisors Emil Ringh and Parikshit Upadhyaya. Parik, without your help we would probably still be stuck on the difference between the unnormalized and normalized Laplacian. Emil, without you detecting some of our computational flaws and teaching us effective ways to find them, we would probably still be running simulations. Without the encouragement and continuous feedback from you two this project would have been a lot harder, thank you.


Authors

Daniel Barreira, barreira@kth.se

Nazar Maksymchuk Netterström, nazarmn@kth.se

Degree Programme in Technology

KTH Royal Institute of Technology

Place for Project

Stockholm, Sweden

Examiner

Gunnar Tibert

Vehicle Technology and Solid Mechanics, KTH Royal Institute of Technology

Supervisor

Emil Ringh and Parikshit Upadhyaya

Contents

1 Introduction
1.1 The data
1.2 Clustering
1.3 Limitations

2 Method
2.1 Graph Theory
2.2 Eigenvalues and Eigenvectors
2.3 k-means algorithm
2.4 Spectral Clustering
2.5 Different approaches to weighting
2.6 Recommending songs from clustered graph

3 Results
3.1 Results from weighting 1: Percentage of similar songs
3.2 Results from weighting 2: Percentage of similar artists
3.3 Results from weighting 3: A constructed function
3.4 Results from weighting 4: A constructed function
3.5 Results from random samples
3.6 Song recommendation to a playlist

4 Discussion
4.1 Conclusion
4.2 Future work


1 Introduction

We live in an age where there is an overflow of data. All this data presents us humans with a spectrum of different problems as well as opportunities. Music is an important cultural part of our society, and it is one of the fields that can take advantage of the opportunities that arise with the data overflow. Both from the perspective of a musician and of a listener there are initiatives to be taken. As a musician, you probably want your music to reach as many listeners as possible, and as a listener you want a diverse pool of music that matches your interests. All this leads up to the purpose of our project, a challenge presented by AIcrowd called the "Spotify Million Playlist Dataset Challenge" [3].

The challenge is to create an automatic playlist continuation by recommending 500 songs, ordered by decreasing relevance. In this paper the challenge is approached using Spectral clustering.

Spectral clustering is a method that performs clustering in the eigenspace of a graph Laplacian. Clustering is a method for analyzing data that is widely used in many fields, such as statistics and computer science. The aim of this project is to see whether one can find clusters of playlists and use these clusters not only to suggest music that directly correlates with the user but also to find songs through connections within the cluster.

Problem formulation

From a data set given by Spotify, our study will focus on analysing how effective spectral clustering is when recommending songs.

1.1 The data

The data set is sampled from over 4 billion public playlists on Spotify and consists of one million playlists. There are about 2 million unique tracks present in the data, by nearly 300 000 artists. The data set is collected from US Spotify users and covers the years 2010 to 2017.


Figure 1: An illustrative extraction of a part of playlist 661 from the data set.

Furthermore, the data has roughly 66 million tracks in total and 2.26 million unique tracks. The given data has a few different attributes. An example is shown in Figure 1 for a playlist with 58 tracks; it shows what is given for every playlist and for every track. The attributes mainly used in this paper are the ones marked with a red ring.

1.2 Clustering

Clustering is a way of understanding information by dividing data into different groups. The point is to define connections between data points through similarity, and thereby remove non-essential data, also known as noise. By doing this one is able to detect different patterns and thus analyze the given data. The applications of clustering are numerous and it is a widely used starting point for analyzing big sets of data with machine learning [8], [7].

There is a variety of clustering algorithms, for example ε-neighborhood, k-means and density-based clustering. In this paper the study revolves around Spectral clustering and how effective it is when clustering Spotify playlists.


The reason there are many different clustering methods is the variety of data sets. As an example, consider the two sets of data points in Figures 2 and 3.

Figure 2: data set one Figure 3: data set two

As seen in Figure 2 and Figure 3, the structure of the data points differs, which means that some clustering methods will perform better than others. By using density-based clustering or spectral clustering one can get "correct" results on both data sets shown in Figures 2 and 3, whereas k-means clustering is only straightforward to apply to the data shown in Figure 2. To understand why, a detailed understanding of the different algorithms is needed. Before explaining the algorithms some theory is needed.

1.3 Limitations

The given data set consists of a lot of data. Due to lack of time and computational resources, the entire data set could not be analyzed. Furthermore, the challenge itself is not attempted; the study only shows how spectral clustering could work for the challenge and aims to understand the general structure of the data.


2 Method

As mentioned before the main method in this paper is Spectral Clustering. But before diving into the algorithm itself some preliminary material is presented, starting with graph theory.

2.1 Graph Theory

A graph G is defined as a collection of i nodes N = {n1, . . . , ni} and k edges E = {e1, . . . , ek}. A node represents a data point and an edge represents the connection/relationship between two nodes [5]. We write

G = (N, E). (1)

Furthermore there is such a thing as a directed and an undirected graph.

Undirected and directed graph

A graph is undirected when an edge between two arbitrary nodes ni and nj is the same regardless of direction,

eij ∈ E ⇒ eji ∈ E. (2)

An undirected graph is also called a symmetric graph. For a directed graph the direction matters. The next step is to explain how nodes are connected to each other via edges.

Connectivity of a graph

A graph is called connected when it is possible to walk from any node to every other node. Otherwise the graph is not connected and consists of several subgraphs. This is captured by the multiplicity M, where M ≥ 1 [4]:

M(G) := multiplicity = number of subgraphs. (3)

The connectivity of a graph can be represented by different sorts of matrices.


Adjacency matrix and weights

The graph can be represented with a matrix, called the unweighted adjacency matrix Auw, defined as follows:

$$A_{uw}(k, j) = \begin{cases} 1 & \text{if there is an edge between nodes } k \text{ and } j \\ 0 & \text{if there is no edge between nodes } k \text{ and } j \end{cases} \quad (4)$$

Figure 4: Example of an unweighted graph G and the associated adjacency matrix.

A graph can either be unweighted (as seen in Figure 4) or weighted, meaning that the edge values vary. The weighted adjacency matrix Aw is then defined as:

$$A_{w}(k, j) = \begin{cases} w_{kj} & \text{weight of edge } (k, j) \\ 0 & \text{if there is no edge between } (k, j) \end{cases} \quad (5)$$

A weighted graph can now be written as

G = (N, E, W) (6)

where W is a set of weights {w1, . . . , wk}. From here one last definition needs to be made from graph theory.


Degree matrix

A degree matrix D is a diagonal n × n matrix, where n is the size of the adjacency matrix, and its entries are the row sums of the adjacency matrix A:

$$d_i = \sum_{j=1}^{n} w_{i,j}, \qquad D = \begin{pmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{pmatrix} \quad (7)$$

where i, j = {1, . . . , n}.
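To make these definitions concrete, the short MATLAB sketch below builds the weighted adjacency matrix and the degree matrix for a small made-up undirected graph; the node count and edge weights are invented for illustration and have nothing to do with the Spotify data.

% Toy undirected weighted graph with 4 nodes.
% Each row of "edges" is (k, j, w_kj); edges are mirrored so that A is symmetric.
edges = [1 2 0.5;
         1 3 0.2;
         2 3 0.7;
         3 4 0.4];
n = 4;
A = sparse([edges(:,1); edges(:,2)], ...
           [edges(:,2); edges(:,1)], ...
           [edges(:,3); edges(:,3)], n, n);   % weighted adjacency matrix, eq. (5)

d = full(sum(A, 2));            % degrees d_i, the row sums in eq. (7)
D = spdiags(d, 0, n, n);        % diagonal degree matrix D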

2.2 Eigenvalues and Eigenvectors

Definition: For a square matrix B, λ is an eigenvalue and ν is the corresponding eigenvector if

Bν = λν (8)

Spectrum of eigenvalues

For a square matrix B, the spectrum is the set of its eigenvalues. If a symmetric n × n matrix has n non-negative eigenvalues, the matrix is called symmetric positive semi-definite [1].

Eigs in MATLAB

In this paper the MATLAB-tool eigs() is used to calculate the eigenvalues of our matrices. To use eigs, the matrix should be square and sparse. Eigs makes the calculations significantly faster than eig() for sparse matrices.
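As a minimal sketch of how this might look (assuming a MATLAB version where the 'smallestabs' option is available), the call below extracts the eigenpairs used later for clustering; L and k are placeholders for the sparse Laplacian and the number of clusters.

% L: sparse symmetric graph Laplacian, k: number of clusters sought.
% Since L is positive semi-definite, the eigenvalues of smallest
% magnitude are also the smallest ones.
[U, Lambda] = eigs(L, k, 'smallestabs');
lambda = diag(Lambda);     % the k smallest eigenvalues
% the columns of U are the corresponding eigenvectors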

2.3 k-means algorithm

k-means clustering is an iterative method, commonly used to cluster a set of data points. Given k center points (called centroids) in a space, the method performs clustering and assigns all data points to a cluster defined by the nearest centroid.

There are challenges in using this method on some occasions, but it is still a fundamental part of Spectral clustering. In Figure 5 and Figure 6 one can see the clusters it detects for two different sets of data.


Figure 5: k-means on graph 1. Figure 6: k-means on graph 2.

After initializing k centroids, the distance from each data point to each centroid is calculated, and each data point is assigned to the cluster of the centroid it is closest to. Thereafter the mean value of the positions of all data points in each cluster is calculated, which gives the centroids new positions for the next iteration. The same process is repeated until the centroids are stationary, meaning that they reach an equilibrium. The algorithm is presented below in Table 1.

Algorithm: k-means algorithm

1. Randomly initialize k centroids c1, . . . , ck.

2. Calculate the distance from each data point to every centroid.

3. Assign each data point to its nearest centroid.

4. Calculate the mean value of the grouped data points in each cluster and make that the new positions for the centroid.

5. Repeat steps 2 to 4 until the change in position of each centroid is below a given tolerance.

Table 1: k-means algorithm
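The iteration in Table 1 can be sketched in MATLAB roughly as below; the function name simple_kmeans and the input conventions (data points as rows of X) are our own, and the sketch assumes a MATLAB version with implicit expansion.

function [idx, C] = simple_kmeans(X, k, tol, maxIter)
% X: n-by-d matrix of data points, k: number of clusters.
n = size(X, 1);
C = X(randperm(n, k), :);                         % step 1: random initial centroids
for it = 1:maxIter
    % steps 2-3: squared distance to every centroid, assign to the nearest one
    D2 = sum(X.^2, 2) + sum(C.^2, 2)' - 2 * (X * C');
    [~, idx] = min(D2, [], 2);
    Cold = C;
    % step 4: move each centroid to the mean of its assigned points
    for j = 1:k
        if any(idx == j)
            C(j, :) = mean(X(idx == j, :), 1);
        end
    end
    % step 5: stop when no centroid moves more than the tolerance
    if max(sqrt(sum((C - Cold).^2, 2))) < tol
        break;
    end
end
end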


2.4 Spectral Clustering

Spectral clustering uses the spectrum of eigenvalues from the graph laplacian matrix to cluster the graph in question [5]. There are unique advantages when using the spectrum of the eigenvalues to cluster a graph. These advantages are going to be presented later.

The method involves calculating the eigenvalues and eigenvectors of a so-called Laplacian matrix. Given a graph G = (N, E) with nodes {n1, . . . , nm} ∈ N, the unweighted Laplacian matrix L of the graph G is an m × m matrix defined as

L = D − A (9)

where D and A are the degree and adjacency matrices defined in (7) and (5), respectively. An important property of L is that the matrix is symmetric positive semi-definite. The spectrum of the Laplacian matrix is calculated to reveal underlying structures in the graph data. The number of connected components is equal to the multiplicity of the eigenvalue 0: if k different eigenvectors have eigenvalue λ1,...,k = 0, the graph has k connected components. This gives information about the number of clusters. On the other hand, if the graph G is connected, λ2 gives information about the connectivity of the graph: the greater the value of λ2, the stronger the connectivity.

Examples illustrating these properties are shown in Figures 7, 8 and 9.

Figure 7: λ1,2,3 = 0. Figure 8: λ2(G1) > 0. Figure 9: λ2(G2) ≫ 0.

The normalized Laplacians

To calibrate spectral clustering to different graphs there are strategies that involve normalizing the Laplacian matrix. One way of normalizing the Laplacian matrix is as follows:

$$L_{norm} = D^{-1/2} L D^{-1/2} \quad (10)$$

where the degree matrix D is defined in (7). The properties of the graph Laplacians have an impact on how the graph is partitioned, in other words, how the graph is cut [5].

Partitioning a graph

Consider a graph G = (N, E, W) with i nodes, and consider k partitions {A1, A2, . . . , Ak}. Furthermore, use the notation A^c for the complement of a partition A. The problem in question is to find the partitions of the graph by minimizing the cut, defined as

$$\mathrm{Cut}(A_1, \ldots, A_k) = \frac{1}{2} \sum_{i=1}^{k} W(A_i, A_i^c) \quad (11)$$

where

$$W(A, B) = \sum_{i \in A,\, j \in B} w_{i,j}.$$

This takes into account the weight of the edges that are being cut: the aim is to cut so that the total weight of the edges connecting different clusters is as small as possible. However, this method takes into account neither the number of nodes nor the volume of the different clusters. The risk is then that the clusters vary strongly in size, and also that the graph is partitioned in a "wrong way". Figure 10 shows an example of how the partitioning can be made.

Figure 10: The figure shows how the cut can be done on a graph with multiplicity 1. Cut 1 shows how the cut may turn out if the number of nodes is not taken into account, while cut 2 shows how we probably want to cut the graph.


As shown, this method does not necessarily give the partitioning that is sought after. Other quantities are therefore needed. To make the sizes of the clusters more similar, there are two ways one can consider measuring the size of a cluster: either by taking the number of nodes in the cluster, |Ai|, into account, or by taking the volume of the cluster, vol(Ai), into account. This leads to minimizing either the so-called RatioCut or NCut [5]:

$$\mathrm{RatioCut}(A_1, \ldots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, A_i^c)}{|A_i|} \quad (12)$$

where |Ai| := the number of nodes in the partition Ai, and

$$\mathrm{NCut}(A_1, \ldots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, A_i^c)}{\mathrm{vol}(A_i)} \quad (13)$$

where

$$\mathrm{vol}(A) := \sum_{i \in A} d_i.$$

To summarize, minimizing RatioCut encourages the clusters to contain a similar number of nodes, while minimizing NCut encourages the clusters to have similar volume. Solving these minimization problems exactly is NP-hard, but with the graph Laplacian it is possible to approximate a solution. Using the unnormalized Laplacian L gives an approximation of the RatioCut minimization problem, and the normalized Laplacian Lnorm approximates the NCut minimization problem. The theorems and proofs connecting the eigenvectors of the Laplacian with the graph cut functions are given by von Luxburg [5].


Spectral Clustering Algorithms

In Tables 2 and 3 two different algorithms for Spectral clustering are presented.

Algorithm with approximation of RatioCut

1. Create a graph G = (N, E).

2. Compute the unnormalized Laplacian L = D − A.

3. Find the eigenvectors ν1, . . . , νk of the matrix L belonging to the k smallest eigenvalues and create a matrix U = [ν1, . . . , νk].

4. Treat each row in U as a data point x1, . . . , xn and perform k-means clustering on the points into k partitions A1, . . . , Ak.

Table 2: Spectral clustering with the unnormalized Laplacian

Algorithm with approximation of NCut

1. Create a graph G = (N, E).

2. Compute the normalized Laplacian Lnorm = D^{-1/2} L D^{-1/2}.

3. Find the eigenvectors ν1, . . . , νk of the matrix Lnorm belonging to the k smallest eigenvalues and create a matrix U = [ν1, . . . , νk].

4. Create T = (t_{i,j}) and set t_{i,j} = d_i^{-1/2} ν_{i,j}; in other words, let T contain the normalized rows of U.

5. Treat each row in T as a data point x1, . . . , xn and perform k-means clustering on the points into k clusters A1, . . . , Ak.

Table 3: Spectral clustering with the normalized Laplacian
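Both algorithms translate almost line by line into MATLAB. The sketch below follows the normalized variant in Table 3; it assumes a sparse symmetric weighted adjacency matrix A with no isolated nodes, a recent MATLAB release (for the 'smallestabs' option), and the kmeans function from the Statistics and Machine Learning Toolbox.

function idx = spectral_cluster(A, k)
% A: sparse symmetric weighted adjacency matrix, k: number of clusters.
n      = size(A, 1);
d      = full(sum(A, 2));                      % degrees
Dihalf = spdiags(1 ./ sqrt(d), 0, n, n);       % D^(-1/2)
L      = spdiags(d, 0, n, n) - A;              % unnormalized Laplacian, eq. (9)
Lnorm  = Dihalf * L * Dihalf;                  % normalized Laplacian, eq. (10)

% Steps 3-4: eigenvectors of the k smallest eigenvalues, rescaled rows.
[U, ~] = eigs(Lnorm, k, 'smallestabs');
T = Dihalf * U;                                % t_ij = d_i^(-1/2) * u_ij

% Step 5: k-means on the rows of T (kmeans seeds with k-means++ by default).
idx = kmeans(T, k, 'Replicates', 10);
end

Replacing Lnorm by L in the eigs call and skipping the rescaling of the rows gives the unnormalized variant in Table 2.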

Spectral clustering and k-means (test results for a small data set)

Figure 11 shows a faulty partitioning using k-means clustering. This illustrates a known problem: k-means clustering sometimes gives undesired results because, depending on how the centroids are initialized, they can reach an equilibrium in an unwanted configuration. An easy and reasonable way to get around the problem is to run the k-means clustering a few more times and use the most common outcome (the undesired result is less common than the desired one).


Figure 11: An example of an undesirable convergence of k-means

Figure 12 presents an example of using k-means clustering, as in Figure 6, but instead of working on the graph G directly, the k-means step is performed in the eigenspace of the Laplacian L. The algorithm has partitioned the data into 3 clusters in the desired manner. The set of points shown in Figure 3 is distributed in a far more complicated way than the set of points in Figure 2. Partitioning this data with k-means alone is not easy; one would have to involve transformations [7] and have a deeper understanding of the structure of the data to accomplish the "correct" clustering. With Spectral clustering, the partitioning is made significantly more effective.

Figure 12: Illustration of how Spectral clustering has "correctly" clustered the data in Figure 3.

When clustering the data with three circles of different sizes using our spectral clustering script, the results were correct approximately 60% of the time. In the remaining runs, k-means (which is the final step in spectral clustering) found an equilibrium at the wrong spots in the eigenspace.

An important factor that affects the precision of k-means is the initialization of the centroids. If a centroid is initialized far away, relative to the data points, it has a chance of ending up without any data points associated with it. In another case, more than one centroid may be initialized in the same cluster, which would also cause problems. By initializing the centroids in a more strategic way, instead of purely at random, the precision increases drastically. One method of initializing starting points in a strategic manner is called k-means++.

k-means++

This algorithm proposes a specific iterative method for assigning starting positions to the centroids. By using this method the centroids are more evenly spread out over the data [2]. Let P be the set of data points and define D(x) as the distance from a data point x to its closest centroid. The algorithm is found in Table 4:

Algorithm k-means++

1. Randomly initialize a centroid c1 from P.

2. For each data point x, compute the distance D(x) to its nearest centroid.

3. Choose the next centroid at random from P, where a point x is chosen with probability P++(x).

4. Repeat steps 2 and 3 until k centroids have been initialized.

Table 4: Algorithm for k-means++

where

$$P_{++}(x) = \frac{D(x)^2}{\sum_{x' \in P} D(x')^2}. \quad (14)$$
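A sketch of this seeding step is shown below (MATLAB's own kmeans already uses k-means++ seeding by default, so in practice one rarely writes it by hand); the function name kmeanspp_init and the representation of P as the rows of X are our own choices.

function C = kmeanspp_init(X, k)
% X: n-by-d matrix of data points, k: number of centroids to seed.
n = size(X, 1);
C = zeros(k, size(X, 2));
C(1, :) = X(randi(n), :);                   % step 1: first centroid chosen at random
D2 = inf(n, 1);                             % squared distance to the nearest centroid so far
for j = 2:k
    % step 2: update D(x) with the distance to the newest centroid
    D2 = min(D2, sum((X - C(j-1, :)).^2, 2));
    % step 3: sample the next centroid with probability D(x)^2 / sum D(x')^2, eq. (14)
    p   = D2 / sum(D2);
    cdf = cumsum(p);
    C(j, :) = X(find(rand <= cdf, 1), :);
end
end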


Table 5: Results with k-means. Each row of the original table is one simulation, giving the sizes of the three clusters found; the outcomes include {100, 100, 100} in the majority of runs, and incorrect partitionings such as {39, 61, 200}, {32, 68, 200} and {0, 100, 200} in the remaining runs.

Table 6: Results with k-means++. Every simulation returned the correct cluster sizes {100, 100, 100}.

Tables 5 and 6 present the outcomes of Spectral clustering on predefined data points. The data is modeled as 100 data points in each circular cluster, as in Figure 3. Each row of a table represents one simulation and shows the number of points assigned to each cluster found. Table 5 shows results from spectral clustering using plain k-means and Table 6 shows results from spectral clustering using k-means++. In Table 6 the advantage of k-means++ becomes obvious.


2.5 Different approaches to weighting

The weighting of the graph is a crucial part in this study. The weighting will determine how the playlists are clustered and that will directly lead to a resulting list of recommended songs.

Weighting 1: Percentage of similar songs between two playlists

This weighting method goes as follows: the number of songs that belong to both playlists is found and divided by the length of the longer playlist. The similarity is thus taken from the perspective of the larger list, i.e. if list B has 100 songs and list A has 50 songs, the denominator is 100 and not 50. With this method the values lie between 0 and 1. The formula is seen below, where A and B are arbitrary playlists represented by their track URIs, Length(Max(A, B)) is the length of the longer list, and similarity(A, B) is the number of common URIs.

$$w_{AB} = \frac{\mathrm{similarity}(A, B)}{\mathrm{Length}(\mathrm{Max}(A, B))} \quad (15)$$

Furthermore, the weighting also included some constraints, split into two segments. The first constraint was that values under a certain threshold were not added to the adjacency matrix; for example, the filter could require the lists to have more than 20% in common for an edge to be added. The second constraint was applied in a second stage. First, every row and column containing only zeros, meaning playlists without any connections, was removed. Second, rows and columns with fewer or more than a specified number of edges were removed; for example, if a row had only one connection to another playlist it was removed. This was done in the hope of reducing the number of small clusters and outliers.
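For illustration, the first weighting could be computed roughly as in the sketch below; the variable playlists (a cell array where each entry is a cell array of track URI strings) and the 20% threshold are assumptions made for the example, not the exact code used in the study.

% playlists{p} is assumed to be a cell array of track URI strings.
nP   = numel(playlists);
thr  = 0.20;                                  % example threshold from the text
rows = []; cols = []; vals = [];
for a = 1:nP-1
    for b = a+1:nP
        common = numel(intersect(playlists{a}, playlists{b}));      % similarity(A,B)
        w = common / max(numel(playlists{a}), numel(playlists{b})); % eq. (15)
        if w > thr                            % constraint 1: keep only sufficiently similar pairs
            rows = [rows; a; b];  cols = [cols; b; a];  vals = [vals; w; w];
        end
    end
end
A = sparse(rows, cols, vals, nP, nP);         % weighted adjacency matrix
keep = full(sum(A ~= 0, 2)) >= 2;             % constraint 2: e.g. drop playlists with fewer than 2 edges
A = A(keep, keep);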

Weighting 2: Percentage of similar artists between two playlists

In the same manner as weighting function 1, this weighting is based on the data given in the data set. The percentage similarity was determined using (15), but with A and B containing artist URIs instead of track URIs. This function was also subject to constraints in the same way as weight function 1.


Weighting 3 and 4: A constructed function

The third and fourth weight functions are given by (16), plotted in Figure 13:

$$w_{3,4} = 3w - 6w^2 + 4w^3 \quad (16)$$

Figure 13: A plot of the function showing how a low percentage similarity gets increased to a higher similarity and how a high percentage similarity gets decreased to a lower.

This weight function was constructed with the goal of creating more connections with values closer to each other. As seen in Figure 13, the function amplifies low values and dampens high values. It is applied to both artist and track similarities, hence it constitutes both the third and the fourth weight function.
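Applied to an adjacency matrix that already holds the percentage similarities from weighting 1 or 2, the transformation in (16) is a couple of lines; the sketch below assumes such a matrix A is available.

% Apply w -> 3w - 6w^2 + 4w^3 to every existing edge weight,
% leaving the zero entries (no edge) untouched.
[i, j, w] = find(A);                          % nonzero entries of the adjacency matrix
w34 = 3*w - 6*w.^2 + 4*w.^3;                  % eq. (16)
A34 = sparse(i, j, w34, size(A, 1), size(A, 2));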


2.6 Recommending songs from clustered graph

Eigenvalues and eigenvectors were computed with the function eigs in MATLAB. As mentioned before, we take every eigenvector corresponding to eigenvalues close to 0, and cluster in that eigenspace. In the reduced space, each row still corresponds to one playlist in the same order, so when the clustering algorithm is done one can find the index of a given row, go back to the original JSON data, and retrieve essential information such as the song name corresponding to a track URI, the artist name corresponding to an artist URI, and the name of the playlist.

When it comes to recommending a song from a cluster, a random number is generated up to the size of the cluster. This number gives the row of the cluster, and thus a playlist, to look at. If it is the same playlist as the one receiving recommendations, a new number is drawn. If it is another playlist, a second number is randomly generated up to the number of songs in that playlist to suggest a song. If the song already exists in the playlist that is receiving recommendations, a new number is drawn. This procedure is repeated as many times as there are songs left in the cluster, or until the set number of recommended songs is reached.
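A sketch of this recommendation loop is given below; the data layout (a cluster as a vector of playlist indices, playlists as cell arrays of track URIs) and the cap on the number of attempts are our own assumptions rather than the exact implementation used in the study.

function recs = recommend_songs(cluster, playlists, target, nRecs)
% cluster:   vector of playlist indices belonging to one cluster
% playlists: cell array, playlists{p} is a cell array of track URIs
% target:    index of the playlist receiving recommendations
% nRecs:     number of songs to suggest
recs = {};
maxTries = 10 * nRecs;                         % crude cap so the loop always terminates
for t = 1:maxTries
    p = cluster(randi(numel(cluster)));        % pick a random playlist in the cluster
    if p == target
        continue;                              % never draw from the target playlist itself
    end
    song = playlists{p}{randi(numel(playlists{p}))};   % pick a random track from it
    if ~any(strcmp(song, playlists{target})) && ~any(strcmp(song, recs))
        recs{end+1} = song;                    % keep tracks that are new to the target
    end
    if numel(recs) >= nRecs
        break;                                 % stop once enough songs are suggested
    end
end
end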


3 Results

3.1 Results from weighting 1: Percentage of similar songs

Results from unnormalized Laplacian

TH = threshold, E.f. = edge filtering, Size(A) = size of the adjacency matrix, Nr. cl = number of clusters, max(cl) = maximal cluster size

TH  | E.f. | Size(A) | Nr. cl | max(cl) | Comments
20% |      | 4702    | 324    | 3578    | One big cluster, one with size 145, a few with size 10 to 12 and the rest less than 7
20% | <2   | 3236    | 60     | 2849    | One big cluster, one with size 136, a few bigger than 10 and the rest less than 5
20% | <5   | 1878    | 16     | 1292    | One big cluster, one with size 367, one 95, one 57, the rest less than 11
30% |      | 1399    | 259    | 193     | 20 clusters greater than 10, biggest is 193; 240 clusters < 10, of which 200 clusters < 3
30% | <2   | 761     | 62     | 153     | The biggest clusters are {153, 83, 70, 60, 49, 47, 40}, the rest are less than 5
30% | <5   | 315     | 19     | 93      | Cluster sizes {93, 72, 26, 24}, 7 clusters smaller than 5
40% |      | 465     | 116    | 67      | Biggest clusters {67, 45, 20, 17, 10, 10, 10}, rest of the clusters less than 3
40% | <2   | 210     | 27     | 59      | Cluster sizes {59, 36, 18, 14, 10}, 20 clusters less than 4
40% | <5   | 84      | 6      | 40      | Cluster sizes {40, 19, 15, 6, 2, 2}
50% |      | 192     | 58     | 25      | Biggest clusters {25, 9, 8, 6, 5, 5, 5}, 43 clusters are less than 4
50% | <2   | 82      | 17     | 20      | Cluster sizes {20, 8, 8, 6, 4}, 8 clusters less than 4
50% | <5   | 19      | 4      | 8       | 4 clusters with sizes {8, 7, 2, 2}

Table 7: Results from 16 000 Spotify playlists.


Results from normalized Laplacian

TH = threshold, E.f. = edge filtering, Size(A) = size of the adjacency matrix, Nr. cl = number of clusters, max(cl) = maximal cluster size

TH  | E.f. | Size(A) | Nr. cl | max(cl) | Comments
20% |      | 4702    | 324    | 3578    | Clusters and cluster sizes identical to the unnormalized Laplacian
20% | <2   | 3236    | 60     | 2849    | Same as above
20% | <5   | 1878    | 16     | 1292    | Same as above
30% |      | 1399    | 259    | 193     | Same as above
30% | <2   | 761     | 62     | 153     | Same as above
30% | <5   | 315     | 19     | 93      | Same as above
40% |      | 465     | 116    | 67      | Same as above
40% | <5   | 84      | 6      | 40      | Same as above
50% |      | 192     | 58     | 25      | Same as above
50% | <2   | 82      | 17     | 20      | Same as above
50% | <5   | 19      | 4      | 8       | Same as above

Table 8: Results from 16 000 Spotify playlists.


Closer look at a few selected clusters

The selected clusters are from a threshold of 30% and no edge filtering. They are presented by the playlist names they have in the data. There are 259 clusters in total, of which less than 8% are greater than 10 in size. The first cluster, presented in Table 9, has a size of 19 and is one of the larger clusters obtained using this particular weight function. The second one, in Table 10, is also quite large relative to the other clusters, with a size of 7.

Latin Trap’ latino ’ Cecilia’ Lily’ Nueva’ Latin Vibes’

Spanish’ spanish’ Regeaton’ Spanish’ Party’ reggaeton’

Fiesta latina’ Mi Gente’ musica favorita’ Spanish Mix’ Gucci’

Latin Vibes’ BEANS’

Table 9: Playlist names in a cluster

Yeezy’ yeezy ’ Kenye West ’ Kenye West’ Yeezy Taught Me Kanye Kanye

Table 10: Playlist names in a cluster


3.2 Results from weighting 2: Percentage of similar artists

Results from unnormalized Laplacian

TH = threshold, E.f. = edge filtering, Size(A) = size of the adjacency matrix, Nr. cl = number of clusters, max(cl) = maximal cluster size

TH  | E.f. | Size(A) | Nr. cl | max(cl) | Comments
20% |      | 14567   | 44     | 14463   | One very large cluster, second largest is of size 6
20% | <2   | 13780   | 5      | 13767   | Same as above
20% | <5   | 12340   | 1      | 12340   | One single cluster
30% |      | 12010   | 114    | 11738   | One very large cluster, second biggest is of size 7
30% | <2   | 10483   | 10     | 10449   | One gigantic cluster, the rest less than 5
30% | <5   | 8438    | 1      | 8438    | One single cluster
40% |      | 8410    | 210    | 7881    | One very large cluster, the rest are less than 5
40% | <2   | 6648    | 25     | 6565    | Same as above
40% | <5   | 4790    | 6      | 4765    | Same as above
50% |      | 4964    | 244    | 3877    | The biggest clusters have sizes {3877, 211, 100, 60}; the vast majority are less than 14
50% | <2   | 3591    | 41     | 3098    | Clusters have sizes {3098, 181, 107, 59}, the rest are less than 9
50% | <5   | 2309    | 15     | 1941    | Clusters have sizes {102, 98, 67, 44, 14}, the rest are less than 10

Table 11: Results from 16 000 Spotify playlists.


Results from normalized Laplacian

TH = threshold, E.f. = edge filtering, Size(A) = size of the adjacency matrix, Nr. cl = number of clusters, max(cl) = maximal cluster size

TH  | E.f. | Size(A) | Nr. cl | max(cl) | Comments
20% |      | 14567   | 44     | 14463   | Same partitioning as with the unnormalized Laplacian
20% | <2   | 13780   | 5      | 13767   | Same partitioning as with the unnormalized Laplacian
20% | <5   | 12340   | 1      | 12340   | Same partitioning as with the unnormalized Laplacian
30% |      | 12010   | 114    | 2293    | Biggest cluster is 8541, 3 clusters between 100-300, a few bigger than 20 and the rest less than 10
30% | <2   | 10483   | 10     | 10449   | Same partitioning as with the unnormalized Laplacian
30% | <5   | 8348    | 1      | 8348    | Same partitioning as with the unnormalized Laplacian
40% |      | 8410    | 210    | 7881    | Same partitioning as with the unnormalized Laplacian
40% | <2   | 6648    | 25     | 6565    | Same partitioning as with the unnormalized Laplacian
40% | <5   | 4790    | 6      | 4765    | Same partitioning as with the unnormalized Laplacian
50% |      | 4964    | 244    | 3877    | Same partitioning as with the unnormalized Laplacian
50% | <2   | 3591    | 41     | 3098    | Same partitioning as with the unnormalized Laplacian
50% | <5   | 2309    | 15     | 1941    | Same partitioning as with the unnormalized Laplacian

Table 12: Results from 16 000 Spotify playlists.


Closer look at a few selected clusters

The selected clusters are obtained using a threshold of 50% and no edges filtered. The clusters we take a closer look at are number 127, with a size of 68, and number 70, with a size of 14. Cluster 127 is the third largest one, and cluster 70 is among the top 10% largest ones, relative to the other clusters.

Classical’ Movie Soundtracks’ movie scores’ movie themes’ Soundtrack’

Movies’ Harry Potter’ Symphonic’ Scores’ Chill’

homework’ Classical’ Orchestra’ movie music’

Table 13: Playlist names in a cluster

Disney’ Disney Jams’ Disney’ Tangled’ Disney’

Disney’ Disney’ Disney’ Disney’ Disney’

Princess’ disney’ Disney’ Disney’ Disney’

Disney Music!!!!!!’ Disney’ Disney’ disney’ DISNEY ’

Disney/Pixar’ Disney’ disney’ Disney Disney’

Disney disney’ disney playlist.’ Disney Music’ Disney :) ’

DISNEY’ disney’ Disney’ Disney ’ Disney’

Disney ’ Disney Classics’ disney’ Disney’ babies ’

Disney’ Disney Favs’ Disney’ Disney’ disney’

Disney’ Disney’ Disney Princess’ Disney Best of Disney’

disney’ Disney’ Disney’ DISNEY JAMS ’ DISNEY’

Disney’ Disney Jams’ Disney’ Disney’ Disney!’

disney songs’ Disney’ Disney’ hakuna matata’ Disney’

Disney’ disney’ Olivia’

Table 14: Playlist names in a cluster


3.3 Results from weighting 3: A constructed function

TH = threshold, E.f = edge filtering, Size(A) = size of the adjacency matrix, Nr. cl = number of clusters, max(cl) = maximal cluster size

TH  | E.f | Size(A) | Nr. cl | max(cl) | Comments
20% |     | 13689   | 120    | 13416   | The second largest cluster is of size 5
20% | >9  | 2739    | 569    | 212     | A big variation in cluster size; the top 11 clusters are {212, 103, 99, 80, 58, 58, 46, 42, 31, 36, 24}
20% | >14 | 4091    | 394    | 2905    | Cluster sizes after the biggest are {28, 25, 25, 20}; 360 clusters less than 5
20% | >6  | 1802    | 561    | 33      | Biggest cluster is of size 33; 543 clusters are less than 10
30% |     | 9984    | 225    | 9419    | One very large cluster, three clusters bigger than 10; the rest of the clusters are less than 6
30% | >9  | 3710    | 624    | 1461    | One very large cluster, a few with sizes between 10 and 69; 480 clusters less than 4
30% | >6  | 2743    | 733    | 135     | Clusters have sizes {75, 34, 30, 26, 25}; 694 clusters less than 10 and 443 clusters less than 3
30% | >14 | 4943    | 429    | 3621    | Clusters have sizes {49, 43, 20, 20}; 289 clusters less than 4
40% |     | 5249    | 308    | 4125    | One very large cluster, thereafter {164, 65, 26}; 252 clusters are less than 4
40% | >9  | 3091    | 511    | 965     | One very large cluster, second biggest 125, a few between 20 and 90, and 305 clusters of size 2
40% | >6  | 2438    | 624    | 77      | Largest cluster is of size 77, thereafter {60, 34, 28, . . . }; 343 clusters of size 2
40% | >14 | 3768    | 388    | 2412    | One very large cluster; 353 clusters that are less than 3

Table 15: Results from 16 000 Spotify playlists.


Closer look at a few selected clusters

Table 16 and Table 17 show clusters obtained using a threshold of 30% and filtering out playlists that have more than 6 edges.

Awesome Playlist’ Country’ Zoned’ greek’ Dark Side’ summer country’

electro’ Rock’ Lindsey Stirling’ smiles :)’ Pool’ Black’

woo’ Relaxing ’ Spring 2017’ 90s Rock’ pump up’ Chill’

Gaming Songs’ jjj’ energy’ cool beans’ Perfection’ 80s’

Table 16: Playlist names in a cluster

This is what you came for Party playlist Me Eurodance Gaming Supernatural

Lit Sunshine Drive Ay ALT Rock

Table 17: Playlist names in a cluster


3.4 Results from weighting 4: A constructed function

TH = threshold, E.f = edge filtering, Size(A) = size of the adjacency matrix, Nr. cl = number of clusters, max(cl) = maximal cluster size

TH  | E.f | Size(A) | Nr. cl | max(cl) | Comments
20% |     | 15837   | 5      | 15829   | One very large cluster; the other 4 clusters have size 2
20% | >9  | 356     | 115    | 12      | Sizes of the biggest clusters {12, 11, 9, 8, . . . }; 99 clusters less than 5
20% | >6  | 173     | 65     | 7       | Sizes of the biggest clusters {7, 5, 5, 4, . . . }; 41 clusters of size 2
20% | >14 | 692     | 142    | 47      | Sizes of the biggest clusters {47, 36, 31, . . . }; 13 clusters with size less than 5
30% |     | 15621   | 11     | 15601   | One very large cluster; the rest are of size 2
30% | >9  | 913     | 217    | 101     | Sizes of the biggest clusters {101, 30, 25, 20, . . . }; 186 clusters with size less than 5, 120 clusters of size 2
30% | >6  | 528     | 181    | 9       | All clusters have almost similar size
30% | >14 | 1550    | 208    | 826     | One very large cluster, second biggest 22; 120 clusters less than 2
40% |     | 14817   | 44     | 14721   | One very large cluster, second biggest of size 4
40% | >9  | 1810    | 388    | 194     | One large cluster; 221 clusters less than 3
40% | >6  | 1086    | 345    | 33      | Sizes of the biggest clusters {33, 22, 15, 14, 13, . . . }; 271 clusters that are less than 4
40% | >14 | 2906    | 323    | 1770    | Sizes of the biggest clusters {87, 45, 31, 30, 19, . . . }; 254 clusters of size 3 or less

Table 18: Results from 16 000 Spotify playlists.


Closer look at a few selected clusters

Table 19 and Table 20 show clusters obtained using a threshold of 40% and filtering out nodes that have more than 9 edges.

JAMS’ Love Music’ basic’ RUNNIN’ ”emoji music note”

electronic’ Litty ’ Cruisin’ modern rock’ vibes’

pregame’ Happy Happy Happy’ Blues’ PARTY ’ classic’

4th of july’ 2016’ english’ Classical’ Summer 15’

Beach Music’ rock’ 90s Rock’ Random!’ childhood’

skrt skrt’ dance’ broadway’ sad song’ Way Back When’

lift’ In the Name of Love’ TX Country’ Bruno Mars Summertime

TX Country RECENT Swing

Table 19: Playlist names in a cluster

Solitude’ Spanish’ randoms’ Julion alvarez’ *** good stuff’ june’

Workout’ Relax’ Piano Guys’ Brown Eyed Girl’ wedding playlist’ Country’

MVP ’ Fall’ ThrowBack Pop ’ Hawaii ’ gabrielle ’

Table 20: Playlist names in a cluster

3.5 Results from random samples

The extraction of playlists from the data set was done on consecutive data, meaning that for the analysis of 16 000 playlists, the first 16 000 playlists in the data set were extracted. A test was also done where three different sets of 16 000 random playlists were extracted from the data set and our algorithm with weightings w1 and w2 was applied. The results show the same cluster-size pattern, with one very big cluster and the rest of smaller size. These tests give us no reason to believe that the given data set is ordered by the publisher in some way, or that our extraction is an outlier.


3.6 Song recommendation to a playlist

In Table 21 a full playlist with 39 Disney songs is shown. This playlist is randomly chosen from the smaller clusters to show a practical example of the song recommendation procedure. The algorithm had the potential to recommend 920 songs to the playlist, of which 48 are suggested, as seen in Table 22.

A Disney playlist

Roger Bart’ Lillias White’ Bruce Adler’

Go the Distance I Won”t Say Arabian Nights’

Brad Kane’ Lea Salonga’ Jonathan Freeman’

One Jump Ahead’ A Whole New World’ Prince Ali (Reprise)’

Lea Salonga’ Donny Osmond’ Harvey Fierstein’

Reflection I”ll Make a Man Out of You A Girl Worth Fighting For

Jason Weaver’ Carmen Twillie’ Jeremy Irons’

I Just Can”t Wait t... Circle Of Life Be Prepared

Nathan Lane’ Jodi Benson’ Samuel E. Wright’

Hakuna Matata’ Part of Your World Under the Sea

Chorus Angela Lansbury’ Robby Benson’

Belle’ Be Our Guest Something There’

Angela Lansbury’ Mandy Moore’ Donna Murphy’

Beauty and the Beast’ When Will My Life Begin Mother Knows Best

Mandy Moore’ Mandy Moore’ Judy Kuhn’

I”ve Got a Dream I See the Light Just Around The Riverbend’

Phil Collins’ Phil Collins’ Phil Collins’

Two Worlds’ You”ll Be In My Heart’ Son Of Man’

Rosie O”Donnell’ Phil Collins’ Heidi Mollenhauer’

Trashin” The Camp’ Strangers Like Me’ God Help The Outcasts’

Tony Jay’ Kristen Bell’ Kristen Bell’

Heaven”s Light Do You Want to Build a Snowman?’ For the First Time in Forever’

Kristen Bell’ Idina Menzel’ Kristen Bell’

Love Is an Open Door’ Let It Go For the First Time in Forever

Ne-Yo’ Phil Collins’

Friend Like Me You”ll Be In My Heart’

Table 21: The songs with the associated artists from a random playlist to which songs are recommended to.


48 SUGGESTED SONGS

Maia Wilson’ Cheryl Freeman’ Opetaia Foa”i’

Fixer Upper’ The Gospel Truth I We Know The Way

Jesse McCartney’ Fess Parker’ Phil Collins’

When You Wish Up...’ The Ballad Of Davy Crockett’ On My Way’

Miley Cyrus’ Tony Jay’ Sarah McLachlan’

Butterfly Fly Away’ The Bells Of Notre Dame’ When She Loved Me’

Adriana Caselotti’ Auli”i Cravalho’ Bryan Adams’

Whistle While You Work’ How Far I”ll Go’ You Can”t Take Me

Ken Page’ Angela Lansbury’ Alessia Cara’

Oogie Boogie”s Song’ Human Again’ How Far I”ll Go

Beth Fowler’ Keith David’ Jenifer Lewis’

Honor To Us All Friends on the Other Side Dig A Little Deeper

Judy Kuhn’ The Cast of M. Keali”i Ho”omalu’

Colors Of The Wind’ One of Us’ He Mele No Lilo’

Jump5’ Rhoda Williams’ Louis Prima’

Aloha, E Komo Mai The Music Lesson I Wan”Na Be Like You

Mary Costa’ Jeremy Jordan’ Adam Mitchell’

An Unusual Prince The World Will Know’ Days In The Sun’

Bruce Reitherman’ Anna Kendrick’ Pocahontas’

The Bare Necessities’ No One Is Alone’ Where Do I Go From Here’

Auli”i Cravalho’ Bobby Driscoll’ Shakira’

Know Who You Are’ Following The Leader’ Try Everything - From

Cedar Lane Orchestra’ Jemaine Clement’ Dr. John’

The Lion King’ Shiny’ Down in New Orleans’

Adriana Caselotti’ Elvis Presley with Orchestra’ Tony Jay’

Some Day My Prince ...’ Suspicious Minds’ Out There’

Samuel E. Wright’ 98’ *NSYNC’

Kiss the Girl True To Your Heart’ Trashin” The Camp

Mark Mancina’ Ferb’ Richard White’

Village Crazy Lady Backyard Beach’ Gaston

Jim Cummings’ Elton John Rachel House’

Gonna Take You There Can You Feel The Love Tonight I Am Moana

Table 22: The songs, with their associated artists, suggested to the playlist in Table 21.

4 Discussion

As seen in almost every table in the results, there is a general theme of one big cluster and then a large number of small ones. This pattern is especially visible when data points with fewer than 2 or 5 edges are removed for the first and second weight functions; sometimes there is even only one cluster present after the filtering is done. From this the conclusion was drawn that the problem is not that the data is too weakly connected. Rather, the analysis is that the data might be overly connected, and thus the decision was made that for the third and fourth weightings we try to filter out data with too many connections.

The code is written in such a way that it removes zero rows and zero columns from the start, which also shows clearly how many playlists have 0 matches after the threshold filter. It is obvious from Table 7 that it is quite rare for a playlist to have 20% in common with another playlist, and looking at the 50% threshold data, we see that the matrix is reduced from 16000 × 16000 to 192 × 192, which leaves the undesired result of filtering out 98.8% of the data.

What is important to note is that the big cluster is a partition of all the data that has a high connectivity, meaning that suggesting a song to another playlist within that cluster is almost equivalent to randomizing a song. When trying the normalized Laplacian an identical result is obtained, and thus the conclusion is made that for w1 there is no difference between approximating the minimization of NCut and of RatioCut.

The second weight function results in larger matrices, which is no surprise because, as stated in section 1.1, there is a big difference between the number of unique artists and the number of unique tracks in the data set. Even considering this, it is worse in some aspects. Consider the point made earlier that recommending from the big cluster is almost equal to randomizing a song from the data set, and assume that the second largest cluster is reliable. It is then possible to put a number on how many playlists in a given clustering are able to receive a song recommendation. Below is a summary of how many playlists are eligible for a song recommendation.

Filter | 20%  | <2  | <5  | 30%  | <2  | <5  | 40% | <2  | <5 | 50%  | <2  | <5
w1     | 1124 | 387 | 586 | 1206 | 608 | 222 | 398 | 151 | 44 | 167  | 62  | 11
w2     | 104  | 13  | 0   | 272  | 34  | 0   | 529 | 83  | 35 | 1087 | 493 | 368

Table 23: The number of playlists that are able to receive a song recommendation.

In Table 23 you can see how many playlists are, in a best-case scenario, eligible for recommendations. It becomes obvious that all of the thresholds, with and without edge filtering, are lackluster. At first glance, basing the weight function on artists seemed better because of the higher connectivity, but Table 23 shows that w1 is better for almost every threshold. In the results section we present a closer look at a few clusters to get a grasp of what the clusters look like. It then becomes visible that the clusters that our algorithm does find are clusters whose playlists have things in common, and we therefore consider them good candidates to be the source of a recommendation for a list within the cluster.

As expected, when using w3 and w4 the dimensions of the matrices became larger. The weight function amplifies edges with a low value and reduces values that are on the high end, as seen in Figure 13. From the results of w3 and w4, one can see that there is a large number of clusters when the filtering is done for data points with more than 6, 9 or 14 edges. Looking at w4 with a 20% threshold and no edges filtered, the matrix size is 15 829. Increasing the threshold another 10% decreases the size to 15 621. Considering that the optimum would be to suggest song recommendations to every playlist, these sizes are desirable. The problem comes when looking at the cluster sizes, and in particular the largest cluster, which shows the same tendencies as for w1 and w2. To resolve these tendencies we filtered out playlists that we defined as overly connected, with the intent of breaking up the larger clusters.

Filter | 20% | >6   | >9   | >14  | 30% | >6   | >9   | >14  | 40%  | >6   | >9   | >14
w3     | 273 | 2739 | 1802 | 1186 | 565 | 2249 | 2126 | 1322 | 1124 | 2438 | 2126 | 1356
w4     | 8   | 173  | 356  | 692  | 20  | 528  | 913  | 724  | 96   | 1086 | 1616 | 1136

Table 24: The number of playlists that are able to receive a song recommendation.

This table is constructed in the same manner as for w1 and w2, with the difference that the largest cluster is sometimes not too big and is then not subtracted when calculating how many playlists can receive a suggestion. It is interesting to note that the same pattern is seen as above: tracks behave in a more desirable way than artists. Furthermore, as seen in Table 18, the intent of filtering out over-connectivity also led to more clusters, and because of this it is also easier to suggest songs to a given playlist. This is illustrated by the fact that the algorithm went from being able to suggest songs to at best 1 206 playlists up to 2 739. But after taking a closer look at the clusters, as seen in Tables 16, 17, 19 and 20, the clusters seem more random and are not as coherent as they were for w1 and w2. The conclusion drawn from this is that artificially modifying the weighting function does not bring a desirable result.


4.1 Conclusion

One of the most important conclusions is that trying to manipulate the data structure by filtering on the number of edges is not the way to go. First of all, every time edge filtering is done, playlists are removed, and because the ultimate goal is to be able to suggest songs to every playlist, this is a bad solution. Secondly, when the filtering is done on playlists that we defined as overly connected, the clusters become more random and thus also a worse source for song recommendations. When applying the simple weightings w1 and w2 without filtering edges, and ignoring the large cluster, we get desirable results.

When looking at the largest cluster, the conclusion is that some playlists are too connected and need to be dealt with in an alternative way. How well the presented method would fare in the challenge is unknown, but after taking a glance at the recommended songs in Section 3.6 it would seem that the recommendations are reasonable.

To conclude, this study has shown that recommending songs using Spectral clustering might be a viable option, but further research has to be done.

4.2 Future work

One of the most important things to note is that every simulation presented in this paper is done on a 16 000 × 16 000 matrix. There is no way to deduce whether the result would be different using the intended 1 000 000 × 1 000 000 matrix, so here there is naturally room for further testing. As stated in the limitations section, we decided not to pursue larger sets of data due to limitations in computational resources. Furthermore, there are unlimited ways to construct the weight functions. In our case we picked one that is simple, natural and logical, and then used another one that we deduced might address the problems we had detected in the first one. That weight function did better in the sense of being able to suggest songs to more playlists, but there were still a lot of small clusters and thus a limitation on how many songs one could recommend to some of the playlists. There was also the problem of the clusters being less structured than they had been with the original weight functions. A weight function that does not make the data overly connected but still finds meaningful clusters would be interesting. Ultimately we note that there are methods to address the problem of one large cluster. One of them is called hierarchical clustering [6], which in a sense means that further clustering is done on the biggest cluster.


References

[1] Howard Anton and Robert C. Busby. Contemporary Linear Algebra. Hoboken, NJ: Wiley, 2003. isbn: 0471163627.

[2] David Arthur and Sergei Vassilvitskii. "K-Means++: The Advantages of Careful Seeding". In: SODA '07 (2007), pp. 1027-1035.

[3] Ching-Wei Chen et al. "Recsys Challenge 2018: Automatic Music Playlist Continuation". In: RecSys '18 (2018). doi: 10.1145/3240323.3240342.

[4] Elias Jarlebring. "Numerics for graphs and clustering". In: Lecture notes, Numerical algorithms for data science (SF2526) (2019), pp. 8-9.

[5] Ulrike von Luxburg. "A Tutorial on Spectral Clustering". In: Statistics and Computing 17(4) (2007). url: https://arxiv.org/abs/0711.0189.

[6] Frank Nielsen. "Hierarchical Clustering". Feb. 2016. isbn: 978-3-319-21902-8. doi: 10.1007/978-3-319-21903-5_8.

[7] Jake VanderPlas. Python Data Science Handbook: Essential Tools for Working with Data. Sebastopol: O'Reilly Media, 2016. isbn: 1491912057.

[8] "What is Clustering". In: Machine Learning Crash Course (2020). url: https://developers.google.com/machine-learning/clustering/overview.
