
Bachelor Thesis

Comparing probabilistic models for human motion patterns

Andreas Berg & Johan Feinestam

Supervisors: Martin Magnusson, Tomasz Piotr Kucner, Chittaranjan Swaminathan


Acknowledgements

This work was performed at Örebro University, supported by AASS through the project Comparing probabilistic models for human motion patterns. We would like to start by thanking our supervisors at AASS, Martin Magnusson, Tomasz Piotr Kucner and Chittaranjan Srinivas Swaminathan, for educating us within the appropriate research areas and for giving us much good advice. We also want to thank Johannes Stork for his help with the Gaussian Processes.

Finally, a big thanks to Stephanie Lowry for examining this thesis.


Abstract

In recent years, more and more work is being automated or made more efficient with the help of robots. In a world where more and more things are digitalized, robots must be able to work in environments that are populated with people.

To be able to adapt to their environment, robots need to learn the flow of traffic. To do this, a robot can use maps of dynamics (MoD), but today there exists no comparison between current implementations.

In this thesis, two MoDs will be implemented and compared, foremost for the ILIAD project but also for anyone else who needs it. Along with the implementation and comparison, we will also explain what future work should be done on these two implementations.


Contents

1 Introduction
   1.1 Background
   1.2 Maps of Dynamics
   1.3 Purpose
   1.4 GP-map
   1.5 GMM-map
   1.6 Methods
   1.7 Ethical Considerations
   1.8 Outline

2 Related work
   2.1 Trajectories
   2.2 Maps of dynamics
      2.2.1 CliFF-map
   2.3 Modelling smooth paths using Gaussian Processes
   2.4 Gaussian Mixture Models
   2.5 Mixtures of Regression Models

3 Methods descriptions
   3.1 Gaussian Processes methods chosen from our main reference
   3.2 Gaussian Mixture Models
      3.2.1 Linear Interpolation
      3.2.2 The Initialisation of the first Models

4 Implementations
   4.1 Flow chart
   4.2 Input Data
   4.3 Noise filtering
   4.4 Linear interpolation
   4.5 K-means clustering
      4.5.1 Initial state
      4.5.2 Distance between trajectories
      4.5.3 Means of clusters
      4.5.4 Algorithm
   4.6 Gaussian Processes
   4.7 Gaussian Mixture models
      4.7.1 Gaussian Mixture model with EM
   4.8 Gaussian distribution

5 Evaluation
   5.1 Result
      5.1.1 The generated data
      5.1.2 The test data
      5.1.3 Gaussian Processes
      5.1.4 Gaussian Mixture Models
   5.2 Comparison

6 Conclusions
   6.1 The input data
   6.2 Linear Interpolation
   6.3 K-means clustering
   6.4 Gaussian Processes
   6.5 Gaussian Mixture Models
      6.5.1 The generated data
      6.5.2 The training data
      6.5.3 Handling small probability values
      6.5.4 Set initial Models
   6.6 The comparing of GP-map and the GMM-map
   6.7 How GP and GMM fits on the warehouse environment

7 Future work
   7.1 Change the number of data points to interpolate to
   7.2 Change our clustering method
   7.3 Changes to do with the GP
   7.4 Changes to do with the GMM
      7.4.1 How many models to use in a data set
      7.4.2 Clustering before doing GMM on the data set
      7.4.3 Tune the covariance

8 Appendix


Chapter 1

Introduction

1.1 Background

Humans mostly do not move around randomly; they follow certain motion patterns. For example, imagine a mall where some humans are walking and other objects, like plants, are static. When people move from one place to another, they often follow certain movement patterns that have been walked before by others. Their direction and destination correspond to some kind of intention that others had before them when they travelled between the same places.

For humans, it is pretty simple to go through a mall. People follow unspoken rules; for example, when we go through a corridor, we will probably choose to walk on the right side.

Imagine walking in this corridor again and almost colliding with a person face to face. Sometimes it is hard to pass a person if he also decides to pass you in the same direction. This whole situation could be avoided if both walked along the right wall, which is sometimes a common motion pattern.


Imagine putting a robot in this corridor: how would it know about objects that moved, using only a regular map? A regular map only describes static objects, but the robot might need some kind of map that also explains the movements in this environment. Remember our example where two humans almost collide face to face; just imagine this situation with a big heavy robot. A robot with a static map would probably calculate the fastest way from its current point to its goal. It would also consider the human in front as a static object and add it to the map. If the fastest way is to walk on the left side, the robot might choose this side, hindering the human, who follows a motion pattern with the hidden rules in his head, would turn right, and thus a collision could occur. To avoid that, we want the robot to follow the motion in the environment, as shown in figure 1.1.

Figure 1.1: Without knowing how people tend to move, the robot on the left would be equally likely to plan to walk on the top side as on the bottom side. The problem is that if the robot chooses the top side, it walks against the flow of people. This might be solved with a map of dynamics, where the robot makes a plan that aligns with the expected flow in the environment, thus choosing the bottom side, as seen with the blue line.


1.2 Maps of Dynamics

A regular map only contains static objects; if you would like to estimate the movement in an environment you can use a Map of Dynamics (MoD). A MoD contains information about motion patterns in an environment. To solve the problem described in the previous subsection, a robot can use a MoD to be able to follow the flow. A MoD describes how objects probably will and will not move in an environment. In our example with the mall, this would mean how people tend to walk: either just through the mall, forming a rather long linear motion pattern, or from store to store, making many small motion patterns. These MoDs exist in many varieties today. Some of these we describe in chapter 2.

1.3 Purpose

As discussed in the last subsection, there exist many MoDs. Each of these maps has different benefits, and in different environments some MoDs may fit better than others.

To the best of our knowledge, there does not exist any kind of comparison between different MoDs for any kind of evaluation [7]; therefore there needs to be a comparison between them to evaluate what to use for a specific environment. So in this thesis two kinds of maps of dynamics will be presented along with an evaluation: one using Gaussian Processes, called GP-map [2], and a second called GMM-map, using Gaussian Mixture Models [6].


1.4 GP-map

This map of dynamics will have a Gaussian Process for each motion pattern. The motion patterns will be found by our k-means algorithm, which is explained in section 4.5.

A Gaussian Process can be seen as a distribution over random functions. These random functions are, in our case, the paths of humans walking in the mall, which can be viewed as a collection of Gaussian Processes. Here the covariance (a parameter of a Gaussian Process) can roughly be interpreted as, for example, the width of a one-way corridor which humans traverse.

1.5 GMM-map

The GMM is a way to model a multimodal data set according to its distributions. In a data set that looks like the one in figure 1.2, it is possible to create a model of high quality.

Figure 1.2: To the left: The Gaussian distribution in a data set. To the right: A data set containing one Gaussian distribution.

But if the data set looks like the one in figure 1.3, and we want to make conclusions about each of the three groups of data in the data set, it gets tougher. A single model does not represent this data well, and therefore the quality of the model gets poor.


Figure 1.3: A data set with data formed as three groups.

If we only use one model, as we did in figure 1.2, over the three groups of data, we will end up with a result like the one in figure 1.4. This will not make it easier to draw a conclusion about each group of data.

Figure 1.4: To the left: A Gaussian distribution over a data set containing three groups of data. To the right: A data set with three groups of data but only one distribution.

As said before, GMM is a way to model a multimodal data set. So if we perform GMM on the data set instead, we can end up with a result like the one on the right in figure 1.5. This will make it easier to say something about the three groups individually.


Figure 1.5: To the left: Three Gaussian distributions in one data set. To the right: Three groups of data with one Gaussian distribution each.

1.6 Methods

For this project we used the SCRUM method, to get a more effective working flow. SCRUM is a type of extreme programming and is divided into subsections called sprints. In our case, we defined these as two-week periods, a total of 5 sprints over the 10-week period. To manage our sprints we used sprint boards, and for that we used Trello. In each sprint, there is a set of tasks that are meant to be finished within a certain time. At the end of each sprint, we tried to refactor our code to make it more readable, and lastly, we followed up to see if we succeeded or failed the current sprint. Then, before we began a new sprint, we held a sprint evaluation to see what we had done well and what we could have done better.


1.7 Ethical Considerations

For a robot to operate in an environment with moving objects it can use a MoD, which might use data collected from environments where people are monitored. In many ways it is a good thing when robots take over tedious or dangerous tasks, but it also raises some considerations. Is it okay to collect data from environments where people are monitored? Who should be approved to monitor people? Can the data be used for bad purposes? In this thesis the training data is collected at a mall, which might seem harmless. But where is the limit between what is wrong and what is right when monitoring people? It is therefore important to develop tools that are for the benefit of people.


1.8 Outline

The rest of this thesis is organized as follows.

Chapter 2 Related Work, contains some examples of works that have been done before and that are related to this thesis. This chapter is done by both Andreas and Johan.

Chapter 3 Method Descriptions, describes which reports we followed when we implemented the Gaussian Processes and the GMM and also how much they differ from the originals. This chapter is done by both Andreas and Johan.

Chapter 4 Implementations, explains all of the implementations in the project.
- Section 4.1 Flow Chart, is done by both Andreas and Johan and gives an overview of the project.
- Section 4.2 Input Data, is done by both Andreas and Johan and describes the training data.
- Section 4.3 Noise Filtering, is done mostly by Andreas and describes how to filter the noise from the input data.
- Section 4.4 Linear Interpolation, is done mostly by Johan and describes how to change the number of data points in one trajectory.
- Section 4.5 Clustering, is done mostly by Andreas and describes how to cluster trajectories using K-means clustering.
- Section 4.6 Gaussian Processes, is done mostly by Andreas and describes how Gaussian Processes are used in the project.
- Section 4.7 Gaussian Mixture Models, is done mostly by Johan and describes how GMM is used in the project.
- Section 4.8 Gaussian Distributions, is done mostly by Johan and explains what Gaussian distributions, variance and covariance are.

Chapter 5 Evaluations, contains the evaluation of the results.
- Section 5.1 describes the results from the GP-map and the GMM-map individually.
   · Subsection 5.1.3 Evaluation GP is done mostly by Andreas and contains the results from the GP-map.
   · Subsection 5.1.4 Evaluation GMM is done mostly by Johan and contains the results from the GMM-map.
- Section 5.2 is done by both Andreas and Johan and contains the result of when we compared the GP-map with the GMM-map.

Chapter 6 Conclusions, contains the conclusions of our results. Here we discuss pros and cons of the processes and also what we thought of the results and how well they fit their purpose.
- Section 6.1 Input Data, is done by both Andreas and Johan.
- Section 6.2 Linear Interpolation, is done mostly by Johan.
- Section 6.3 Clustering, is done mostly by Andreas.
- Section 6.4 Gaussian Processes, is done mostly by Andreas.
- Section 6.5 Gaussian Mixture Models, is done mostly by Johan.
- Section 6.6 Comparing the GP-map and the GMM-map, is done by both Andreas and Johan.
- Section 6.7 How GP and GMM fit the warehouse environment, is done by both Andreas and Johan and describes how well the GP-map and GMM-map fit the environment that they were supposed to be used in.

Chapter 7 Future Work, contains what should be done in the future on this project to make it better.
- Section 7.1 Change the number of data points to interpolate to, is done mostly by Johan.
- Section 7.2 Change the clustering method, is done mostly by Andreas.
- Section 7.3 Changes to do with the GP.
- Section 7.4 Changes to do with the GMM, is done mostly by Johan.

Chapter 8 Appendix, contains the code for the whole project.


Chapter 2

Related work

In this chapter, we present theory relevant to the problem and solution described in the first chapter, including theory about the GP-map and GMM-map, which might be useful for a deeper understanding of GPs. We also describe their similarities to and differences from our work.

The theories describe different kinds of MoD and also different ways of creating trajectories.

2.1 Trajectories

A Survey of Vision-Based Trajectory Learning and Analysis for Surveillance

In the scientific paper A Survey of Vision-Based Trajectory Learning and Analysis for Surveillance, written by Brendan Tran Morris et al. [7], the authors deal with activity learning, adaption, and feature selection while they try to detect abnormalities in the environment. In the activity learning, they discuss different methods for comparing trajectories with each other in order to cluster them.

There are, however, some differences between their report and ours. They make sure that all of the trajectories have the same length by stretching instead of adding equally spaced points; they call this normalization. This is in contrast to us, who use the linear interpolation method to be able to compare trajectories. In the normalization method they use the dynamics at the last tracking time to estimate extra trajectory points, as if the trajectory had been tracked until this time. They also mention that a modified Hausdorff distance can be a good fit when clustering the data, while we use the Euclidean distance [7]. In their aim to find abnormalities in the environment, they try to find points of interest (POI) and activity paths (AP), while we are trying to find motion patterns in a shopping center. POIs are image regions where interesting events occur and APs describe how objects move between POIs [7], while our trajectories keep track of all movement in the environment.

In their project, they divide their trajectories into sub-trajectories and also compute the probability for each sub-trajectory, while we keep the trajectories as they are. They also refer to internal states, centroid, and envelope, where we refer to data points, mean value and covariance respectively [7].

This is a very useful source because it also talks about Gaussian Mixture Models implemented with the EM-algorithm.

2.2 Maps of dynamics

2.2.1 CliFF-map

Many MoDs are already implemented, as mentioned in chapter 1; one of them is the Circular-Linear Flow Field map (CLiFF-map), implemented in a public doctoral dissertation written by Tomasz Piotr Kucner [5]. His focus is on building maps that capture the motion patterns of dynamic objects and the flow of continuous media, which is the same as ours. This reference is useful for building a foundation for our knowledge because his project is very similar; it also explains what maps of dynamics are in a simple and understandable way.

2.3 Modelling smooth paths using Gaussian Processes

This is the paper which we follow to implement the Gaussian Processes, written by M.K. Tay Christopher and C. Laugier. In this paper we mainly follow the section about representing motion patterns using Gaussian Processes. Christopher et al. avoid common terminology for Gaussian Processes but still explain the basic concept, which we try to follow and improve in section 4.6 [2].

2.4 Gaussian Mixture Models

Learning Motion Patterns of People for Compliant Robot Motion, written by Maren Bennewitz et al. [6], discusses how to fit a Gaussian Mixture Model to a data set of trajectories. It is divided into three parts: learning of motion patterns, deriving Hidden Markov Models from learned motion patterns, and person detection and identification.

In the first part the authors discuss the implementation of GMM and how similar it is to k-means clustering, due to the fact that each trajectory will only belong to one model. They also discuss how to estimate the "right amount" of clusters for a data set by using the BIC method, which we did not have time to do.

In the second part they derive Hidden Markov Models to predict the motion of people between resting places.

In the third part they discuss how they detect and identify each person in the test environment.

This is the paper that we followed in the implementation of GMM. You can read more about that in section 3.2.


2.5 Mixtures of Regression Models

Trajectory clustering with mixtures of Regression Models, written by Scott Gaffney et al. [10], uses Mixtures of Regression Models on collected data representing arm movements, while our data comes from people walking in a shopping center. They investigate the problems that occur when trying to cluster trajectories, using, among other clustering methods, the same ones that we use: K-means clustering and Gaussian Mixture Models. Their work is performed on generated data, as is ours, but also on video streams, unlike ours, which is performed on authentic data from people walking around in a shopping center. Their conclusion is that Mixtures of Regression Models outperform K-means clustering and Gaussian Mixture Models because they can compare multivariate trajectories in a better way on their data set.


Chapter 3

Methods descriptions

In the previous chapter, we talked about related work to provide context for our work and introduce key concepts. In this chapter, we look a little closer at the two papers that we followed while implementing the GP-map and the GMM-map. We followed the paper "Modelling Smooth Paths Using Gaussian Processes" [2] for our GP-map implementation. For the implementation of the GMM-map we followed the paper "Learning Motion Patterns of People for Compliant Robot Motion" [6]. We could not follow the papers literally because of different prerequisites. In this chapter, we will discuss our methods, how they differ from the original methods, and evaluate the two.

3.1 Gaussian Processes methods chosen from our main reference

In the implementation of GP, we mainly followed the Modelling Smooth Paths Using Gaussian Processes paper. The methods they use which we implement are:

• k-means clustering; this is mentioned but not described in detail.


• Defining the covariance (kernel) [2]

Christopher Tay et al. do explain the covariance, but they do not use terms common to Gaussian Processes, like kernel. They are also unclear about what is the training data and what are the training data targets, which is important when optimising the kernel; more about this in section 4.6 [2].

Using terms that are common within the area allows for easier research into specific components. For someone who wants to follow the implementation or alter it for some kind of evaluation, this kind of information would be easy to add as a writer and really important as a reader.

They define two Gaussian Processes, one for the x coordinate within a cluster and one for the y coordinate, and they also assume the x and y coordinates to be independent [2].

3.2 Gaussian Mixture Models

In the implementation of Gaussian Mixture Models, we followed the Learning Motion Patterns of People for Compliant Robot Motion paper. In their project they implement:

• Learning Motion Patterns

• Deriving Hidden Markov Models from Learned Motion Patterns
• Person Detection and Identification [6]

The goal of this thesis was to compare the Gaussian Processes map and the Gaussian Mixture Models map on the training data, as described in section 1.3.


Therefore, we only followed the Learning Motion Patterns section when implementing the GMM-map.

The following subsections describe the differences, on a deeper level, between our implementation of Gaussian Mixture Models and theirs.

3.2.1 Linear Interpolation

When we did the linear interpolation, we walked along the old trajectory and added the new points, as described in section 4.4. Bennewitz et al. also do linear interpolation, but they do not explain how they do it [6], so it is possible that our interpolation differs from theirs. The way we implemented the linear interpolation is described in section 4.4.

3.2.2 The Initialisation of the first Models

Bennewitz et al. never explained how they set the initial models; therefore we initialised the models by setting them to some random trajectories from our data set that had the largest distance between each other. The distance is calculated using the same distance equation as in our k-means algorithm (section 4.5). This was done at the beginning and made sure that the models did not get too close to each other in the initial phase.


Chapter 4

Implementations

This chapter describes how we implemented the Gaussian Processes and the Gaussian Mixture Models that we talked about in the last chapter. To be able to do the GP-map and the GMM-map, as we talk about in sections 4.6 and 4.7, we have to explain a little bit about what Gaussian distributions, variance and covariance are. This can be read about in section 4.8.


4.1 Flow chart

To get an overview of the data flow through the project, a flow chart is presented in figure 4.1. This figure will also guide you through the whole chapter.

Figure 4.1: Flow chart. Here the workflow is described, where every block is a part of our program except the comparison, which is done manually as described in chapter 6. It shows the process of making the GP-map and the GMM-map before we finally could compare them.

• Format data - The first step is to read the input data. Our test data is recorded as tracks of people by sensors mounted in the ATC shopping center. The input data is further described in section 4.2. This step also includes filtering out noise and is described in sections 4.2 and 4.3.

• Interpolate the trajectories - To make it possible to compare trajectories with each other, we first need to interpolate them so that they contain the same number of data points; one solution is to interpolate them using linear interpolation. This is described more in section 4.4.

• K-means clustering - To be able to say something about the data we need to divide it into groups. This is more deeply described in section 4.5.

• Calculate parameters for GP - For each cluster, calculate its mean trajectory and its covariances and also calculate the GP. This is described in more depth in section 4.6.


• Calculate parameters for GMM - For each motion pattern, calculate the mean positions of a set of Gaussians. This is more deeply described in section 4.7.

• Compare - Finally we compare the two maps with each other. That is done in section 5.2.

4.2 Input Data

Figure 4.2: The green marked square is the current state in the progress.

The data that we work with was collected in the ATC mall in Osaka, Japan. An overview of the ATC is shown in figure 4.3.

The authors, D. Brscic et al., tracked an environment of 900 m² over 92 days (between 2012-October-24 and 2013-November-29), using a system that consisted of 49 3D range sensors for pedestrian tracking. The data was collected on Wednesdays and Sundays and contains approximately 16950 trajectories that people have walked in the mall [3].

In figure 4.4 we can see what the data looked like when we plotted it, where each trajectory has a specific colour. We can see that it matches the shape from figure 4.3.


Figure 4.3: The ATC in Osaka, Japan, where the training data was collected. [3]

Figure 4.4: The plotting of the training data, where each trajectory has differ-ent colours.

The collected data is saved in CSV files where each row contains time, person id, position x, position y, position z, velocity, angle of motion and facing angle. The time is measured in milliseconds, the positions in millimetres, the velocity in millimetres per second and the angle of motion and facing angle in radians. The number of data points in each trajectory depends on the number of rows in succession that contain the same id. We only used the first four attributes in each row to do the Gaussian Processes and the Gaussian Mixture Models.
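As an illustration, below is a minimal sketch of how such a file could be read and grouped into trajectories, keeping only the first four attributes. The column order follows the description above; the function name, the data layout and the assumption that there is no header row are ours.

```python
import csv
from collections import defaultdict

def load_trajectories(csv_path):
    """Group rows by person id into trajectories of (time, x, y) tuples."""
    trajectories = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            # time [ms], person id, x [mm], y [mm]; remaining columns are ignored.
            t, person_id, x, y = float(row[0]), row[1], float(row[2]), float(row[3])
            trajectories[person_id].append((t, x, y))
    # Keep each trajectory ordered by time.
    return {pid: sorted(points) for pid, points in trajectories.items()}
```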


4.3 Noise filtering

Figure 4.5: The green marked square is the current state in the progress.

In our test data, which is explained in section 4.2, there is some noise, for example when people stand still in a store. This can be seen in cyan colour at the far left in figure 4.6. This noise will cause problems later in our program, which is why we have to filter it out.

We filter noise as seen in figure 4.6, where the cyan colour is what gets recognised as noise. This is done with algorithm 1. Do note that this is a pretty rough filtration, but we are working in millimetres here, so we do not lose as much data as may be implied by the visuals.

Figure 4.6: Filtering noise; the cyan lines are removed, returning only the magenta line.


Algorithm 1 walks through the trajectory and removes every point that lies closer than the threshold to the previous point; the last data point is always kept, so if its predecessor is too close, the predecessor is removed instead.

Algorithm 1 Remove noise
Require: trajectory to filter, threshold
Ensure: trajectory without points too close to each other, depending on threshold
 1: function remove_noise(trajectory, threshold)
 2:   for point, next_point in trajectory do
 3:     if norm(next_point - point) < threshold and next_point is not the last point then
 4:       remove next_point
 5:     else if norm(next_point - point) < threshold and next_point is the last point then
 6:       remove point   # to keep the last data point static
 7:     end if
 8:   end for
 9: end function
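A minimal Python sketch of our reading of Algorithm 1; the function name and the data layout (lists of (x, y) tuples) are ours, not the thesis code.

```python
import math

def remove_noise(trajectory, threshold):
    """Drop points closer than `threshold` to the previously kept point.

    The last data point is always kept, mirroring Algorithm 1.
    """
    if len(trajectory) < 2:
        return list(trajectory)
    filtered = [trajectory[0]]
    for point in trajectory[1:-1]:
        if math.dist(point, filtered[-1]) >= threshold:
            filtered.append(point)
    last = trajectory[-1]
    # If the previously kept point is too close to the endpoint, drop that one
    # instead so the last data point stays fixed.
    if len(filtered) > 1 and math.dist(last, filtered[-1]) < threshold:
        filtered.pop()
    filtered.append(last)
    return filtered
```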

4.4 Linear interpolation

Figure 4.7: The green marked square is the current state in the progress.

To be able to compare the trajectories and to do clustering and further work on the input data, the trajectories must contain the same number of data points. Therefore we interpolated the trajectories. For this, you can use linear interpolation. Figure 4.8 shows two different trajectories with different numbers of data points that have been interpolated so that each contains 11 data points.


Figure 4.8: Linear interpolation of two trajectories so that they contain 11 data points each

The left and right trajectories in figure 4.8 originally contain 9 and 68 data points respectively, marked with grey points. These points are then connected with a blue line that represents the whole trajectory. The black points and the line between them are the new trajectory after the linear interpolation.

The first thing to do is to decide how many data points each trajectory should contain. In our case, we simply took the average number of points in the input data. Depending on how long a trajectory is, the segment lengths will be different. Figure 4.9 shows how to compute how long a trajectory is.

Figure 4.9: The right figure shows the total length of the trajectory (L) and the left figure shows how long each segment length ($l_1 = l_2 = \dots = l_{T-1} = \frac{L}{T-1}$) in the same trajectory should be. Here T-1 is the number of segments that we would like to split the trajectory into.


The total length of the trajectory, L, is the sum of the distances l_t between consecutive data points, as shown in equation (4.1), where T is the number of data points in the trajectory.

$$ L = \sum_{t=1}^{T-1} l_t \qquad (4.1) $$

Described a little more precisely, L is the sum of the Euclidean distances between consecutive data points p_t, where p_t = (x_t, y_t) are the Cartesian positions and T is the number of data points in that trajectory, as described in equation (4.2).

$$ L = \sum_{t=1}^{T-1} \sqrt{(x_{t+1} - x_t)^2 + (y_{t+1} - y_t)^2} \qquad (4.2) $$

For each trajectory, we then compute the distance travelled, d_t, from the beginning of the trajectory, p_0, to data point p_t, before we save the distances into a list of distances. These distances are later used when we place the new data points in the interpolated trajectory.

Figure 4.10: The distances travelled from the start of the trajectory to data point p_t.

The distances are then computed as equation (4.3) shows, where d_t is the distance travelled up to data point p_t:

$$ d_t = \begin{cases} 0 & \text{if } t \le 1 \\ \sum_{k=2}^{t} l_{k-1} & \text{otherwise} \end{cases} \qquad (4.3) $$

Now we have the length of the whole trajectory and the length from the first data point to each of the consecutive data points in the original trajectory. The new trajectories should contain one more data point than the number of segments. To add new data points we just "walk" along the old trajectory and replace all of the old data points with the new data points, except for the first and last data points, which are kept as they are.

To be able to place the new data points along the old trajectory we need to know the direction between each pair of data points. This is shown in figure 4.11 and equation (4.4). In the equation, we also show how to normalize the vector δ_{i+1}, where p_i is a data point on the trajectory and p_{i+1} is the consecutive data point on the same trajectory. The algorithm for this is shown in algorithm 2.

Figure 4.11: Calculation of the direction from p_i to p_{i+1}, using the vectors p_i and p_{i+1}.

$$ \delta_{i+1} = \frac{p_{i+1} - p_i}{\|p_{i+1} - p_i\|} \qquad (4.4) $$


The pseudo code for the linear interpolation can be seen in algorithm 2. In the beginning, we have a trajectory that contains n data points and we want all of the trajectories to contain T data points. We start by computing the total walked distance, as described in figure 4.9 and equation (4.2). After that, we compute the distance from the first data point to all of the other data points in the trajectory, as described in figure 4.10 and equation (4.3), and add each distance to a list called distances. In line 3 we then loop through the list of distances while adding new data points and increasing the walked distance. We add new points while our travelled distance (i · traveled_distance) is less than or equal to the first of the distances that we collected at line 1. If the travelled distance is bigger than the first of the distances, we switch and compare it to the next distance in our list. To be able to do that we need to know in which direction to move, which is described in figure 4.11 and equation (4.4). This procedure is then repeated until we have travelled further than the last distance in the list of distances. Note here that we never replace the first and the last data points.

Algorithm 2 Interpolate a trajectory
Require: A trajectory with n time steps.
Ensure: A trajectory with T time steps.
 1: distances = get_distances(trajectory)
 2: i = 0
 3: for j in range(0, len(distances) - 1) do
 4:   while i * traveled_distance <= distances[j] do
 5:     add new time step
 6:     i++
 7:   end while
 8: end for


4.5 K-means clustering

Figure 4.12: The green marked square is the current state in the progress.

4.5.1 Initial state

The first thing to do in most algorithms is to initialise them. In the K-means algorithm, initialisation means setting the initial centroids. This can be done in multiple ways; the two most common are to generate k centroids completely at random or to take k copies of the input data. We chose the latter, because generating completely random trajectories would be time-consuming.

One problem with our initialization is that if two copies of the input data are parallel to each other, the end result could be two motion patterns when it should be one. Our solution to this is a threshold parameter which checks that two initial centroids are not too close to each other; the distance is calculated as explained in the next subsection, 4.5.2. This might be easier to understand from figure 4.13.

(a) too low threshold (b) high enough threshold

Figure 4.13: In the first two figures, you see the initialization, where the star-marked trajectories are the centroids and the grey trajectories are the input data. The final results of the clustering are seen below these two figures, where each cluster has a different colour.

4.5.2 Distance between trajectories

K-means clustering uses a distance function to determine what data is "close". Usually k-means clustering is done on simple 2D points, and thus the distance is usually the Euclidean distance between them. We are working with trajectories, which are n dimensions higher, where n is the number of 2D points in the trajectory. Note that n at this stage is the same for all trajectories because of the interpolation explained in section 4.4. Because we are working with trajectories, we had to implement our own distance function.

$$ d = \sum_{i=1}^{n} \|q_i - p_i\| \qquad (4.5) $$

Here we consider each point of one of the two trajectories as p_i and the corresponding point of the second trajectory as q_i. Note that this requires that both trajectories contain the same number of data points, which is guaranteed by the interpolation in section 4.4.
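A minimal sketch of equation (4.5), assuming both trajectories are lists of (x, y) tuples of equal length; the function name is ours.

```python
import math

def trajectory_distance(traj_p, traj_q):
    """Sum of point-wise Euclidean distances between two equal-length trajectories (eq. 4.5)."""
    return sum(math.dist(p, q) for p, q in zip(traj_p, traj_q))
```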


4.5.3 Means of clusters

The second part of k-means clustering is how you recalculate the centroids given the data chosen as "closest" to each centroid. One could also create a single mean point for all trajectories, but that would make our chosen distance function unnecessary in the next iterations of the k-means algorithm. Because of this we once again had to create our own function.

We calculated the mean of N trajectories as follows. For each point index in the trajectories, we calculated a "mean point" p_m, with the total number of trajectories within the cluster as N:

$$ p_m = \frac{1}{N} \sum_{i=1}^{N} p_i $$

Since all the trajectories now have an equal number of data points because of the interpolation mentioned earlier in section 4.4, the sequence of mean points forms a new trajectory, which is the mean function.

Figure 4.14: The red trajectory is a visual representation of the mean function calculated from the two black trajectories.
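A corresponding sketch of the point-wise mean used to recompute a centroid, under the same (x, y)-tuple layout as above:

```python
def mean_trajectory(trajectories):
    """Point-wise mean of a cluster of equal-length trajectories (the cluster's centroid)."""
    n = len(trajectories)
    return [
        (sum(t[i][0] for t in trajectories) / n,
         sum(t[i][1] for t in trajectories) / n)
        for i in range(len(trajectories[0]))
    ]
```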


4.5.4 Algorithm

With the distance and means of clusters defined and the basic concept of the k-means algorithm revisited [8], the algorithm was implemented as follows.

Algorithm 3 K-means clustering for trajectories
Require: k and a list of trajectories.
Ensure: k clusters.
 1: function kmeans_clustering(listoftrajs, k, threshold)
 2:   generate centroid trajectories:
 3:   for range(k) do copy k random unique trajectories whose pairwise distance is at least threshold
 4:   end for
 5:   while centroids change do
 6:     for trajectory in listoftrajs do assign trajectory to closest centroid
 7:     end for
 8:     for centroid in centroids do reassign centroid to the mean of its assigned trajectories
 9:     end for
10:   end while
11: end function
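Putting the pieces together, below is a minimal sketch of Algorithm 3 that reuses the trajectory_distance and mean_trajectory sketches above. The threshold must be small enough that k sufficiently distant initial centroids can be found; this is our reading of the algorithm rather than the thesis implementation.

```python
import random

def kmeans_trajectories(trajectories, k, threshold, max_iter=100):
    """K-means over interpolated trajectories, following Algorithm 3."""
    # Initialise centroids with random trajectories that are far enough apart.
    centroids = []
    for traj in random.sample(trajectories, len(trajectories)):
        if all(trajectory_distance(traj, c) >= threshold for c in centroids):
            centroids.append(traj)
        if len(centroids) == k:
            break

    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each trajectory joins its closest centroid.
        clusters = [[] for _ in centroids]
        for traj in trajectories:
            best = min(range(len(centroids)),
                       key=lambda j: trajectory_distance(traj, centroids[j]))
            clusters[best].append(traj)
        # Update step: recompute each centroid as the mean of its assigned trajectories.
        new_centroids = [mean_trajectory(c) if c else centroids[j]
                         for j, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```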

4.6 Gaussian Processes

Figure 4.15: The green marked square is the current state in the progress.

We will not delve deep into what a GP is, but here is a brief explanation to give some context. A Gaussian Process can roughly be explained as a generalization from a probability distribution, which describes a finite-dimensional random variable, to a distribution over functions [11].

In our case the size of the collection will be defined by our interpolation in section 4.4. For each motion pattern found by our k-means algorithm (algorithm 3) we will generate a GP.

A Gaussian Process is defined by two functions: the covariance function and the mean function [2]. Naturally the first step became how to define this kernel, as the covariance function is commonly called for Gaussian Processes. This kernel decides what the resulting GP will look like and how it will behave when fitting it to our input data.

Below you can see two versions of GP kernels. Equation (4.6) is what we got from our main reference [2], which slightly differs from equation (4.7), which scikit-learn currently uses [9]. The first term is the same if we consider d in (4.7) as a function that calculates distance. This kernel is called the RBF kernel (also known as the squared-exponential kernel; RBF stands for radial basis function). The second term, however, is something we had to add by using an inbuilt function to add kernels, so for the second term of (4.6) we used a kernel called WhiteKernel, also from scikit-learn [9]. Lastly, we added the extra hyperparameter by simply multiplying the RBF by an extra theta.

$$ k(x_i, x_j) = \theta_0^2 \exp\left(-\frac{(x_i - x_j)^2}{\theta_1^2}\right) + \theta_2^2\,\delta(x_i - x_j) \qquad (4.6) $$

The covariance function given by our main reference [2].

$$ k(x_i, x_j) = \exp\left(-\tfrac{1}{2}\, d\!\left(\tfrac{x_i}{\mathrm{length\_scale}}, \tfrac{x_j}{\mathrm{length\_scale}}\right)^{2}\right) \qquad (4.7) $$

The RBF kernel from scikit-learn [9].

$$ k(x_i, x_j) = \theta \cdot \mathrm{RBF}(\mathrm{length\_scale}) + \mathrm{WhiteKernel}(\mathrm{noise\_level}) + \delta \qquad (4.8) $$

This is the resulting kernel which is currently implemented. θ, length_scale, noise_level and δ are hyperparameters optimised using the maximum log likelihood [9].
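As a concrete illustration, a kernel of roughly the form in equation (4.8) can be assembled with scikit-learn as sketched below; the bound values are placeholders, and the extra δ term of (4.8) is not reproduced here (it could, for example, be modelled with an additional ConstantKernel).

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

# theta * RBF(length_scale) + WhiteKernel(noise_level), roughly equation (4.8).
kernel = (
    ConstantKernel(constant_value=1.0, constant_value_bounds=(1e-3, 1e3))
    * RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e3))
    + WhiteKernel(noise_level=1.0, noise_level_bounds=(1e-5, 1e4))
)
gp = GaussianProcessRegressor(kernel=kernel)  # hyperparameters are optimised in fit()
```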


After implementing this kernel we got strange results (figure 4.16). By strange we mean that every sample of the GP ends up at the same blue line seen in figure 4.16.

The problem lies within the optimisation of the GP. The fit function takes two parameters: the input data, which is our timestamps, and the target values, which are our x values or y values [9]. The optimisation is then done by maximising the log-marginal-likelihood [9], which is something we will not go into detail about, but we will explain what the result of this is.

The problem is that the optimisation treats parallel observed paths as simple noise; thus the resulting Gaussian Process avoids large changes in x or y, which is something we would like to avoid because the covariance becomes really narrow. If the covariance is that narrow, samples from the GP will also be narrow, and thus you will only see one sample in figure 4.16, because the rest are underneath it.

Figure 4.16: Many samples from a Gaussian Process, where the black trajectories are the input x positions as a function of time and the blue lines are samples from the GP.

The second problem was how to define the input data and the target data for the fit function mentioned earlier [9]. This is something our main reference didn't explain thoroughly; however, it says that they did two Gaussian Processes, one for the x-axis and one for the y-axis [2].


We therefore had to redefine our current data, which at this point was clustered trajectories containing data points with an x-coordinate, a y-coordinate and a timestamp respectively (see section 4.2). We had to change our data points to one dimension; this was done by using the timestamps to describe the x-coordinates and the y-coordinates separately as functions of time.

Before this, however, the timestamps had to be normalized so that all trajectories start at 0. This is necessary because if two trajectories are the same in x-y coordinates but occur after each other in time, they still belong to the same motion pattern. The normalization is done by subtracting the first timestamp of each trajectory from all of its timestamps.


(a) 10 trajectories represented with different colours in x and y coordinates

(b) the same trajectories as in figure 4.17a, but this time the vertical axis represents x values and the horizontal axis represents the timestamp

(c) the same trajectories as in figure 4.17a, but this time the vertical axis represents y values and the horizontal axis represents the timestamp


With the kernel and the input data correctly defined, we could at this point use the fit function from scikit-learn to train our model's hyperparameters [9]. This function takes two parameters: X, which is the training data, and y, which is the target values. In our case X becomes our data points' timestamps and y becomes our x-coordinates (and y-coordinates for the second Gaussian Process) within a cluster.

To evaluate the result of this we use scikit-learn's predict function, which takes one input, the query points, and outputs the mean of the GP along with the standard deviation [9]. We use an evenly spaced time frame between the maximum and minimum values of the input timestamps to get the query points. This is to get an overview of the whole Gaussian Process, one for the x-process and one for the y-process.
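A minimal sketch of how the fit and predict steps described above could look, assuming a cluster is stored as a list of trajectories of (timestamp, x, y) tuples and using the kernel from the previous sketch; the function name and data layout are ours.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def fit_cluster_gps(cluster, kernel):
    """Fit one GP for x(t) and one for y(t) over all trajectories in a cluster."""
    times, xs, ys = [], [], []
    for traj in cluster:
        t0 = traj[0][0]                  # normalise so every trajectory starts at t = 0
        for t, x, y in traj:
            times.append(t - t0)
            xs.append(x)
            ys.append(y)
    T = np.asarray(times).reshape(-1, 1)  # training inputs: timestamps
    gp_x = GaussianProcessRegressor(kernel=kernel).fit(T, xs)
    gp_y = GaussianProcessRegressor(kernel=kernel).fit(T, ys)

    # Evaluate on an evenly spaced time grid to get the mean and standard deviation.
    query = np.linspace(T.min(), T.max(), 100).reshape(-1, 1)
    mean_x, std_x = gp_x.predict(query, return_std=True)
    mean_y, std_y = gp_y.predict(query, return_std=True)
    return (mean_x, std_x), (mean_y, std_y)
```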

4.7 Gaussian Mixture models

Figure 4.18: The green marked square is the current state in the progress.

In the implementation of the GMM-map, we did not use the clustering method as we did in the making of the GP-map. Instead, after the interpolation of all of the trajectories, we performed the GMM on the trajectories directly. To perform the GMM, we used the EM algorithm. The EM algorithm is a commonly used algorithm, where you iterate over two steps until you reach a stop condition.

The first step is the E-step, where you compute the Expected value. The second step is the M-step, where you compute the maximization.


In our case the first step is, for each trajectory, to compute the probability that it belongs to each of the models. The second step is then to update the models according to the new probabilities of the trajectories. With that said, each model depends on all of the trajectories according to the probabilities that were computed in the E-step. Therefore we needed to set a constant covariance, otherwise the covariance would get too large.

4.7.1 Gaussian Mixture model with EM

When we implemented the Gaussian Mixture Models, we followed Learning Motion Patterns of People for Compliant Robot Motion, written by Maren Bennewitz et al. [6]. But instead of setting the covariance to 1700 mm, we set the constant covariance to 1930 mm. This was done after several evaluations of different values, where 1930 mm seemed to be the best fit for the environment in the training data. The reason we did not set it to 1700 mm is that their project was performed in an office, with small spaces, while ours was performed in a mall, with larger spaces.

In Gaussian Mixture Models that contain only data points, and not trajectories, we can get probabilities between 0 and 1, but that will not be the case here. This is because we compare each data point in the model with a trajectory and multiply the probabilities for all of the data points together. This leads to really small probabilities and, in turn, after normalization, to binary probabilities.

Lastly, to be able to make conclusions about our data set, we assume that the training data can be approximated with Gaussian distributions.


The Initial State

A simplified version of how we initialized the models can be seen in figure 4.19, where we assume that the data set contains two motion patterns and four trajectories. In the initial phase, we generated centroids by using some random trajectories from our data set that were not too close to each other, according to a threshold that we set. These motion patterns are marked as the red points in the figure, with the covariance marked with a black dotted line. In the initial phase the mean value µ_i for a motion pattern i is set to its original trajectory. The t_2 and t_3 are the other two trajectories from the data set.

Figure 4.19: Four trajectories located according to two distributions with mean values µ_i, where the µ_i's are set to two of the trajectories.

One motion pattern, out of the total number of motion patterns M, we denote as θ_m, and every trajectory t_i consists of a sequence t_i = {t_i^1, t_i^2, ..., t_i^n}. The linear interpolation earlier, in section 4.4, gave us that all trajectories contain T data points, so that each trajectory d_i = {x_i^1, x_i^2, ..., x_i^T}. Lastly, we also set the total number of trajectories to I.

Given that a person is in motion pattern m, the probability of the trajectory being at location x after [(k − 1) · β + 1; k · β] steps is P(x | θ_m^k), where K is a constant that decides how many data points we shall compare with and β = T/K. Thus, we can compute the likelihood of the trajectory d_i belonging to motion pattern θ_m as in equation (4.9).

$$ P(d_i \mid \theta_m) = \prod_{t=1}^{T} P(x_i^t \mid \theta_m^{[t/\beta]}) \qquad \text{[6]} \qquad (4.9) $$

At the initial phase we set the probability for θ_m^{[t/β]} to 0.0 for all of the trajectories.

After this we implemented the Gaussian Mixture Model (GMM).

The E-step in GMM

To compute the probability that a trajectory belongs to a motion pattern, we used equation (4.10), where d_i is a trajectory, θ^{[j]} is a motion pattern, σ_{x,y} the covariance and µ_{[j]}^{[t/β]} the mean value for that specific motion pattern. The [t/β] just defines how many data points we shall compare the trajectory d_i with, given a constant K; the β value is the total number of data points T in one trajectory divided by the constant K [6].

$$ E(d_i \mid \theta^{[j]}) = \prod_{t=1}^{T} \exp\left(-\frac{1}{2\sigma_{x,y}^2}\,\|x_i^t - \mu_{[j]}^{[t/\beta]}\|^2\right) \qquad \text{[6]} \qquad (4.10) $$

In equation (4.10) we referred to the motion pattern as θ, but let us refer to it as C_i instead. Then we can calculate the probability that a trajectory belongs to a motion pattern as seen in figure 4.20. When we have done the calculations for a trajectory, as referred to in equation (4.10), we get really small probabilities that the trajectory belongs to a motion pattern. This is because the likelihoods of all the data points are multiplied together. For the models to change according to the data set, we have to divide the probability that a trajectory is in one model by the sum of the probabilities of the trajectory being in all of the models. With this said, P(t_i) (the red columns) should be equal to 1.0.

Figure 4.20: The probability table shows the probability for trajectory t_i given a model C_m. The total probability for a trajectory being in any model in the sample space is equal to 1.0.

To make it easier to explain what we did, we introduce a set of "correspondence variables" c_im, where i is the index of the trajectory d_i and m the index of the motion pattern θ_m. c_im is a probability value that tells us how much the trajectory belongs to the motion pattern. In other words, a high value (close to 1) of c_im makes d_i's values have a big impact on the mean value of the motion pattern θ_m; conversely, if d_i is far from θ_m, c_im becomes very small.

Lastly, before we can set any new probability values for our trajectories, we need to make sure that the total probability sums up to one. Therefore we introduce a normalizing constant n_0:

$$ n_0 = \frac{P(\theta^{[j]} \mid d_i)}{\sum_{j=1}^{K} P(\theta^{[j]} \mid d_i)} $$

In other words, the probabilities that a trajectory belongs to each motion pattern should sum up to one; this follows from Bayes' theorem. The equation for the whole E-step in our GMM is shown in equation (4.11).

$$ E(c_{im} \mid \theta^{[j]}, d) = n_0 \prod_{t=1}^{T} \exp\left(-\frac{1}{2\sigma^2}\,\|x_i^t - \mu_{[j]}^{[t/\beta]}\|^2\right) \qquad \text{[6]} \qquad (4.11) $$

The M-step in GMM

The next step is to compute the new mean values for all the motion patterns, given the new probabilities that were computed in the previous E-step. The new mean value µ_m^{k[j+1]} for a model m and probability distribution P(x | θ_m^{k[j+1]}) is computed in equation (4.12), where we use E(c_im | θ^{[j]}, d) from the E-step [6].

$$ \mu_m^{k[j+1]} = \frac{1}{\beta} \cdot \sum_{t=(k-1)\cdot\beta+1}^{k\cdot\beta} \frac{\sum_{i=1}^{I} E[c_{im} \mid \theta^{[j]}, d]\, x_i^t}{\sum_{i=1}^{I} E[c_{im} \mid \theta^{[j]}, d]} \qquad \text{[6]} \qquad (4.12) $$

Lastly we just iterate through the E- and M-step until it converges.
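Below is a minimal, vectorised sketch of the E- and M-steps in equations (4.10)-(4.12), assuming the trajectories and initial model means are stored as NumPy arrays. Unlike the description above it works in log space (which sidesteps the underflow caused by multiplying many small probabilities) and, for brevity, compares every data point individually (i.e. K = T). The names and data layout are ours, not the thesis code.

```python
import numpy as np

def trajectory_gmm_em(trajectories, init_models, sigma, n_iter=50):
    """EM for a trajectory GMM with a constant covariance (e.g. sigma = 1930 mm).

    trajectories: array of shape (I, T, 2) of interpolated trajectories.
    init_models:  array of shape (M, T, 2) with the initial mean trajectories.
    """
    data = np.asarray(trajectories, dtype=float)      # (I, T, 2)
    models = np.asarray(init_models, dtype=float)     # (M, T, 2)
    for _ in range(n_iter):
        # E-step: log-likelihood of each trajectory under each model (eq. 4.10).
        sq_dist = ((data[:, None, :, :] - models[None, :, :, :]) ** 2).sum(-1)  # (I, M, T)
        log_lik = -0.5 / sigma**2 * sq_dist.sum(-1)                             # (I, M)
        # Normalise per trajectory (eq. 4.11) to get correspondence variables c_im.
        log_lik -= log_lik.max(axis=1, keepdims=True)
        c = np.exp(log_lik)
        c /= c.sum(axis=1, keepdims=True)
        # M-step: weighted mean of the trajectories for each model (eq. 4.12).
        weights = c / c.sum(axis=0, keepdims=True)                              # (I, M)
        models = np.einsum('im,itd->mtd', weights, data)
    return models, c
```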

4.8 Gaussian distribution

So, back to the example with the mall: when people move from one place (place A) to another (place B) in a mall, they tend to follow similar paths with some small deviations from the path that they take. If we collect many observations of the paths that people have walked from A to B, we can compute a mean path for this route and approximate this collected data as Gaussian distributed, or normally distributed. The distribution is shown in figure 4.21, where the mean path is marked with 0.0.

Figure 4.21: An example of a Gaussian distribution/Normal distribution

To be able to say anything about the collected data, which is Gaussian distributed, there needs to be a correlation in it. To find out if there is any correlation, and to be able to say how far a set of paths is spread out from its average path, you can use the covariance and the variance. To simplify the explanation of variance and covariance, we explain them using points instead of paths. These equations are shown below in equations (4.13), (4.14) and (4.15). The variable n is the total number of points observed, x_i and y_i are the x- respective y-value of a point, and µ is the mean of all the x- or y-values.

The variance tells us how much spread there is in the data set, and the covariance whether there is any correlation between the x's and the y's in it [4, 1].

The variance of x:

$$ \mathrm{var}(x) = \mathrm{cov}(x, x) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu_x)(x_i - \mu_x) \qquad \text{[1]} \qquad (4.13) $$

The variance of y:

$$ \mathrm{var}(y) = \mathrm{cov}(y, y) = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \mu_y)(y_i - \mu_y) \qquad \text{[1]} \qquad (4.14) $$

The covariance of x and y:

$$ \mathrm{cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y) \qquad \text{[1]} \qquad (4.15) $$
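For reference, the same quantities can be computed with NumPy, which uses the same 1/(n−1) normalisation by default; the points below are toy data for illustration only.

```python
import numpy as np

points = np.array([[1.0, 2.0], [2.0, 2.5], [3.0, 3.9], [4.0, 4.1]])  # toy data
x, y = points[:, 0], points[:, 1]

cov_matrix = np.cov(x, y)   # [[var(x), cov(x, y)], [cov(x, y), var(y)]]
var_x = cov_matrix[0, 0]    # equation (4.13)
var_y = cov_matrix[1, 1]    # equation (4.14)
cov_xy = cov_matrix[0, 1]   # equation (4.15)
```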


Chapter 5

Evaluation

In this chapter we present the results of the Gaussian Processes map and the Gaussian Mixture Models map that we implemented in sections 4.6 and 4.7, and lastly we also compare them with each other.

5.1 Result

The tests were performed on the test data, as described in section 4.2, and also on generated data. The latter was created on our own because it was hard to get nice results on the test data.

5.1.1 The generated data

In figure 5.1 we see the generated data: 300 trajectories located in 3 groups. The first 100 trajectories are located in the bottom left group, the next 100 in the middle group and the last 100 in the upper right group. Thus an optimal clustering would yield 3 clusters (one for each group).


Figure 5.1: Generated data of 300 trajectories, that is divided into 3 groups.

The values on the x- and y-axes are just unit distances.

5.1.2 The test data

In figure 5.2 we see all of the trajectories that were collected in the ATC shopping center in Osaka, Japan.

Figure 5.2: The plotting of the training data, where each trajectory has differ-ent colours.

5.1.3 Gaussian Processes

In figure 5.4, we can see a Gaussian Process for each cluster. The covariance is really narrow when we fit the Gaussian Process to the observations in figure 5.4. This is because when the Gaussian Process is optimised it interprets parallel trajectories as simple noise. This is easier to discern when looking at the hyperparameters mentioned in chapter 3: the noise parameter in our covariance function will almost always hit the roof of its defined boundary. In figure 5.3 below, the noise component of the kernel always gets optimised to the boundary's maximum value.

(a) noise boundary 1 (b) noise boundary 1000 (c) noise boundary 10000

Figure 5.3: The y process taken from one of the clusters from our k-means algorithm (algorithm 3), with three different noise boundaries. Magenta points show the GP mean and light blue the covariance.

The mean functions, however, give a pretty good estimate, as seen in figures 5.3 and 5.4. This result is also pretty similar to the figures generated by the GMM-map below, which also showcase a mean estimate.

Figure 5.4: Fake data generated by us. The magenta coloured lines are the mean values of the Gaussian Processes and the black ones are the generated data/observations. The covariance is supposed to be represented in light blue.


5.1.4 Gaussian Mixture Models

GMM on the generated data

In figure 5.1, we see three clusters, each containing 100 trajectories. We initialized three models, one red, one green and one blue, as can be seen on the left in figure 5.5. The light blue shadows are the trajectories from figure 5.1. On the right we also see that the probabilities for each trajectory are set to 0.0 for all three models in the initial phase.

Figure 5.5: To the left: the initial models, illustrated as the red, green and blue trajectories, and the trajectories from the test data, illustrated as a light blue shadow. To the right: the probability (y-axis) for each trajectory (x-axis) of it being in any of the models (red, green and blue). They are all set to 0.0 at the beginning.


Results for different values of the covariance

Below, in figure 5.6, we can see what happens when we change the value of the covariance.

With a good covariance, in this case set to 25, we get a good result, where each model ends up in its respective group, as can be seen in figure 5.6a. In figure 5.6b, we also see that each trajectory got a probability of 1.0 for only one of the models.

With a too low covariance we get a bad result, where all the models get the same likelihood. This cannot quite be seen in figure 5.6c, due to the fact that the red and the green points are plotted before the blue ones. For the same reason we can only see the probability for the blue model, but all of the models' probabilities are 33.333... %, which can be seen in figure 5.6d.

With a too great covariance we also get a bad result, where the red and the green models are affected by 200 trajectories at the same time, as can be seen in figure 5.6e. In figure 5.6f we also see this, where the first 100 trajectories affect approximately 60 % of the red model and 40 % of the green model. The next 100 trajectories are almost the inverse of the first 100, except that they also affect the blue model a little, while the last 100 trajectories only affect the blue model.


(a) The results from GMM with a good value of the covariance.

(b) The probabilities for each trajectory belonging to each model.

(c) The results from GMM with a too low value of the covariance.

(d) The probabilities for each trajectory belonging to each model.

(e) The results from GMM with a too high value of the covariance.

(f) The probabilities for each trajectory belonging to each model.


GMM on the test data

In figure 5.7 we see two results where the covariance is set to 1930 mm. Here we used 11 models when we loaded 80 trajectories from the data set. This figure tells a lot about the problems that occurred while trying to run the GMM on the test data. Due to the fact that we never got the same initialization each time we ran the program, it was hard to tune the covariance so that it fitted the data better. After numerous tries we think that a covariance of 1930 mm was the best fit for just this data. In the figure we can see that the left and the right figures do not match each other so well.

(a) Result 1, with a covariance set to 1930 mm.

(b) Result 2, with a covariance set to 1930 mm.

Figure 5.7: The light blue is the shadow of 80 trajectories and the curves with different colours (11 in total) are the final models after GMM was performed on the data.

With a covariance of 1800 mm, as can be seen in figure 5.8, we get a strange result. It reminds us of the result from the tests on our generated data with a too low covariance, where the models ended up somewhere in between all of the trajectories. The only exception here is that on the left two models have gone astray and on the right one has gone astray. This may be because, even though the covariance is low, there are still trajectories that can pull some models towards them.


(a) Result 1, with a covariance set to 1800 mm.

(b) Result 2, with a covariance set to 1800 mm.

Figure 5.8: The light blue is the shadow of 80 trajectories and the curves with different colours (11 in total) are the final models after GMM was performed on the data.

With a covariance of 2000 mm we also get a strange result, where some models seem to grow to an abnormal size. We do not know why this happened. We assume that it might be due to a bad initialization of the models, or that the covariance is too big for some areas, so that the trajectories in those areas affect too many models at the same time.

(a) Result 1, with a covariance set to 2000 mm.

(b) Result 2, with a covariance set to 2000 mm.

Figure 5.9: The light blue is the shadow of 80 trajectories and the curves with different colours (11 in total) are the final models after GMM was performed on the data.


5.2 Comparison

In this section we make a comparison between the GMM-map and the GP-map. It is not simple to compare them with each other. As we saw in the previous section, some models in the GMM stray outside the area of the shopping center. Therefore we plot each trajectory in the colour of the model it belonged to when the GMM finished.

In figures 5.10a and 5.10b we see the initial states for the GMM-map and the GP-map respectively. The light blue background is all of the trajectories and the coloured stars are the initial models/clusters, where one colour of stars represents one model/cluster. Here we use 80 trajectories and 4 models/clusters, just to simplify the test. As can be seen, the starting condition is the same for both the GMM-map and the GP-map. In figures 5.10c and 5.10d we see the results for the GMM-map and the GP-map respectively. As can be seen, it is very hard to draw any conclusions from the results. The models in the GMM, as said before, stray from the area of the shopping center, so that result is not good, while the mean of the GPs is fairly accurate, given that some trajectories are treated as simply noise, as explained in section 4.6.


(a) Initial condition GMM (b) Initial condition GP

(c) Final result GMM (d) Final result GP

Figure 5.10: The comparison of the GMM-map and the GP-map. In the pictures on the top we see the initialization of the probabilistic models. On the bottom we see the final results from the probabilistic models.


Chapter 6

Conclusions

In this chapter we will discuss the results of our implementations from chapter 4 and the results from chapter 5.

6.1 The input data

Can we approximate the data to be normally distributed?

We do not really know if we can approximate the data as normally distributed, because the data was only collected on Wednesdays and Sundays. For example, if stocks are replenished on Wednesdays, people could be diverted from their usual paths to avoid packages or other obstacles. Probably, those who collected the data had this in mind and picked those days for that or some other reason, but they never provide that information.

Another problem is dealing with noise in the trajectories, when people for some reason just stop moving. This shows up in the data set as many consecutive data points close to each other. It can distort the result, because we do not know how much to filter out, and if there are many of these points they will affect the motion patterns.
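
As an illustration of one simple thinning rule for such 'standing still' points, here is a minimal sketch. It is not necessarily how the filtering in section 4.3 works, and the 50 mm threshold is an arbitrary placeholder.

import numpy as np

def drop_standing_still(points, min_step=50.0):
    """Keep a point only if it is at least min_step (here in mm) away from
    the previously kept point, thinning out clusters of near-duplicate points.
    points: (T, 2) array of positions."""
    kept = [points[0]]
    for p in points[1:]:
        if np.linalg.norm(p - kept[-1]) >= min_step:
            kept.append(p)
    return np.array(kept)

The difficulty described above remains: the threshold decides how much is filtered out, and a bad choice still distorts the motion patterns.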


If we had known from the beginning that the test data would be hard to test on, we would have generated our own data set much sooner. This would have saved a lot of time and given us more time to do more tests.

6.2 Linear Interpolation

Interpolating with too few data points

If we interpolate all trajectories to too few data points we will lose too much information and the motion patterns will end up being edgy. Figure 6.1 is an extreme example of this, where the blue line is the real route that person P2 took, the blue squares are the collected data from P2's route and the black squares are the interpolated route.

Figure 6.1

This could even lead to trajectories that could not exist in reality.

Interpolating with too many data points

Conversely, if we interpolate them to too many data points they will be overfitted and maybe not accurate to reality either.


For example, if a person moved really fast between many places we could end up cutting off his/her trajectory at the wrong place. This would create motion patterns that do not exist or, even worse, could not exist. Figure 6.2 is an extreme example of this, where the red line is the real route that person P1 took, the red squares are the collected data from P1's route and the black squares are the interpolated route.

Figure 6.2
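
To make this trade-off concrete, the following is a minimal sketch of arc-length resampling by linear interpolation, where n_points is the number of data points discussed above. The function name and array shapes are illustrative; this is not our section 4.4 code.

import numpy as np

def resample_trajectory(points, n_points):
    """Resample a trajectory to n_points spaced evenly along its arc length,
    using linear interpolation between the original data points.
    points: (T, 2) array; assumes consecutive points are not all identical."""
    steps = np.linalg.norm(np.diff(points, axis=0), axis=1)
    arc = np.concatenate(([0.0], np.cumsum(steps)))   # cumulative arc length
    target = np.linspace(0.0, arc[-1], n_points)      # evenly spaced lengths
    x = np.interp(target, arc, points[:, 0])
    y = np.interp(target, arc, points[:, 1])
    return np.column_stack((x, y))

A small n_points gives the edgy routes of figure 6.1, while a very large n_points reproduces the problems discussed for figure 6.2.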

6.3 K-means clustering

This seems to work pretty well with our input. It is hard to judge visually what the clusters should be, so it is somewhat hard to say whether the result is good or not, but the clusters it outputs look reasonable.

One minor problem is deciding how close the initial centroids are allowed to be; however, this is almost completely solved by our threshold parameter, as seen in subsection 4.5.2.

The biggest problem is deciding k, which we have not delved deeply into. This is another factor that makes it hard to evaluate the clustering.
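
One common heuristic for choosing k, which we did not try, is the elbow method: run the clustering for several values of k and look for the value where the within-cluster cost stops dropping sharply. A rough sketch is shown below, where kmeans stands in for the clustering of section 4.5 and is assumed to return labels and centroids.

import numpy as np

def within_cluster_cost(data, labels, centroids):
    """Sum of squared distances from each item to its assigned centroid."""
    return sum(np.sum((data[labels == j] - c) ** 2)
               for j, c in enumerate(centroids))

def elbow_curve(data, k_values, kmeans):
    """Run kmeans(data, k) for each candidate k and collect the cost;
    plotting costs against k_values shows the 'elbow'."""
    costs = []
    for k in k_values:
        labels, centroids = kmeans(data, k)
        costs.append(within_cluster_cost(data, labels, centroids))
    return costs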


6.4 Gaussian Processes

The GPs gave a good mean function, but the covariance is very narrow; whether or not this is a major problem is something we have not researched completely. To fix or tweak this, more work on the kernel and tests on more input data would be necessary.
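
To show which knobs that work would involve, here is a textbook sketch of GP regression with a squared-exponential kernel for one-dimensional inputs (not our implementation from section 4.6): the signal variance, length-scale and noise variance together decide how wide the predictive covariance becomes away from the observations.

import numpy as np

def rbf_kernel(a, b, signal_var=1.0, length_scale=1.0):
    """Squared-exponential kernel for 1-D inputs a and b."""
    sq = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq / length_scale ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2, **kern):
    """Posterior mean and covariance of standard GP regression."""
    K = rbf_kernel(x_train, x_train, **kern) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kern)
    K_ss = rbf_kernel(x_test, x_test, **kern)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                    # posterior mean at x_test
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v                    # posterior covariance at x_test
    return mean, cov

Increasing the signal variance or the noise variance widens the predictive band, so a very narrow covariance often points at hyperparameters that are too small rather than at the method itself.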

6.5 Gaussian Mixture Models

It was hard to apply the Gaussian Mixture Models to the data set that we received from D. Brscic et al. [3], because:

• It needs a good initialization of the models,

• The covariance needs to be set to a good, constant value.

If these are not set right, we can end up with really strange results.

6.5.1 The generated data

In an environment with a constant space size, like the one in our generated data, Gaussian Mixture Models seemed to work fine. The important thing here, though, is to find a good value of the covariance that fits the environment. This worked fine on our data, but maybe that is because the trajectories in each group had too small a spread. With a bigger spread in each group we could have got a different result, so that needs to be tested more.


6.5.2 The training data

The Gaussian Mixture Models were not a good method to use on our training data. That is probably because the shopping center has shifting space sizes, which made it hard to find a proper covariance for the whole environment. The shifting space sizes can be seen in figure 6.3, where the areas inside the two red circles differ in size. This makes it hard to use one constant covariance for the whole area, which is what we did.

Figure 6.3

A solution to that problem could be to first cluster the data according to some form of space size, and compute the covariance for each cluster before we do the GMM on each cluster. Since K-means clustering was not a good way to cluster this data, maybe we should use Mixtures of Regression Models instead, as we talked about in section 2.5.
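
A rough sketch of the per-cluster part of that idea, assuming the trajectories have already been clustered in some way (the names, array shapes and the choice of spread measure are assumptions, not something we implemented):

import numpy as np

def per_cluster_sigma(trajectories, labels):
    """One covariance value per cluster instead of a single global value.
    Here sigma is the root-mean-square distance of the cluster's points to
    the cluster's mean trajectory; other spread measures would work too.
    trajectories: (N, T, 2) array, labels: (N,) array of cluster indices."""
    sigmas = {}
    for j in np.unique(labels):
        cluster = trajectories[labels == j]
        mean_traj = cluster.mean(axis=0)                     # (T, 2)
        dists = np.linalg.norm(cluster - mean_traj, axis=2)  # (n_j, T)
        sigmas[j] = np.sqrt(np.mean(dists ** 2))
    return sigmas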

6.5.3 Handling small probability values

When we did the Gaussian Mixture Models on the trajectories we had to compute the probability for each data point in the trajectories and then multiply them together. This led to problems, because with too many data points we got really small numbers to handle. If the total probability became less than 2.2250738585072014e-308, we got NaN results for the probabilities. That is because a double-precision float cannot represent smaller normal numbers. To get around that problem we also changed the K-value, which is a constant that decides how many data points to compare with when computing the expected value that a trajectory belongs to a model. However, after some evaluations of different values of K, the best results were received when K was set to the same number of data points that we interpolated the trajectories to.
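
The standard remedy, which we did not implement, is to work in log space: sum the per-point log-probabilities instead of multiplying probabilities, and only exponentiate after subtracting the maximum (the log-sum-exp trick). A minimal sketch, assuming an isotropic Gaussian per data point:

import numpy as np

def log_likelihood(traj, mean_traj, sigma):
    """Sum of per-point log-probabilities; the sum stays in a safe range
    even when the corresponding product would underflow.
    traj, mean_traj: (T, 2) arrays, sigma: scalar."""
    sq_dist = np.sum((traj - mean_traj) ** 2, axis=1)
    return np.sum(-sq_dist / (2.0 * sigma ** 2)
                  - np.log(2.0 * np.pi * sigma ** 2))

def normalise_over_models(log_likes):
    """Turn per-model log-likelihoods into probabilities without NaNs."""
    shifted = log_likes - np.max(log_likes)      # log-sum-exp trick
    weights = np.exp(shifted)
    return weights / weights.sum()

With this, the number of data points per trajectory should no longer have to be capped by the K-value for numerical reasons.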

6.5.4 Set initial Models

The initial models were set at random, so we always ended up with different results. Sometimes we got better results than others, which could depend on the initialization of the models. So, to receive a good result when using the GMM on trajectories, we would have needed a better initialization of the models.
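
One standard alternative to purely random initialization is k-means++-style seeding, which spreads the initial models out over the data. We did not implement this; the sketch below assumes trajectories of equal length that can be flattened into vectors.

import numpy as np

def seed_models(trajectories, n_models, rng=None):
    """Pick each new initial model with probability proportional to its
    squared distance from the models chosen so far (k-means++ seeding).
    trajectories: (N, T, 2) array of equally long trajectories."""
    rng = np.random.default_rng() if rng is None else rng
    flat = trajectories.reshape(len(trajectories), -1)    # (N, T*2)
    chosen = [rng.integers(len(flat))]
    while len(chosen) < n_models:
        # Squared distance from each trajectory to its nearest chosen model.
        d2 = np.min([np.sum((flat - flat[c]) ** 2, axis=1) for c in chosen],
                    axis=0)
        chosen.append(rng.choice(len(flat), p=d2 / d2.sum()))
    return trajectories[chosen]

Seeding like this would make the runs more repeatable and less dependent on a lucky draw.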

6.6 The comparing of GP-map and the GMM-map

It was really hard to compare these two maps with each other. First of all, they were not good models to use on the training data that we received. Secondly, the limited amount of time made it hard to find a more proper way to compare the two maps. We have only managed to compare them visually. It would have been good to compare them by setting a position or a trajectory in the mall and seeing which of the two probabilistic models produced the best result.

6.7 How GP and GMM fits on the warehouse environment

According to this thesis, both GP and GMM do not work well on our training data. GP did not yield a wide covariance, which is unexpected compared to the paper on modelling smooth paths using Gaussian Processes [2]. Whether GPs are a bad choice for this problem remains to be seen, because we have only tested with the data from the shopping center, which is described in section 4.2. Thus the only conclusion that can be made is that GPs do not scale well to large data sets. To conclude this, more tests are required, as well as work on the kernel and, finally, continuing the implementation from the paper [2] to also implement path prediction.

GMM is bad mostly because it handles shifting space sizes in environments poorly. It is also a bad model to use because of the limitation on how many data points one trajectory can have before the values become too small for the computer to handle when computing the probabilities. The reason why GMM is bad at handling shifting space sizes is that it uses a constant covariance. If we have a too high covariance, too many trajectories that should not affect a model will do so. If we have a too small covariance, trajectories that should have an impact on a model will not be able to have any effect on it.

If the training data from the ATC shopping center does not look like the warehouse environment that is the target of this thesis, then GMM may still fit its purpose. Otherwise it is not a good probabilistic model to use.


References
