

Bachelor Thesis, 15 hp

BACHELOR'S DEGREE IN COMPUTING SCIENCE

2020

A SURVEY COMPARING DIFFERENT METHODS

FOR CLASSIFYING TRAJECTORIES

Philip Bäckström


Abstract

In machine learning for the classification of GPS trajectories, there are many different methods for classifying a given trajectory, and just as many methods for processing the raw GPS data. This paper investigates why results diverge between different methods that use the same algorithm. Two different methods, both using the same algorithm, are run and analysed to obtain a result. This result explains why results can diverge even when the same dataset and the same algorithm are used.


Contents

Introduction
1.1 Background
1.2 Purpose and Research Questions
1.3 Delimitations
Related Work
Algorithms and data
3.1 Geolife GPS Trajectories
3.2 Scikit-learn
3.3 Random forests
3.4 F1 score and Accuracy
Method
4.1 Choice of method
4.2 Data formats
4.2.1 “Deep-Semi-Supervised” method
4.2.2 “Beijing” method
Result and Analysis
Discussion


Introduction

Since Geolife (Microsoft Research Asia) [1] collected a GPS trajectory dataset [2] over the span of five years (April 2007 to August 2012), many different classification methods have been created. These include classifying Artificial Neural Networks (ANNs) and traditional classifiers [9] [10] built to predict people's movement patterns when using means of transport; some of them are named in the related work section of this paper. Identifying the means of transport people use is a fundamental step for various problems such as travel demand estimation, transport planning and traffic management. Being able to see which mode of transport is used without conducting street surveys saves a tremendous amount of time, especially since the data can be collected through GPS trackers, which is why the classification of GPS trajectories can help classify raw GPS data from trackers. The dataset covers several types of transportation; specifically, the means of transport are walking, bus, train, aircraft, car and taxi, bike and other. The goal of these classifiers is to predict the means of transport using only trajectories, each of which contains Global Positioning System (GPS) coordinates and timestamps. Some trajectories also carry a label representing the means of transport used in that specific trajectory at that specific time. These labels make it possible to build all these different classifying methods, since the data changes character whenever the user switches to a different type of transport.

These ANNs and traditional classifiers all use some type of classifying algorithm, e.g. Random Forest, Decision Trees, Nearest Neighbour or Deep Learning. Such algorithms have been compared multiple times with trustworthy results, e.g. in a method made by Sina Dabiri [3], which compares different algorithms, including deep-learning methods, after the data has been processed, with results that vary between algorithms. However, these algorithms are always applied after the data has been processed in some way. This raises the question: given the same algorithm, will the results diverge if the data is processed differently? Some of these methods use the same algorithms, which presents a good opportunity to compare their results across different ways of processing the data, and, if there is a difference, to find out what causes this distinction.

1.1 Background

Many different attempts at classifying this data have been made in the years since the dataset was released to the public. These attempts each work in their own way, but some are more efficient than others. This raises the question of whether the data format itself could alter the accuracy of the classifier. Data quality can best be defined as “fitness for use”, which implies that quality is relative [12]. This means that data formatted better for a specific use is of higher quality, and should therefore give better classification. Whenever data is processed differently, into different formats, the results of the classification can change.

1.2 Purpose and Research Questions

The purpose of this paper is to find the best method for formatting and processing Geolife's GPS trajectories [2]. Since many different classification attempts and many different ways of converting the raw data files exist, there is value in finding the best method, which could change how this specific data is processed and set a new standard. This could save time in future trials of new ways of classifying the data or of data mining. Hence the research question: in the classification of Geolife's GPS trajectories, which method is superior and why, using F1 score and accuracy to evaluate the results? The hypothesis is that the method made by Sina Dabiri will be the superior method, since considerably more research and work has gone into it. Given this, and the results from other sources mentioned in related work, their method has very good odds.


1.3 Delimitations

This work is delimited to a single classification algorithm: the traditional random forests algorithm as implemented in scikit-learn, explained in chapter 3. Another delimitation is to focus on only two different methods for the problem. There are many methods for this problem, but only two will be compared in this paper.

Related Work

The related work on the subject offers several ways of comparing different methods for this problem. One survey, A Survey and Comparison of Trajectory Classification Methods [16], compares classifiers over several datasets using many different methods, ranging from traditional methods to ANNs/CNNs and deep-learning algorithms; it also includes Sina Dabiri's Deep-Semi-Supervised-GPS-Transport-Mode method but does not use the traditional methods. Another related work is a review by Xue Yang, which reviews different classification algorithms and compares them across different datasets, using both traditional methods and ANNs/CNNs [17]. A further related work is a book by Luis Pedro Coelho about building machine learning systems [21], which covers building such systems and also parameter tweaking, which is relevant to this paper since some parameter analysis will be carried out.

Algorithms and data

To get a greater understanding of the work in this paper, a few explanations of the methods used are needed. In this chapter, both the data and the algorithms are explained briefly.

3.1 Geolife GPS Trajectories

As mentioned in the introduction, the GPS trajectories were logged over a time span of five years, from April 2007 to August 2012. The dataset contains 50,176 hours of logged GPS coordinates and timestamps. 91.5 percent of the logged points were recorded in a dense representation, e.g. every 1-5 seconds or every 5-10 metres per logged point [4]. The data represents a wide range of outdoor activities, not only e.g. the walk to work; examples include shopping, sightseeing, dining, hiking and cycling. This is important for the classifiers, since a classifier trained on only specific activities, e.g. the trajectory to work, would not be able to classify the other random activities that human mobility involves. The data was logged in Beijing and covers 182 users.

The dataset contains both labelled and unlabelled data. Labelled data is data which is categorised: e.g. for a specific trajectory during a specific timestamp, the trajectory represents walking. These labelled trajectories are what make categorising the trajectories possible. The labelled data makes up approximately 25 percent of the total data by time and about 10 percent by logged distance. The labelled data is categorised as shown in Table 1.


Transportation mode Distance (km) Duration (hour)

Walk 10,123 5,460

Bike 6,495 2,410

Bus 20,281 1,507

Car & taxi 32,866 2,384

Train 36,253 745

Airplane 24,789 40

Other 9,493 404

Total labelled 140,304 12,953

Total unlabelled 1,292,951 50,176

Table 1: Total distance and duration of transportation modes in the labelled data.

The data itself is stored in PLT files. This file format is a vector-based plotter format, primarily used for CAD (computer-aided design) files, and can be used to visualise the data with a plotter. The data is represented in Table 2.

Latitude | Longitude | 0 | Altitude | Nr of days from 12/30/1899 | Date (string) | Time (string)
40.0523833 | 116.40345 | 0 | 154.199475 | 39215.1476157407 | 2007-05-13 | 03:32:34
40.0535166 | 116.4003833 | 0 | 144.356955 | 39215.1479861111 | 2007-05-13 | 03:33:06
40.0535333 | 116.4001667 | 0 | 144.356955 | 39215.148090277 | 2007-05-13 | 03:33:15
40.0534 | 116.40055 | 0 | 147.637795 | 39215.1576967593 | 2007-05-13 | 03:47:05
40.0532 | 116.4011167 | 0 | 147.637795 | 39215.1578587963 | 2007-05-13 | 03:47:19

Table 2: An example of a trajectory in the PLT files (raw files).

Start (Date) | Start (Time) | End (Date) | End (Time) | Trajectory mode
2007/05/13 | 02:27:42 | 2007/05/13 | 03:33:15 | car
2007/05/13 | 03:47:05 | 2007/05/13 | 04:24:42 | car
2007/05/13 | 23:26:53 | 2007/05/14 | 00:03:05 | car

Table 3: Representation of the labels for the trajectories. The labels are stored in a separate file, which means they must be paired with the corresponding PLT file when used.
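Because the records follow the layout in Table 2, a single trajectory can be loaded with a few lines of Python. Below is a minimal sketch, assuming the raw Geolife files start with six header lines before the records (per the dataset's user guide) and using a hypothetical file path:

```python
import pandas as pd

# Column names follow Table 2; "days" counts days since 12/30/1899.
cols = ["latitude", "longitude", "zero", "altitude", "days", "date", "time"]

# Hypothetical path into an extracted copy of the dataset; adjust as needed.
df = pd.read_csv("Data/000/Trajectory/20081023025304.plt",
                 skiprows=6, header=None, names=cols)
print(df.head())
```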

3.2 Scikit-learn

Scikit-learn, also known as sklearn, is a framework written in Python. It features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN. The framework is also designed to interoperate with NumPy, the Python numerical library, and SciPy, the scientific library. NumPy adds support for large, multi-dimensional arrays and matrices, along with high-level mathematical functions that operate on them [6]. SciPy is mostly used in scientific and technical computing and includes modules for optimisation, linear algebra, integration, interpolation, special functions, FFT, and signal and image processing. However, SciPy is not used in this paper, since it offers no tools useful for the dataset covered here. Scikit-learn is mostly used for traditional classification methods; one example is random forests, which is used in this paper.

The reason scikit-learn is used in this paper is that the paper features traditional classification algorithms. For traditional classification, scikit-learn has the largest library of tree-based ensemble models, such as Random Forest, Gradient Boosting Decision Tree and XGBoost [22]. Another framework that could have been used is TensorFlow. However, since TensorFlow is heavily focused on artificial neural network solutions [5], scikit-learn was chosen. Also, the comparisons in Xue Yang's work were made in scikit-learn, which lends some support to the choice [17].


3.3 Random forests

The random forests algorithm is an ensemble learning method for classification, regression and other problems that operates by constructing a multitude of decision trees. Random forests can be described in three steps: decision tree learning, bagging, and the step from bagging to random forests.

Decision tree learning is built on decision trees, a tool that uses a treelike model of decisions and their possible consequences. A decision tree can be explained using a flow chart illustrating the process behind it.

Figure 1: An illustration of a simplified decision tree, taken from towardsdatascience [7]. (Disclaimer: I do not own this image; it was taken from “towardsdatascience” and all credit goes to the owner.)

Figure 1 is explained through the string at the top, which contains 7 numbers. Each number has a colour, may be underlined, and has a value. The path through the tree is decided by the properties of each number. Take the first number, the underlined red 1. Is it red? Yes, so go down the left branch to the next decision: is it underlined? Yes, so this number is separated into its own class. This is just an illustration; in real life the data is not as clean as in this example [7].

The next step of the algorithm is bootstrap aggregating, also known as bagging. Bagging is a technique for reducing the noise caused by the variance of an estimated prediction function. Since trees are notorious for being noisy, bagging is very efficient for tree algorithms, as most tree-based algorithms are high-variance, low-bias procedures. The idea behind bagging is to average many noisy but approximately unbiased models, which reduces the variance. The generated trees are identically distributed (i.d.), so the bias of the bagged ensemble is the same as the bias of the individual trees; the gain comes from the reduced variance [8].

Bagging is a machine learning meta-algorithm designed to improve the stability and accuracy of an algorithm. From each training set T of size n, the algorithm generates m new training sets, each of size n', by randomly selecting samples. In regular bagging, successive trees do not depend on previous trees; every tree is independently constructed from a bootstrap sample of the dataset. Random forest adds an additional layer of randomness to bagging by changing how the classification trees are constructed: where a standard tree splits each node using the best split among all variables, a random forest splits each node using the best variable within a subset randomly chosen at that node [15].

That is the basis of the algorithm. In more detail, it works as follows:

1. Take n bootstrap samples from the original data.

2. For each of the n bootstraps in (1), grow an unpruned classification tree with the following modification, explained above: at each node, randomly sample m of the predictors and choose the best split among those variables, instead of choosing the best split among all predictors as in regular bagging.

3. Predict new data by aggregating the predictions of the n trees, that is, by majority vote [15]. A minimal sketch of these three steps is given below.
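The sketch below illustrates the three steps on toy data, using scikit-learn's DecisionTreeClassifier as the base learner; scikit-learn's RandomForestClassifier performs all of this internally, so this is an illustration only, and the data and parameter values are hypothetical:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))            # 500 samples, 10 predictors (toy data)
y = (X[:, 0] + X[:, 3] > 0).astype(int)   # toy labels

n_trees, m = 25, 3                        # n bootstrap samples, m predictors per split
trees = []
for _ in range(n_trees):
    idx = rng.integers(0, len(X), len(X))             # step 1: bootstrap sample
    tree = DecisionTreeClassifier(max_features=m)     # step 2: best split among m random predictors
    trees.append(tree.fit(X[idx], y[idx]))

votes = np.stack([t.predict(X) for t in trees])       # step 3: aggregate the n trees
majority = (votes.mean(axis=0) > 0.5).astype(int)     # majority vote
print("training accuracy:", (majority == y).mean())
```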

In the work of Xue Yang, where different algorithms were compared on different datasets, both random forest and Sina Dabiri's model were used. Because Xue Yang obtained a good, trustworthy result, there is less risk in using the same algorithm and method as Yang did [17].

3.4 F1 score and Accuracy

The F1 score, also known as the balanced F-score or F-measure, is a way to measure a test's recall and precision. It is computed from the test's precision and recall with the following formula:

$$F_1 = \frac{2}{\mathrm{recall}^{-1} + \mathrm{precision}^{-1}} = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$

The F1 score varies between 0 and 1, where 1 is best (100% on both precision and recall). Precision also varies between 0 and 1: a precision of 1.0 means that every item labelled as belonging to a certain class C does belong to that class. However, it says nothing about the items of class C that were not labelled as C. That is where recall comes in. Recall also varies between 0 and 1: a recall of 1.0 means that all items of class C were labelled as belonging to class C. However, it says nothing about how many items from other classes were incorrectly labelled as belonging to class C [13].

Precision is calculated with the following formula:

$$\mathrm{precision} = \frac{\mathrm{true\ positives}}{\mathrm{true\ positives} + \mathrm{false\ positives}}$$

Recall is calculated with the following formula:

$$\mathrm{recall} = \frac{\mathrm{true\ positives}}{\mathrm{true\ positives} + \mathrm{false\ negatives}}$$

This means the F1 score is based on precision and recall, converting the counts of true positives, false positives, true negatives and false negatives into a single score. It is also called the harmonic mean of precision and recall [11]; the harmonic mean is one of several kinds of average, which means the F1 score is a mean of precision and recall.

Accuracy is calculated by dividing the number of correct guesses by the total number of guesses. In this case scikit-learn's accuracy_score was used, which checks each prediction against its true label and returns the fraction of correct guesses [23].

$$\mathrm{accuracy} = \frac{\mathrm{correct\ labels}}{\mathrm{total\ labels}}$$
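Both measures can be computed with scikit-learn's metrics module. Below is a small toy example, under the assumption that the “F1-Weight” reported later corresponds to scikit-learn's weighted F1 average:

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 2, 1, 0, 2]   # hypothetical true transport-mode labels
y_pred = [0, 1, 2, 1, 1, 0, 2]   # hypothetical predictions

print(accuracy_score(y_true, y_pred))                 # correct labels / total labels
print(f1_score(y_true, y_pred, average="weighted"))   # per-class F1, weighted by support
```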


The F1 score was chosen because Camila Leite da Silva [16], from related work, mentions choosing it because F1 score and accuracy are common measures for evaluating classification tasks. This makes F1 score and accuracy a good choice for getting a good sense of the performance of the classification algorithms.

Method

The goal of this paper is to find a superior method for converting raw GPS data into a format which is an efficient and appropriate representation for the chosen classifiers, and to find out why a certain method is better. This paper uses two different methods for the problem; both support traditional classification algorithms and, more importantly, the same one. The random forest algorithm is used for the comparison, and F1 score and accuracy are used to evaluate the results. Both methods use K-Fold cross validation with 5 splits to produce the training and test data; a sketch of this evaluation loop is given below.
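The following is a minimal sketch of the shared evaluation loop, assuming X and y stand for the processed feature matrix and labels of either method (hypothetical names); both methods are scored the same way:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import KFold

def evaluate(X, y, seed=None):
    """Average accuracy and weighted F1 over 5 folds."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    accs, f1s = [], []
    for train_idx, test_idx in kf.split(X):
        clf = RandomForestClassifier().fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        accs.append(accuracy_score(y[test_idx], pred))
        f1s.append(f1_score(y[test_idx], pred, average="weighted"))
    return np.mean(accs), np.mean(f1s)
```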

The first step is to find the “superior method”, meaning the method with the best F1 score and accuracy. If one method were better at accuracy and the other at F1 score, or vice versa, that would be analysed separately. The methods are run through the classifiers as already implemented, resulting in an F1 score and an accuracy score, which decide the superior method. These scores are shown in graphs that give a clear picture of the superior method.

As for the data, chapter 3.1 mentions 182 users' data. This survey only uses 59 users' data, because one of the methods was unable to run with more user data. This could probably have been prevented with another PC, but since I did not have access to another computer, the limit became 59 users, chosen as users 0-58 in Geolife's GPS trajectory dataset [2]. The specifications of the PC used are shown below.

CPU Intel® Core™ i7-7700K @ 4.2GHz (4.5GHz)

GPU NVIDIA GeForce GTX 1070

PSU 850W (Gold-certified)

RAM DDR4 32.0 GB

Mother board Gigabyte Z270-Gaming K3

Storage Samsung SSD 970 EVO 250GB NVMe M.2

OS Windows 10 Pro, 64-bit

Table 4: Specifications of the PC. The specs themselves are not bad; one hypothesis for why a method could not run is the OS used. No trial was made on a Linux OS.

When the results show a superior method, an analysis follows, containing a closer inspection of that method. This includes looking at the data format, which I will refer to as a header, and the way the data is inserted into the classifying algorithm. Since the algorithm is already implemented in scikit-learn and does not differ between the methods, the algorithm itself will not be examined. This is a limitation: the survey only applies to the random forest algorithm and Geolife's GPS trajectory dataset, not to other classification algorithms or datasets.


The analysis first examines the inserted data to see whether it is sorted or not. After this, another comparison between the methods is made so that the runs are not altered by sorted data. Then comes a trial-and-error stage on the superior method, trying to “zero out” different parameters in the header by altering the code of the method. These trials are then compared to find the parameter with the most impact on F1 score and accuracy. After this trial, that parameter is compared to the other method to see why the latter is worse.

4.1 Choice of method

The chosen methods are mostly inspired by related work. For the comparison of the different methods, I decided to choose the most common approach to the issue: a direct comparison using the same evaluation measures, F1 score and accuracy. As Camila Leite da Silva [16] mentions, the most common evaluation measures for classification are F1 score and accuracy. For the direct comparison, a good inspiration was Sina Dabiri [17], whose method is used in this survey as well; he compared his “SECA” against other traditional algorithms in a table, using F1 score in some cases and precision and recall in others.

For the parameter analysis, the inspiration came from a book by Luis Pedro Coelho about machine learning, which discusses the importance of tuning parameters [21]. This inspired the approach of altering the parameters to check the outcome of each parameter.

4.2 Data formats

This section covers the two methods and their formats: a description of each, and of the way the format could alter the results. Chapter 4.2.1 explains the “Deep-Semi-Supervised” method and chapter 4.2.2 the “Beijing” method. The formats of the two methods are visualised below. Only the data formats are shown, not the procedures for computing them, because in this paper the formats are what matter, not the procedures.

4.2.1 “Deep-Semi-Supervised” method

For the purposes of this paper, the “Deep-Semi-Supervised” method does not use a neural network. The method was made by Sina Dabiri as part of their unique way of classifying this dataset, called “SECA”, a Semi-Supervised Convolutional Autoencoder [18]. In this paper their “handmade” methods are used instead, which include SVC, k-nearest neighbours, the Decision Tree classifier, the MLP classifier and the Random Forest classifier; more specifically, the random forest classifier. These classifiers are implemented through scikit-learn's own library. The most useful information about this method is its format and how the data is inserted into the different classifying algorithms. Regarding the format, the focus is on the headers, which state what type of data is inserted into the classifying algorithms; these “headers” are essentially a template for the data.

The header format is shown in Table 5.

Distance | Average Velocity | Expectation Velocity | MaxV1-MaxV3 | MaxA1-MaxA3 | Heading Rate Change | Stop Rate | Velocity Change Rate | Label

Table 5: The format of the data inserted into the classifying algorithm. These columns visualise the data inserted. MaxV1-MaxV3 means there are three columns, MaxV1, MaxV2 and MaxV3; the same goes for MaxA1-MaxA3.
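To make the header concrete, the sketch below shows one hypothetical way the MaxV1-MaxV3 columns could be derived from a trajectory's point-to-point speeds; this is an illustration, not Sina Dabiri's actual feature-extraction code:

```python
import numpy as np

def top_three_velocities(speeds):
    """Return the three largest point-to-point speeds, padded with 0."""
    s = np.sort(np.asarray(speeds, dtype=float))[::-1]   # descending
    s = np.concatenate([s, np.zeros(3)])                 # pad very short trajectories
    return s[0], s[1], s[2]

print(top_three_velocities([1.2, 5.6, 3.3, 8.1]))   # (8.1, 5.6, 3.3)
```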

The code itself has been altered slightly to fit the other method, to make the results as accurate as possible. The alteration is that the data now goes through scikit-learn's own K-Fold cross validation with 5 folds, where the method previously created its own K-Fold cross validation. The reason for the change is that the data was sorted when inserted into the classifier, which could alter the results; the difference is shown in the results.

4.2.2 “Beijing” method

The “Beijing” method was made by Jim Bremner, also known as “jbremz”, for a data science research project in the Applied Mathematics & Mathematical Physics Group at Imperial College London [19]. The project's focus was largely on mining the trajectories themselves rather than on classification. However, an attempt at classification was made using scikit-learn's different algorithms, including random forest.

Their “header” for the classifiers is shown in Table 6.

Path | Label-state | Duration | Length | Point Count | Crow Length | Path-Crow Ratio | Covered Area | Window Area | Area/Length | Area/Time | Hurst Exponent | Turning-angle/Length | Turning-angle/Time | Mean Speed | Mode of Transport

Table 6: The format of the data inserted into the different algorithms. These columns are the “header” which visualises the data inserted into the algorithms for classification.

Result and Analysis

The goal of this paper was to find a superior method for processing Geolife's GPS trajectories and the reason it is superior. This chapter gives the result of which method is superior; the reason why is discussed in the discussion chapter (chapter 6).

Each method was run approximately 5 times to get an average score, containing both an F1 score and an accuracy score. The accuracy scores are displayed in Figure 2 and the F1 scores in Figure 3. The name “Deep-Semi-Supervised” does not mean a deep semi-supervised method was used in this scenario; it is just the name of the project.


Figure 2: Accuracy comparison between the Beijing project method and the “Deep-Semi-Supervised” project method. The Y axis shows the accuracy from 0 to 1, where 1 means 100% accuracy (all correct). The X axis shows the different runs, where 1 is the 1st run and 5 is the 5th run.

Figure 3: Weighted F1 comparison between the Beijing project method and the “Deep-Semi-Supervised” project method. The Y axis shows the F1 score from 0 to 1, where 1 means the classifier is 100% correct and 0 means 0% correct. The X axis shows the different runs, where 1 is the 1st run and 5 is the 5th run.

The results clearly show a superior method: the “Deep-Semi-Supervised” project's method. As shown in Figures 2 and 3, the “Deep-Semi-Supervised” method has both the highest accuracy and the highest F1 score. However, analysis of the code and the inserted data revealed that the data differed between the methods: the “Deep-Semi-Supervised” method used sorted data, shown in Figure 4, while the “Beijing” method used randomized data, shown in Figure 5.


Figure 4: The predictions and labels for the data used in the “Deep-Semi-Supervised” method.

Figure 5: The predictions and labels for the data used in the “Beijing” method.

The labels themselves also differ, but this is only how the different methods encode their labels. After this discovery, another run was made to see whether the results differ when both datasets are randomized further. The results are in Figures 7 and 8, and an example of the “Deep-Semi-Supervised” data is shown in Figure 6.


Figure 6: The “randomized” datasets in “Deep-Semi-Supervised” method.

Figure 7: Accuracy results on randomized datasets. The Y axis shows the accuracy from 0 to 1, where 1 is 100% accurate and 0 is 0% accurate. The X axis is the number of runs, where 5 is the fifth run. This is just to show there is not much fluctuation. The experiment was also run 25 times, averaging over each set of 5 runs.



Figure 8: Weighted F1 results on randomized datasets. The Y axis shows the F1 score, where 1 is the maximum and 0 the lowest. The X axis is the number of runs, where 5 is the fifth run. This is just to show there is not much fluctuation. The experiment was also run 25 times, averaging over each set of 5 runs.

Figures 7 and 8 still show the same superior method, the “Deep-Semi-Supervised” method. To answer the research question, the reason why there is a superior method, the method must be analysed. The only difference between the two methods is their “header”, which decides the data inserted into the classifier. The analysis therefore compares the columns with each other to conclude why a certain column is important for high classification accuracy on this dataset.

The analysis is made by “zeroing out” columns and running a test. “Zeroing out” means that instead of real data, every value in a column is set to 0, so that the parameter is 0 in every trajectory. Each test includes 5 runs, and the average accuracy and weighted F1 are compared. Five runs suffice because the earlier results, with 5x5 runs averaged over each set of 5, show in Figures 7 and 8 that the fluctuation is negligible. The reason for this way of finding the most important parameter is an article which compared outputs with different numbers of parameters and showed an improvement [20]; compared to removing parameters, which could take a lot of time, “zeroing out” saves a lot of time when altered in the code. In this paper the experiment first zeroes one parameter at a time, and then keeps only one parameter at a time, to see which parameter alone is the most accurate. The results of this parameter tweaking are shown in tables, and a sketch of the procedure is given below.
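In code, a “zero out” trial amounts to replacing one or more feature columns with zeros before re-running the evaluation. Below is a minimal sketch, assuming a feature matrix X laid out as in Table 5 and the evaluate() helper sketched in the Method chapter (both hypothetical names):

```python
import numpy as np

def zero_out(X, columns):
    """Return a copy of X with the given columns replaced by zeros."""
    X_mod = np.array(X, dtype=float, copy=True)
    X_mod[:, columns] = 0.0
    return X_mod

# Example: zero out the MaxA1-MaxA3 columns (hypothetical positions 6-8)
# and compare against the unmodified feature matrix:
# base = evaluate(X, y)
# trial = evaluate(zero_out(X, [6, 7, 8]), y)
```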



D | AV | EV | VV | MaxV1-MaxV3 | MaxA1-MaxA3 | HRC | SR | VCR | Accuracy / F1 (%)
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 88.3299 / 88.3526
0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 87.3432 / 87.3624
1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 88.6401 / 88.6161
1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 87.7661 / 87.7445
1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 88.3865 / 88.4053
1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 87.4835 / 87.4654
1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 84.1291 / 83.8807
1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 88.5836 / 88.5867
1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 88.4145 / 88.4196
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 87.0893 / 87.0781

Table 7: Meanings of the abbreviations: D: Distance, V: Velocity, A: Acceleration, AV: Average Velocity, EV: Expectation Velocity, VV: Variance of Velocity, HRC: Heading Rate Change, SR: Stop Rate, VCR: Velocity Change Rate. A 1 in a column means the parameter contains its real data; a 0 means the column has been zeroed out.

As shown in Table 7, the parameters did not matter as much individually as expected. The next step is therefore to keep only a single parameter at a time, to see which parameter is the “strongest” by watching its accuracy and F1 score alone. As a conclusion of Table 7, the only parameter whose removal changed the result by more than 1 percent was MaxA1-MaxA3.


D | AV | EV | VV | MaxV1-MaxV3 | MaxA1-MaxA3 | HRC | SR | VCR | Accuracy / F1 (%)
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 88.3299 / 88.353
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 51.9025 / 51.962
0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 57.6262 / 57.464
0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 55.0887 / 55.354
0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 38.6809 / 39.031
0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 69.5521 / 68.501
0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 51.1423 / 50.238
0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 44.6843 / 31.482
0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 56.8369 / 56.157
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 59.7403 / 59.506

Table 8: Meanings of the abbreviations: D: Distance, V: Velocity, A: Acceleration, AV: Average Velocity, EV: Expectation Velocity, VV: Variance of Velocity, HRC: Heading Rate Change, SR: Stop Rate, VCR: Velocity Change Rate. A 1 means the parameter has its real data; a 0 means it has been zeroed out.

As shown in Table 8, the most important parameter group is MaxV1-MaxV3, with 69.55% accuracy and 68.5% F1 score. This points to a conclusion as to why this method is better: these three parameters alone give an accuracy of 69.55% and an F1 score of 68.5%, while the entire header of the “Beijing” method reached an accuracy and F1 score of only around 73-75%, a huge difference between the methods.

To break it down even further, since MaxV1-MaxV3 contains three parameters, it is interesting to see these parameters alone.


MaxV1 | MaxV2 | MaxV3 | Accuracy / F1 (%)
1 | 0 | 0 | 58.3866 / 58.4891
0 | 1 | 0 | 57.2883 / 57.6946
0 | 0 | 1 | 59.3167 / 59.5906

Table 9: Meanings of the abbreviations: MaxV1: Max Velocity (1), MaxV2: Max Velocity (2), MaxV3: Max Velocity (3). A 1 means the parameter has its real data; a 0 means it has been zeroed out.

Table 9 shows MaxV1, MaxV2 and MaxV3 alone. MaxV1 to MaxV3 contain the velocities of a trajectory sorted from 1 to 3: each parameter holds a single speed, namely the fastest, 2nd fastest and 3rd fastest velocity in the trajectory. Table 9 shows that these parameters are the most powerful parameters in the method.

Discussion

This section discusses the results of the paper and the reasons behind them. It presents the conclusion of the paper and the limitations of the work, as well as future work that could follow from this paper.

6.1 Interpretation of Results

The result of the paper is that the header of a trajectory inserted into a classifying algorithm is the main factor in the performance, as evaluated with F1 score and accuracy. On closer inspection, in this specific case the max velocity of a trajectory was the most powerful parameter when singled out, as shown in Tables 8 and 9. Compare this with the “Beijing” version, which only included a mean velocity, as shown in Table 6. One reason could be that if a bus gets stuck in traffic, its mean speed drops tremendously and can reach that of a walking person, whereas the max velocity would still differ a lot.

6.2 Limitations

The limitations of this work are that only two different methods were compared, and that the methods were only compared with one algorithm. This limits the survey to this dataset, these two methods, and one algorithm. However, these considerations can be taken into account when comparing different datasets or algorithms in future work.

6.3 Conclusion and Recommendations

In conclusion, the cause of the difference between the two methods comes down to the format of the data. Sina Dabiri's method used max velocity as a parameter, which was the most powerful parameter in the header of the format. Compared to the “Beijing” version, which only used mean velocity, a value that can be reduced to walking speed, it gave a much better F1 score and accuracy. The recommendation, for this dataset and for others used to separate vehicles from walking, is to use max velocity to distinguish the different means of transport and get the best accuracy.


6.4 Future Work

Future work on this topic could aim to reduce the limitations. This means comparing different datasets with the same methods, to see whether the pattern holds or diverges, and comparing more methods with other classification algorithms, to find an algorithm that gives a higher F1 score and accuracy. This could drive progress in classification as a field when new methods are created. New parameters could also be tried and compared against velocity, for a new perspective on the importance of the parameters. Another line of future work is to analyse the different parameters of the “Beijing” version, to see how its “mean velocity” parameter competes against the “max velocity” in Sina Dabiri's version.


References:

[1] Yu Zheng, "GeoLife: Building Social Networks Using Human Location History" (April 2007). Distributed by Microsoft Research Asia (Geolife). [Online]. Available: https://www.microsoft.com/en-us/research/project/geolife-building-social-networks-using-human-location-history/?from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fprojects%2Fgeolife%2Fdefault.aspx

[2] Yu Zheng, "GeoLife: Building Social Networks Using Human Location History" (April 2007). Distributed by Microsoft Research Asia (Geolife). [Online]. Available: https://www.microsoft.com/en-us/download/details.aspx?id=52367&from=https%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fdownloads%2Fb16d359d-d164-469e-9fd4-daa38f2b2e13%2F

[3] S. Dabiri, "Semi-Supervised Deep Learning Approach for Transportation Mode Identification Using GPS Trajectory Data," IEEE Transactions on Knowledge and Data Engineering, vol. 32, no. 5, May 2020. Accessed: June 5, 2020. [Online]. Available: https://ieeexplore-ieee-org.proxy.ub.umu.se/stamp/stamp.jsp?tp=&arnumber=8632766&tag=1

[4] User guide in the GPS trajectory folder. [Online]. Available: https://www.microsoft.com/en-us/download/details.aspx?id=52367&from=https%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fdownloads%2Fb16d359d-d164-469e-9fd4-daa38f2b2e13%2F

[5] J. Dean et al., "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems," Google Research, November 9, 2015. Accessed: June 5, 2020. [Online]. Available: http://download.tensorflow.org/paper/whitepaper2015.pdf

[6] T. E. Oliphant, "Python for Scientific Computing," Brigham Young University, Provo, UT, USA, 2007. Accessed: June 5, 2020. [Online]. Available: https://web.archive.org/web/20131014035918/http://www.vision.ime.usp.br/~thsant/pool/oliphant-python_scientific.pdf

[7] T. Yiu, "Understanding Random Forest," towardsdatascience.com, June 12, 2019. Accessed: June 5, 2020. [Online]. Available: https://towardsdatascience.com/understanding-random-forest-58381e0602d2

[8] T. Hastie, R. Tibshirani and J. Friedman, "8.7 Bagging," in The Elements of Statistical Learning, 2nd ed. Springer, 2017, ch. 8.7, pp. 282-286. [Online]. Available: https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12.pdf

[9] Deep-Semi-Supervised-GPS-Transport-Mode, Dec 9, 2019. [Online]. Available: https://github.com/sinadabiri/Deep-Semi-Supervised-GPS-Transport-Mode

[10] Beijing-Trajectories-Project, Oct 24, 2018. [Online]. Available: https://github.com/jbremz/Beijing-Trajectories-Project

[11] C. J. van Rijsbergen, Information Retrieval, 2nd ed. Butterworth-Heinemann, 1979.

[12] G. K. Tayi and D. P. Ballou, "Examining Data Quality," Communications of the ACM, vol. 41, no. 2, Feb 1998. Accessed: June 2020. [Online]. Available: https://dl.acm.org/doi/pdf/10.1145/269012.269021

[13] D. M. W. Powers, "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37-63, 2011.

[14] S. Dabiri, "Semi-Supervised Deep Learning Approach for Transportation Mode Identification Using GPS Trajectory Data," IEEE Transactions on Knowledge and Data Engineering, vol. 32, no. 5, May 2020. Accessed: June 5, 2020. [Online]. Available: https://ieeexplore-ieee-org.proxy.ub.umu.se/stamp/stamp.jsp?tp=&arnumber=8632766&tag=1

[15] A. Liaw, "Classification and Regression by randomForest," Merck & Co, November 2001. Accessed: June 2020. [Online]. Available: https://www.researchgate.net/profile/Andy_Liaw/publication/228451484_Classification_and_Regression_by_RandomForest/links/53fb24cc0cf20a45497047ab/Classification-and-Regression-by-RandomForest.pdf

[16] C. L. Silva, V. Bogorny and L. M. Petry, "A Survey and Comparison of Trajectory Classification Methods," Programa de Pós-Graduação em Ciências da Computação (PPGCC), Universidade Federal de Santa Catarina (UFSC), Florianópolis, Brazil, March 29, 2020. Accessed: June 2020. [Online]. Available: https://www.researchgate.net/publication/337791304_A_Survey_and_Comparison_of_Trajectory_Classification_Methods

[17] X. Yang, K. Stewart, L. Tang, Z. Xie and Q. Li, "A Review of GPS Trajectories Classification Based on Transportation Mode," Sensors, vol. 18, no. 11, Sept 14, 2018. Accessed: June 2020. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6263992/pdf/sensors-18-03741.pdf

[18] S. Dabiri, "Semi-Supervised Deep Learning Approach for Transportation Mode Identification Using GPS Trajectory Data," IEEE Transactions on Knowledge and Data Engineering, vol. 32, no. 5, May 2020. Accessed: June 5, 2020. [Online]. Available: https://ieeexplore-ieee-org.proxy.ub.umu.se/stamp/stamp.jsp?tp=&arnumber=8632766&tag=1

[19] Beijing Trajectories - Project Summary, Oct 2, 2018. [Online]. Available: https://github.com/jbremz/Beijing-Trajectories-Project/blob/master/Exploratory%20Analysis/Beijing%20Trajectories%20-%20Exploratory%20Analysis%20Summary.ipynb

[20] K. Bhanot, "Finding the right model parameters," towardsdatascience.com, Mar 25, 2019. [Online]. Available: https://towardsdatascience.com/finding-the-right-model-parameters-3670a1c086b3

[21] L. P. Coelho and W. Richert, Building Machine Learning Systems with Python: Get More from Your Data through Creating Practical Machine Learning Systems with Python, 2nd ed. Birmingham, England: Packt Publishing, 2015. [Online]. Available: https://ebookcentral.proquest.com/lib/umeaub-ebooks/detail.action?docID=2000929

[22] Supervised learning models, scikit-learn. [Online]. Available: https://scikit-learn.org/stable/supervised_learning.html

[23] sklearn.metrics.accuracy_score, scikit-learn. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
