
Master of Science in Engineering: Computer Security June 2020

Statistical-Based Suspect Retrieval

Using Modus Operandi

Khang Tran

(2)

This thesis is submitted to the Faculty of Engineering at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Master of Science in Engineering: Computer Security. The thesis is equivalent to 20 weeks of full time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information: Author(s): Khang Tran

E-mail: khtr15@student.bth.se

University advisor: Dr. Martin Boldt

Department of Computer Science and Engineering

Faculty of Engineering
Internet: www.bth.se


Abstract

Introduction. Police and investigation teams have traditionally performed behavioural analysis and connected different crimes to an offender manually. With the help of computer technologies, databases, and automated systems, the statistical analysis of offender behaviour has improved significantly. The process can thus move from a manual one to an automated one, and investigators can allocate time and resources better by prioritising which offenders to investigate. In this study, we create and experiment with a proof-of-concept system that ranks and prioritises offenders using the Random Choice method in combination with the state-of-the-art Spatial-Temporal method.

Objectives. In experimenting with the proof-of-concept system, we aim to understand the effect that different offender behaviours have on the offender ranking, as well as the effect of having different numbers of reference crimes in the database. A further objective is to understand the role of consistency and distinctiveness in offender ranking. Moreover, we aim to compare the performance of our proof-of-concept system with existing methods such as Random Choice, Spatial-Temporal, and a baseline method based on pure randomness.

Method. The method we chose for this study was an experimental study. In an experimental environment with independent and dependent variables, we presented and evaluated the system. We used the experimental approach because it is well established and widely used in similar studies in this field.

Results. After the experiments, we found that different Modus Operandi (MO) categories have different effects on the ranking results, and that different distinctive combinations of MO categories also yield different accuracy when ranking the offenders. Offenders who were consistent across more reference crimes in the database were often ranked higher and linked more correctly. Our proof-of-concept system shows significant improvement over the Random Choice method and the Spatial-Temporal method.

Conclusion. From the results, we concluded that the proof-of-concept system achieves significant accuracy in ranking and prioritising offenders, and that different MO categories and combinations of them affect the accuracy of the ranking differently. The ranking was also affected by the number of reference cases in the database. Future work can extend the study by improving different aspects of the proof-of-concept system, such as the Random Choice aspect or the Spatial-Temporal aspect.

Keywords: Suspect Retrieval, Modus Operandi, Spatial-Temporal, Random Choice, Statistical Analysis.


Contents

Abstract

1 Introduction
  1.1 Related work
    1.1.1 Research gap
  1.2 Aim & Objectives & Expected outcome
  1.3 Research questions
  1.4 Method
  1.5 Scope
  1.6 Relevance
  1.7 Structure of the thesis

2 Background
  2.1 Modus operandi (MO)
  2.2 Choice probabilities method - Random Choice
  2.3 Spatial-Temporal method
  2.4 Combining Random Choice with Spatial-Temporal method
  2.5 Baseline method - Pure Chance

3 Method
  3.1 Dataset
  3.2 Selecting crimes in the dataset
    3.2.1 Setting up the crimes
    3.2.2 Setting up amount of cases
  3.3 Metrics
    3.3.1 Choosing metrics
  3.4 Statistical test
    3.4.1 Friedman test
    3.4.2 Nemenyi test
    3.4.3 Choosing parameters for the statistical test
  3.5 Experiment 1: Analysis the specific MO-categories
  3.6 Experiment 2: Identifying the preferred combination of MO-categories
  3.7 Experiment 3: Identifying the amount of cases related to each offender
  3.8 Experiment 4: Identifying the preferred weight for Random Choice with Spatial-Temporal method
  3.9 Experiment 4.1: Identifying the methods performances

4 Results and Analysis
  4.1 Experiment 1: Analysis the specific MO-categories
  4.2 Experiment 2: Identifying the preferred combination of MO-categories
  4.3 Experiment 3: Identifying the amount of cases related to each offender
  4.4 Experiment 4: Preferred weight for Random-Choice with Spatial-Temporal method
  4.5 Experiment 4.1: Identifying the methods performances
    4.5.1 Statistical analysis

5 Discussion
  5.1 Experiment 1: Analysis the specific MO-categories
  5.2 Experiment 2: Identifying the preferred combination of MO-categories
  5.3 Experiment 3: Identifying the amount of cases related to each offender
  5.4 Experiment 4: Preferred weight for Random-Choice with Spatial-Temporal method
  5.5 Experiment 4.1: Identifying the methods performances
  5.6 Affect of consistency and distinctiveness on accuracy of the ranking
  5.7 Validity threats

6 Conclusions and Future work

References

A Supplemental Information


List of Figures

4.1 Boxplot displaying results from Experiment 1 with nDCG as metrics
4.2 Boxplot displaying results from Experiment 2 with nDCG as metrics
4.3 Boxplot displaying results from Experiment 3 with nDCG as metrics
4.4 Boxplot displaying distribution in ranking out of 27 offenders based on the number of cases
4.5 Boxplot displaying results from Experiment 4 with nDCG as metrics
4.6 Boxplot displaying results from Experiment 4.1 with nDCG as metrics
A.1 Chi-square Distribution Table
A.2 Critical values for the two-tailed Nemenyi test


Abbreviations

SpaTemp - Spatial-Temporal
RC - Random Choice
PC - Pure Chance
RC + SpaTemp - Random Choice with Spatial-Temporal
MO - Modus Operandi
MO-categories - different categories of MO
nDCG - normalised Discounted Cumulative Gain
Bpref - Binary Preference
P@R - R-precision
R@R - Recall at R
F1 - F-measure


Chapter 1

Introduction

In many criminal cases, the police or the investigation team manually profile and categorise the Modus Operandi (MO) for behavioural analysis and case linkage [22]. With computer technologies, databases, and statistical analysis, automated systems for case linkage were created to reduce the manual work required and to speed up the time necessary to link different crimes utilising behavioural analysis [14].

Case linkage is a form of behavioural analysis in which a criminal investigator can utilise MO. MO is described as the offender's pattern of behaviours when committing a crime; in short, MO can be explained as an offender's habits and techniques [23, 12, 16]. Criminal investigators have used behavioural analysis to become better informed about different cases, and research has also validated the usability of behavioural analysis as expert-witness evidence [15]. Early methods used spatial and temporal information to analyse different crimes and later moved to more detailed analysis by incorporating other MO attributes as the logged information became more detailed [14]. However, most of the actual work is still executed manually and is often time-consuming, and the number of cases often exceeds the capacity of an investigator to link the crimes to any offender [24].

Behavioural analysis often relies on two assumptions about offenders. First, the offender is consistent: the same offender displays a similar pattern across the crimes they commit [4]. Second, the offender is distinctive: the behaviour of the offender differs from that of other offenders to a degree that can separate them from each other [4]. With more efficient ways of collecting and storing data, it also becomes possible to analyse how an offender's consistency between different crimes affects the behavioural analysis. As the description of MO in crime reports moves from free text to a more checkbox-like style, more specific questions allow the system to distinguish different offenders based on the distinctiveness of their MO.

An automated system utilising an existing statistical method from previous research was created in Japan [14]. This system performed suspect retrieval based on probability calculations with the Random Choice method [14]. The implemented system showed that more recorded cases and MO items in the database yielded better results, and the accuracy of the system improved.

The police and investigators use specific questions to collect the MO items that are stored in the database. This stored information can later be divided into different categories of MO (MO-categories), e.g. the type of house the offender targets, the type of tool the offender uses to break in, and the items that were taken. The Random Choice method uses these MO-categories to analyse the MO patterns of the offenders.

In the current research, we create a similar system that uses the same Random Choice method and combines it with a state-of-the-art Spatial-Temporal method. The problems this research addresses are the inefficiency of manual behavioural analysis and case linkage. This experiment also discusses how consistency and distinctiveness affect the results. Furthermore, the research by Yokota and Watanabe lacks a comparison with other methods, and we resolve this by comparing different methods against our implementation. The two methods chosen for comparison are the Spatial-Temporal method, which is considered state of the art, and a Pure Chance method that acts as a baseline. Although our research uses a Swedish burglary dataset, the approach is not limited to burglary: it can also be applied to other crime types (e.g. other property thefts or violent crimes) as well as crimes in other countries.

1.1 Related work

Ashmore-Hills et al. introduced the concept of behavioural crime linkage (BCL), arguing that BCL is one of the methods that enables crime investigators to prioritise their workload [1]. BCL allows them to identify crime series and to connect the crimes to a specific offender. Ashmore-Hills et al. determined that BCL has been researched over the last 10-15 years, but there are still some un- or under-explored areas. Among the areas they suggested for future work are the development, implementation, and evaluation of BCL decision-support tools [1].

Yokota et al. proposed and developed a system that utilised a statistical approach to serial crime linkage [14]. The goal of their study was to use the MO of prior offenders to score and predict the probability of who was most likely to have committed a given crime. The method the researchers used to calculate the probability was the Random Choice method. According to their results, this method performs best when more cases exist in the database and the number of recorded MO items in each case increases [14].

Research by Porter touched on a similar topic [25]. His research focused on using statistical approaches to differentiate a group of crime events and prioritise different offenders for further investigation. The goal of his study was to create a model using hierarchical clustering and the Bayes factor for crime series clustering and crime series identification, including suspect prioritisation. His experiment concluded that the suggested model showed good accuracy with a low false-positive rate in both crime series clustering and crime series identification [25].


Another study, by Tonkin et al., improved on previous studies by including more methods for comparison across different types of crime [19]. The goal of the study was to address some of the limitations in behavioural crime linkage research, including the lack of method comparisons. The seven chosen methods were divided into three separate categories (logistic regression, classification tree, and probabilistic), and the crime types compared were residential burglary, car theft, and commercial robbery. Using area under the curve as the metric, the logistic category performed best at differentiating crimes for residential burglary and commercial robbery, while simple logistic performed best for car theft [19].

MO, or offender behaviour during a crime, can be used in cases where forensic evidence (e.g. fingerprints or DNA) is lacking. Tonkin et al. conducted one of the first studies suggesting the use of behavioural case linkage to identify crimes committed by the same person [20]. Those results suggested that it is possible to link crimes using different aspects of offender behaviour. The methods incorporated both spatial and temporal elements as well as criminal behaviour, and were tested across multiple types of crime, including violent, sexual, and property-related crimes. The results were good with regard to differentiating linked crimes from non-linked ones [20].

One of the more recent studies by Tonkin et al. focused on testing whether geographical, temporal, and MO crime-scene behaviour were sufficient for BCL, using crime series in different types of crime [18]. The method used in the study paired various solved and unsolved crimes to find similarities between them; by doing so, they could distinguish crimes committed by the same offender from crimes that were not. They discovered that linked crimes were identified and determined with high accuracy, which led them to conclude that offender behaviour was consistent and distinctive enough to use in a BCL setting across different types of crime [18].

The idea of linking crimes based on an offender's behaviour already existed in the early 90s. In a study by Davies, she described methods from the 90s that complemented profiling in sexual-offence cases where DNA information was lacking [8]. The method she mentioned utilised a sexual assault index with information such as time, location, offender description, and victim age and sex, with the inclusion of behavioural parameters. The samples in the index were limited, but the results showed significantly accurate linkage between unsolved cases and their offenders. The conclusion she drew was that behavioural analysis could be used to screen suspects and as a tool to distribute resources effectively during a case by prioritising the offenders [8].

Other researchers have studied the utility of MO to better understand its helpfulness in behavioural crime analysis; Qi et al. were among them. Their goal was to incorporate MO in the crime analysis process to detect serial crime [27]. The method used a real-world robbery dataset from which MO was extracted through information processing; they then analysed the MO characteristics of each robbery with five different classification machine-learning algorithms. Their experiment found that incorporating MO information when detecting serial crime led to better results in linking various crimes together and in separating serial crimes from non-serial crimes [27].


Canter discussed consistency in his article. The consistency part of MO is a complex one, due to weaknesses in the sources of data [6]. Another critical challenge in determining consistency is that the activity of offenders shows some variation and change as the person develops. In the same study, Canter raised the point that distinctiveness in MO is used to differentiate criminals from each other. He pointed out that criminals share many aspects of their styles, but some aspects are more distinctive for each of them; those aspects may provide a basis for distinguishing between offences and offenders [6].

A study by Woodhams et al. also highlighted the problem of consistency and distinctiveness in crime linkage. They stated that many studies in this area have limitations [11]. One of the flaws was that convicted and solved crime series were exclusively sampled in these studies, while actual case linkage is performed on unsolved crimes [11]. Woodhams et al. also raised the concern that the effectiveness reported in previous studies was inflated because of this exclusive sampling. The goal of their research was to test the assumption that crime-linkage findings also cover unsolved crimes. They achieved this by testing case linkage with two different datasets, one of unsolved crimes and one matched by DNA instead of MO. By doing this, they were able to confirm that the assumption presented in previous crime-linkage research was correct: unsolved crimes that were linked together had more similar MO behaviour than those that were not [11].

Others besides Woodhams et al. have studied whether consistency and distinctiveness exist in criminal linkage. Bouhana et al., in recent research from 2016, covered the same topic [21]. Their purpose was to investigate two assumptions in behavioural crime linkage: first, that offenders are somewhat consistent across their crimes, and second, that each offender's behaviour is distinctive enough to differentiate them from others [21]. The method Bouhana et al. used was to analyse a series of crimes to determine whether the offenders in their series showed any consistency or distinctiveness. The results from their research confirmed that offenders showed consistency between their crimes; a few of them were more distinctive than others, but all indicated some level of distinctiveness [21].

1.1.1 Research gap



1.2 Aim & Objectives & Expected outcome

This research aims to evaluate the predictive performance of a proof-of-concept system for suspect retrieval based on the statistical analysis approach described by Yokota and Watanabe, combined with the state-of-the-art Spatial-Temporal method. The project uses an anonymised burglary dataset. The evaluation aims to determine how different aspects of the dataset affect the predictive performance, such as the detail and quantity of crime-scene MO, the number of offenders and their quantity of offences in the dataset, and the consistency and distinctiveness between different offenders. Last, we also aim to compare with other methods. The following objectives serve to reach the mentioned aims:

• Implement Yokota and Watanabe's method, combine it with the Spatial-Temporal method, and modify the implementation to fit the anonymised setting.

• Implement the Spatial-Temporal and Pure Chance methods for comparison of methods.

• Set up the experiments with appropriate metrics and statistical tests.

• Transform the dataset into different sets that address the detail and quantity of MO, the number of offenders and their quantity of offences, and the consistency and distinctiveness aspects.

• Evaluate the predictive performance with the datasets transformed for the different aspects.

The expected outcome of this research is a proof-of-concept system for ranking and prioritising offenders; in other words, a suspect retrieval system for further investigation based on different MO and behavioural analysis. The outcome also includes an evaluation of the aspects mentioned and a comparison between the different methods.

1.3 Research questions

Research questions that this work will answer are:

• RQ1 - How do the quantity of the MO-categories and the specific MO-category affect the accuracy of offender ranking?

• RQ2 - How do the number of offenders and their number of known offences in the dataset affect the accuracy of ranking known offenders?

• RQ3 - How do the offender's consistency and distinctiveness of their MO between crimes affect the accuracy of offender ranking?

• RQ4 - How does the combined Random Choice with Spatial-Temporal method perform compared to the other methods?


The aim is to analyse how different aspects affect the outcome of the method, and how Yokota and Watanabe's method combined with the Spatial-Temporal method performs compared to other methods. An explanation of how the RQs satisfy the aim follows:

• RQ1 helps us understand whether the quantity of MO-categories and the specific MO-categories affect offender ranking.

• RQ2 helps us understand whether the number of offences for each offender affects the outcome of the method.

• RQ3 focuses on whether consistency and distinctiveness in the data affect the result of the method.

• RQ4 concentrates on the comparison between the different methods.

1.4 Method

We set up an experimental environment to present and evaluate the methods and the system. We use the experimental approach because it is well established and widely used by scholars in this field [26]. The approach involves presenting a problem or research questions and validating it by generating different outputs to answer them. The experiments consist of independent variables with multiple levels and dependent variables. All experiments execute in a controlled environment, where we can control the independent variables and ensure that outside factors do not affect the dependent variables.

Executing the experiments enables us to answer the RQs we have set. For RQ3, we do not run a separate test; instead, we discuss it in the discussion section of the thesis. The reason is that the consistency and distinctiveness aspects of each offender are already incorporated in the calculation by the Random Choice method.

1.5 Scope



1.6 Relevance

This research provides benefits to several social aspects. First, if this research successfully develops a proof-of-concept method and system that outperforms currently existing methods, investigators or police can utilise the method to produce a more accurate ranking system for identifying offenders. Second, this method also opens up new thinking and another way to improve existing methods; if successful, future work can try to combine different existing methods to achieve better performance.

1.7 Structure of the thesis

Description of the structure of the thesis:

• Introduction - presents the research problem, similar research and related work, followed by the aims, objectives and expected outcome of this research. Then come the research questions formed from the objectives and the methodology used to help answer them. Last is the scope of this research.

• Background - presents the concepts and knowledge in the selected field that are needed to replicate this research.

• Method - presents a more detailed version of the methodology mentioned in the introduction, with proper description and explanation. It includes details such as the dataset, the different methods, and the metrics.

• Results and Analysis - presents the results and an objective analysis of them, to answer the research questions.

• Discussion - presents a subjective discussion of the results, relating them to the objectives and expected outcomes.


Chapter 2

Background

2.1 Modus operandi (MO)

MO can be described as the offender's pattern of behaviours when committing a crime [2]. The pattern of behaviours extends to before, during and after a crime; in short, MO can be explained as an offender's habits and techniques [2]. An offender's MO is often the core of a crime, and one cannot reconstruct the events of a crime without understanding the offender's MO [2].

Behavioural Crime Linkage (BCL) is behavioural analysis that can identify the similarity of different crimes and link them together. The behaviour or MO across various incidents is collected and combined to allow the investigator to work more efficiently and thus minimise the time and resources required for an investigation [1].

Consistency and distinctiveness are two aspects of MO: consistency refers to how offenders behave similarly over different crimes, while distinctiveness refers to an offender's strategy being unique and departing from other offenders' approaches [3]. Consistency and distinctiveness have a direct connection to our dataset: consistency in MO allows linking different crimes to the same offender, and distinctiveness in MO enables separation of different offenders.

The quality of the crime-scene MO is defined by how the different MO-attributes are collected, that is, with a low rate of errors and a small degree of missing values. This quality is also determined by how distinctive the MO-attributes are in differentiating the offenders and how consistent the MO-attributes are between the various crimes related to the same offender, as explained above.

2.2 Choice probabilities method - Random Choice

The suspect retrieval system is built on the Random Choice (RC) method that Yokota and Watanabe used in their research [14]. The system calculates how likely it is that an offender committed a given crime based on their known MO.

Each known offender in the database is denoted $S_i$ $(i = 1, 2, 3, \ldots, I)$, and the probability of that offender committing the new crime is written $P(R|S_i)$, where $R$ is the new crime currently in question.

The probability $P(R|S_i)$ is calculated as follows. The MO connected to each offender is separated into two levels: the upper level is the categories, and the lower level consists of the different items in those categories. A category can be, for example, the type of residential area, and the items in that category can be urban or countryside. A set of multiple categories is written as $y$, described as:

$$y = [\text{Category}_1 \ldots \text{Category}_j \ldots \text{Category}_J]$$

$$y = [\,y_{11}, \ldots, y_{1m_1} \mid \ldots \mid y_{j1}, \ldots, y_{jm_j} \mid \ldots \mid y_{J1}, \ldots, y_{Jm_J}\,] \quad (y_{jk} = 0, 1) \qquad (2.1)$$

In this equation, $y_{jk}$ is the $k$th item in $\text{Category}_j$, where $k = 1, 2, 3, \ldots, m_j$ and $j = 1, 2, 3, \ldots, J$, and $m_j$ is the number of items in $\text{Category}_j$. The items $y_{jk}$ in each $\text{Category}_j$ are binary: if an item was chosen during a crime, 1 is assigned; otherwise, 0 is assigned. The probability that $S_i$ chooses $y_{jk}$ during a crime is written $\theta_{ijk}$, where the $\theta_{ijk}$ sum to one:

$$\sum_{k=1}^{m_j} \theta_{ijk} = 1 \qquad (2.2)$$

For each offender, an independent multinomial distribution is assumed when the offender chooses a set of MOs $y$; that is, the probability that an offender chose one set of MOs does not affect how the offender chose other sets. Therefore, the probability of an offender $S_i$ using $y_j$, the set of items in $\text{Category}_j$, $y_j = [y_{j1}, \ldots, y_{jm_j}]$, can be written as:

$$P(y_j|S_i) = C_j \prod_{k=1}^{m_j} \theta_{ijk}^{y_{jk}} \qquad (2.3)$$

In Equation (2.3), $C_j$ is a weight for the specific $\text{Category}_j$ that is consistent across all offenders $(S_1, S_2, S_3, \ldots, S_I)$. The $y_{jk}$ in the same equation are the items in the reference crime $R$ that the police are currently investigating. The probability that $S_i$ chose a set of MO items across all categories can then be written as:

$$P(y|S_i) = C \prod_{j=1}^{J} \prod_{k=1}^{m_j} \theta_{ijk}^{y_{jk}} \qquad (2.4)$$

Once again, $C$ is the product of the constants $C_j$ across the different categories ($C = \prod_j C_j$). $P(y|S_i)$ is the probability that expresses how similar the crime under investigation is to the previously known sets of MOs of $S_i$, and $P(R|S_i) = P(y|S_i)$.

The calculation of $\theta_{ijk}$ can be done with the Random Choice method, as described in the research of Yokota and Watanabe [14]:

$$\theta_{ijk} = (f_{ijk} + 1/m_j)/(f_{ij} + 1) \qquad (2.5)$$

In this equation, $f_{ijk}$ is the number of times that $S_i$ chose $y_{jk}$ in the previously known crimes committed by $S_i$, while $f_{ij}$ is the number of previously known crimes committed by $S_i$. In the numerator, $1/m_j$ is the Random Choice probability, and the $+1$ in the denominator prevents division by zero in case $S_i$ has no previously known crimes.
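To make the estimation in Equation (2.5) and the scoring in Equation (2.4) concrete, the calculation can be sketched as below. This is only an illustrative sketch, not the thesis's actual implementation: the data layout (each crime as a set of (category, item) pairs) and all function and variable names are assumptions made here, and the constant $C$ is dropped since it does not affect the ranking.

```python
from collections import defaultdict

def theta(f_ijk, f_ij, m_j):
    """Random Choice estimate (Eq. 2.5): smoothed probability that the
    offender picks item k of category j, given f_ij previously known crimes."""
    return (f_ijk + 1.0 / m_j) / (f_ij + 1.0)

def score(offender_crimes, new_crime, category_sizes):
    """P(y|S_i) up to the constant C (Eq. 2.4): product of the offender's
    theta values over the items chosen in the new crime (the y_jk = 1 terms;
    items with y_jk = 0 contribute a factor of 1).

    offender_crimes: list of crimes, each a set of (category, item) pairs
    new_crime:       set of (category, item) pairs for the crime in question
    category_sizes:  dict mapping category -> number of items m_j
    """
    f_ij = len(offender_crimes)            # number of prior crimes by S_i
    counts = defaultdict(int)              # f_ijk for each (category, item)
    for crime in offender_crimes:
        for cat_item in crime:
            counts[cat_item] += 1
    p = 1.0
    for (cat, item) in new_crime:
        p *= theta(counts[(cat, item)], f_ij, category_sizes[cat])
    return p
```

Ranking the offenders then amounts to sorting them by this score in descending order; an offender with no prior crimes gets the pure Random Choice probability $1/m_j$ for every item, which is the intended effect of the smoothing.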

Other than the Random Choice method, Yokota and Watanabe also proposed two other methods for calculating $\theta_{ijk}$. However, they were not used here, as the Random Choice method performed better.

The constant method:

$$\theta_{ijk} = f_{ijk}/f_{ij} \quad (f_{ijk} > 0) \qquad (2.6)$$

$$\theta_{ijk} = 0.0001/n_j \quad (f_{ijk} = 0) \qquad (2.7)$$

The kernel method:

$$\theta_{ijk} = \big(f_{ijk}\lambda_j + (f_{ij} - f_{ijk})(1 - \lambda_j)/(m_j - 1)\big)/f_{ij} \quad (1/m_j \le \lambda_j < 1) \qquad (2.8)$$

2.3 Spatial-Temporal method

Spatial-Temporal (SpaTemp) methods have often been used in BCL to link different crimes and offenders [20]. Spatial behaviour is described as the distance between different crimes, while temporal behaviour is described as the time between different crimes. These behaviours can indicate how scattered an offender is in terms of space and time. Crimes committed by the same offender are more likely to be connected than crimes that are not linked together [20].

In this experiment, the method for calculating nearby crimes is kept simple: a distance equation in three dimensions is used, to keep the implementation simple and easy to understand. With existing attributes such as longitude, latitude, and date, the approach is straightforward to implement and use.

$$d(R_1, R_2) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} \qquad (2.9)$$

The equation calculates the distance between two different crimes in terms of space and time, with $R_1$ and $R_2$ being the two crimes in question; each has attributes longitude as $x$, latitude as $y$, and time as $z$. Since the dataset is anonymised, the longitude and latitude of each crime are offset by a certain margin, while differences in time are measured in days. The behaviour of the function can be described as follows:

The distance is calculated between the new crime and the crimes that exist in the database. Crimes with the closest distance are sorted to the start of the ranking, and then the 5, 10, or any given number of offenders most likely to have committed the crime are selected based on the closeness of the Spatial-Temporal distance.
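The ranking step described above can be sketched as follows. This is a minimal illustration under stated assumptions: each known crime is represented as an (offender, position) pair with position = (longitude offset, latitude offset, day), and the names used are invented here.

```python
import math

def spatemp_distance(a, b):
    """Eq. 2.9: Euclidean distance over (longitude, latitude, day)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def rank_crimes(new_crime, known_crimes, top_n=5):
    """Sort the known crimes by closeness to the new crime in space and
    time, and return the top_n (offender, distance) pairs."""
    ranked = sorted(
        ((offender, spatemp_distance(new_crime, pos))
         for offender, pos in known_crimes),
        key=lambda pair: pair[1],
    )
    return ranked[:top_n]
```

Note that the three axes are not on a common scale (offset coordinates versus days); the sketch, like Equation (2.9), uses the raw values directly.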

2.4 Combining Random Choice with Spatial-Temporal method



Rank Offender nDCG values

1 A 1.00

2 B 0.95

3 C 0.90

4 D 0.85

5 E 0.80

Table 2.1: Example of ranking list for Random Choice method

The Spatial-Temporal method also generates a similar ranking for the nearest crimes in terms of space and time, where the closest crime is ranked highest. An example of a Spatial-Temporal ranking based on the new crime R is shown in Table 2.2.

Rank Offender.Crime Distance

1 C.1 0.50

2 A.1 1.00

3 C.2 2.00

4 B.1 2.50

5 E.1 4.00

Table 2.2: Example of ranking list for Spatial-Temporal method

With the Spatial-Temporal ranking shown in Table 2.2, we calculate the nDCG values for each offender based on that ranking. An example of the resulting nDCG ranking list is shown in Table 2.3.

Rank  Offender  nDCG value
1     C         0.95
2     A         0.90
3     B         0.80
4     E         0.70
5     D         0.00

Table 2.3: Example of Spatial-Temporal ranking list based on nDCG values

The Random Choice with Spatial-Temporal method is a combination of Table 2.1 and Table 2.3, merging both Random Choice and Spatial-Temporal. This combination produces a ranking list that is intended to improve on the existing Random Choice method. As Table 2.4 shows, in this example of the Random Choice with Spatial-Temporal method, offender A is most likely to have committed the new crime R, and the ranking of B, C, D and E differs from the initial Random Choice ranking. The method also includes a weighting option where 0 ≤ w1 ≤ 1 and w2 = 1 − w1; in this example, the weighting is w1 = 0.5 and w2 = 0.5.


Rank  Offender  RC    SpaTemp  RC + SpaTemp
1     A         1.00  0.90     w1·1.00 + w2·0.90 = 0.950
2     C         0.90  0.95     w1·0.90 + w2·0.95 = 0.925
3     B         0.95  0.80     w1·0.95 + w2·0.80 = 0.875
4     E         0.80  0.70     w1·0.80 + w2·0.70 = 0.750
5     D         0.85  0.00     w1·0.85 + w2·0.00 = 0.425

Table 2.4: Example of ranking list for the Random Choice with Spatial-Temporal method, nDCG scores
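The weighted sum illustrated in Table 2.4 can be sketched as follows, with w1 weighting the Random Choice score and w2 = 1 − w1 the Spatial-Temporal score, matching the table's column order. The function name and dict-based interface are illustrative assumptions:

```python
def combine_rankings(rc_scores, spatemp_scores, w1=0.5):
    """Combine per-offender Random Choice nDCG scores with Spatial-Temporal
    nDCG scores (as in Tables 2.1 and 2.3) using weights w1 and w2 = 1 - w1.
    Returns (offender, combined score) pairs sorted best-first."""
    w2 = 1.0 - w1
    combined = {
        off: w1 * rc_scores[off] + w2 * spatemp_scores.get(off, 0.0)
        for off in rc_scores
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

With the values from Table 2.4 and w1 = 0.5, this reproduces the order A, C, B, E, D.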

2.5 Baseline method - Pure Chance

The Pure Chance (PC) method is used here to establish a baseline for comparison with the other methods: Random Choice, Spatial-Temporal, and Random Choice with Spatial-Temporal. Pure Chance puts the results in perspective, since it shows how inefficient a purely random method is on the suspect retrieval problem. The implementation of the Pure Chance method depends on the type of experiment in question; there is no one-size-fits-all function or equation. For this specific experiment, the Pure Chance method is implemented as follows:
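Since the exact equation is not reproduced in this excerpt, one plausible reading of such a baseline is a uniformly random ranking of the candidate offenders. The sketch below is an assumption, not the thesis's exact implementation:

```python
import random

def pure_chance_ranking(offenders, seed=None):
    """Rank offenders uniformly at random.

    A baseline sketch: every permutation of the offender list is equally
    likely, so the ranking carries no information about the crime."""
    rng = random.Random(seed)
    ranking = list(offenders)
    rng.shuffle(ranking)
    return ranking
```

Any metric averaged over such rankings estimates the floor that an informed method must beat.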


Chapter 3

Method

The Method chapter explains in more detail how each experiment is set up. We follow the experimental approach because it is a well-established method that is widely used in this field [26]. The experimental method includes presenting the problem/research questions and setting up goals to be validated by generating results that answer the problem. The details of our experiments are presented below. While RQ1, RQ2 and RQ4 are answered using experiments, RQ3 will not be tested by an experiment. Instead, it will be treated in the discussion chapter, because consistency and distinctiveness are already incorporated in the calculation of the Random Choice with Spatial-Temporal method and do not need a separate experiment to answer RQ3.

3.1 Dataset

The dataset used in this experiment consists of 24 categories, and those categories include different types of features. The dataset is anonymised, and no data contained any personally identifying information, i.e. no names, phone numbers or other personally identifiable information was included. Further, both longitude and latitude only pinpoint a general area of 1.24 km², so the Spatial-Temporal information is an estimation of the actual location and time. In this study, only 92 out of 152 crimes were used, because only offenders with two or more offences were relevant. Table 3.2 displays the first part of the features of crimes and the Spatial-Temporal information.

Part of the dataset consisted of binary features, where a feature was assigned 1 if it was present at the crime scene and 0 otherwise. These features were also divided into different categories, as shown in Table 3.1.

3.2 Selecting crimes in the dataset

By its nature, the Random Choice with Spatial-Temporal method can only work with pre-existing offenders. This means that only offenders with two or more crimes were relevant for the Random Choice with Spatial-Temporal method in these experiments. With this selection, 92 crimes were chosen from the database and used for all the tests in this research. In the experiments, we need to connect two or more crimes to investigate whether our method can link known crimes to the correct offenders based on the offender's one or more historical crimes. The experiments are


Category                 Features
1: Complete              Crime completed, Crime attempt
2: Suspect Exist         Yes, No
3: Residential Area 1    Urban, Rural
4: Residential Area 2    House in corner, No neighbour, One neighbour, House in woods, Multi neighbour
5: Dwelling 1            High standard, Normal standard, Low standard
6: Dwelling 2            Owned apartment, Villa, Farm, Rental apartment, Townhouse
7: Dwelling 3            Apartment at top, Multi level, Apartment at bottom, Single level
8: Alarm                 Activated, Disabled, Triggered, No alarm, Sabotaged
9: Standard Objects      Vehicle on driveway, Grass or snow maintenance, Mail emptied, Interior lighting, Exterior lighting, Dog or sign, Active in neighbour-watch, Street lighting, Sign alarm, No information
10: Plaintiffs 1         Owned business, Registered company
11: Plaintiffs 2         Planned absence, Spontaneous absence, Home during crime
12: Plaintiffs 3         Household service, Kids at home, Announced trade, Home visit, Vehicle at airport, Unknown call, Documented absence, Cripple elderly, Previous break in, No information, Not contactable
13: Entrance 1           Entrance cover, Entrance not cover
14: Entrance 2           Basement, Ground level, Above ground level, No information
15: Entrance 3           Break in tool from place, Escape route prepared
16: Entrance 4           Unlocked, Drills, Open for ventilation, Breaks, Illegal key, Other, Breaks window in, No information
17: Entrance 5           Patio door, Door, Mirrored patio door, Window, Triple panel window, Balcony door, Cellar door, Mail slot, Other, No information
18: Search               Carefully search, Messy, Big mess
19: Goods 1              Bulky goods, None bulky goods
20: Goods 2              Credit or debit card, Cellphone, Alcohol or tobacco, Electronics, Gold and jewellery, Cash, Clothes, Medicine, Toys, Weapons, Safe, Perfume, Vehicle keys, Passport and id, Art, Furniture, Other, No information
21: Trace 1              Fingerprint, Tires, Visible fibre, DNA, Compressed glass, Shoes, Search for goods, Gloves, Tool mark, No BPU ordered, No information
22: Trace 2              Small mark, Medium mark, Large mark
23: Trace 3              Less than or equal to 5 marks, Greater than or equal to 6 marks
24: Others               DNA marked, Witness, Tips, No information, Searchable goods

Table 3.1: Binary features and categories

Feature     Explanation
id          Crime identifier in the database
offender    Anonymised number for unique offenders
idval       Crime identification code related to a specific offender
datestart   Crime's start date
dateend     Crime's end date
timestart   Crime's start time
timeend     Crime's end time
longitude   Crime's longitude coordinate (anonymised to 1.24 km²)
latitude    Crime's latitude coordinate (anonymised to 1.24 km²)

Table 3.2: Features of crimes and Spatial-Temporal information


not possible if the offender has only one crime, since we need one crime to be linked and one or more to be the references for the method. This problem does not exist if the method is used in real life outside of the experiments, since the police or investigators only need one or more crimes as references, while the crime to be linked is the new unknown crime under investigation.

3.2.1 Setting up the crimes

To run the experiments, we set up a database with the chosen crimes. The division between the test crime and the crimes in the database is 1 to 91; this means that we used one crime as a test crime while the remaining 91 crimes resided in the database. This division was possible because, by choosing crimes from offenders with two or more crimes, we were sure that whichever crime was used as a test crime, at least one reference crime existed in the database.

ID  Offender  Crime  MO-Categories
1   A         1      ...
2   A         2      ...
3   B         1      ...
4   B         2      ...
5   C         1      ...
6   C         2      ...
7   C         3      ...

Table 3.3: Example of chosen crimes

Table 3.3 is an example of the crimes chosen from the original dataset; in this table, all offenders have two or more crimes related to them. For the first run, we chose one crime as a test crime and removed it from the database, while the rest were left behind to be used in the Random Choice with Spatial-Temporal method.

            ID  Offender  Crime  MO-Categories
Database    2   A         2      ...
            3   B         1      ...
            4   B         2      ...
            5   C         1      ...
            6   C         2      ...
            7   C         3      ...
Test Crime  1   A         1      ...

Table 3.4: Example of how the crimes are divided for the first run


            ID  Offender  Crime  MO-Categories
Database    1   A         1      ...
            3   B         1      ...
            4   B         2      ...
            5   C         1      ...
            6   C         2      ...
            7   C         3      ...
Test Crime  2   A         2      ...

Table 3.5: Example of how the crimes are divided for the second run

Table 3.5 displays the next step of the calculation; the steps are repeated for all crimes. In our experiments, the number of tests was 92 for experiments 1, 2 and 4, because there exist 92 crimes in the database.
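The splitting illustrated in Tables 3.4 and 3.5 amounts to a leave-one-out loop over the crimes. A minimal sketch, with the crime records left abstract:

```python
def leave_one_out_runs(crimes):
    """Yield (test_crime, database) pairs: each crime is used exactly once
    as the test crime while all remaining crimes act as the reference
    database, as in Tables 3.4 and 3.5."""
    for i, test_crime in enumerate(crimes):
        database = crimes[:i] + crimes[i + 1:]
        yield test_crime, database
```

With 92 crimes this yields exactly 92 runs, one per test crime.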

3.2.2 Setting up the amount of cases

Experiment 3 had a different number of test cases. There we generated a number of crime combinations for each combination size of 1 case, 2 cases, 3 cases, ..., N cases in the database. Table 3.6 shows an example of how the actual offenders in the database were chosen, where some had two crimes and others had three or more. Table 3.7 shows how the different combinations were generated from the actual offenders.

ID  Offender  Crime  MO-Categories
1   A         1      ...
2   A         2      ...
3   B         1      ...
4   B         2      ...
5   B         3      ...
6   C         1      ...
7   C         2      ...
8   C         3      ...
9   C         4      ...

Table 3.6: Example of chosen crimes

Combinations of  Crime combinations
2                (A1, A2), (B1, B2), (B1, B3), (B2, B3), (C1, C2), (C1, C3), ...
3                (B1, B2, B3), (C1, C2, C3), (C1, C2, C4), (C1, C3, C4), (C2, C3, C4)
4                (C1, C2, C3, C4)

Table 3.7: Example of generated crime combinations


            ID  Offender  Crime  MO-Categories
Database    1   A         1      ...
            2   A         2      ...
            4   B         2      ...
            6   C         1      ...
            7   C         2      ...
            8   C         3      ...
            9   C         4      ...
Test Crime  3   B         1      ...

Table 3.8: Example of how the different combinations are used in testing

            ID  Offender  Crime  MO-Categories
Database    1   A         1      ...
            2   A         2      ...
            3   B         1      ...
            6   C         1      ...
            7   C         2      ...
            8   C         3      ...
            9   C         4      ...
Test Crime  4   B         2      ...

Table 3.9: Example of how the different combinations are used in testing

Following Table 3.7, we selected each crime combination for testing and removed the remaining crimes related to the offender from the database. For example, if (B1, B2) was tested, B3 was removed from the database, since we were only testing what happens when there is exactly one reference crime in the database. Examples of how the crimes look during testing are shown in Tables 3.8 and 3.9. This way of testing was replicated for each crime combination within each combination size. The reason for generating the crime combinations was that the original set of cases was limited by the number of crimes in the database; by doing this, we generated more combinations of crimes than the original 92 cases. Afterwards, 150 crime combinations were randomly chosen for the testing.
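The generation and sampling of combinations described above can be sketched with the standard library. The `offender_crimes` mapping and the function names are illustrative assumptions:

```python
import itertools
import random

def crime_combinations(offender_crimes, size):
    """All size-`size` crime combinations per offender (as in Table 3.7).

    `offender_crimes` maps offender -> list of crime ids; offenders with
    fewer crimes than `size` contribute nothing."""
    combos = []
    for crimes in offender_crimes.values():
        combos.extend(itertools.combinations(crimes, size))
    return combos

def sample_combinations(combos, n=150, seed=None):
    """Randomly pick up to n combinations for testing."""
    rng = random.Random(seed)
    return rng.sample(combos, min(n, len(combos)))
```

For the offenders of Table 3.6 (A with 2 crimes, B with 3, C with 4), size 2 yields 1 + 3 + 6 = 10 combinations, size 3 yields 5, and size 4 yields 1, matching Table 3.7.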

3.3 Metrics

In these experiments, the focus was on answering the research questions with normalised Discounted Cumulative Gain (nDCG) as the primary metric. Secondary metrics such as Binary Preference (Bpref), R-precision (P@R), Recall at R (R@R) and F-score (F1) were calculated without making any comparison. The reason was that


some of the metrics took the position in the ranking into consideration in their calculation, while others did not. The explanation for the metrics is as follows:

In information retrieval, nDCG is a useful measurement for comparing the performance of different retrieval methods. It takes both the relevant cases and their positions in the ranking into consideration. The advantages of using nDCG are [13]:

• Highly relevant documents are considered more valuable than non-relevant documents.

• The further down the ranking a highly relevant document is listed, the less valuable it is, because it is less visible to the users.

The traditional equations for calculating nDCG are the following:

nDCG = \frac{DCG}{IDCG} \quad (3.1)

DCG = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i + 1)} \quad (3.2)

IDCG is the ideal DCG for the current retrieval, where the most relevant documents are ranked highest:

IDCG = \sum_{i=1}^{|REL_p|} \frac{rel_i}{\log_2(i + 1)} \quad (3.3)

Here REL_p is the optimal ranking list for the current retrieval, while rel_i is the relevance score of document i. In our research, relevant documents are given a score of 2, while non-relevant documents are given a score of 0. The IDCG is used to normalise the DCG value into the range 0 to 1.
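Equations 3.1–3.3 translate directly into code; as stated above, relevant documents score 2 and non-relevant documents score 0. A minimal sketch:

```python
import math

def dcg(relevances):
    """DCG over a ranked list of relevance scores (Equation 3.2);
    position i (1-based) is discounted by log2(i + 1)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    """nDCG = DCG / IDCG (Equations 3.1 and 3.3); the ideal ranking
    sorts the relevance scores in descending order."""
    idcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / idcg if idcg > 0 else 0.0
```

With one relevant offender per test crime, placing that offender first gives nDCG = 1, and every lower position is discounted logarithmically.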

Bpref is a similar measurement in information retrieval, which takes into consideration how many non-relevant documents are ranked before the relevant documents. The disadvantage of Bpref is that it performs significantly better if there are more than two related documents for each subject in the dataset [5].

Bpref = \frac{1}{R} \sum_{r} \left( 1 - \frac{|n \text{ ranked higher than } r|}{R} \right) \quad (3.4)

Here R is the number of relevant documents, and n is the number of non-relevant documents ranked before the current relevant document r.

Precision at R (P@R) is a precision measurement that only focuses on a certain level of retrieval, which differs from classic precision that considers all retrieved results, many of which may not be relevant. The level of retrieval here is R, the total number of relevant documents:

P@R = \frac{r}{R} \quad (3.5)


Here, r is the number of relevant documents within the top R of the ranking, and R is the total number of relevant documents.

Recall at R (R@R) measures the effectiveness of retrieving the right offender, i.e. the fraction of relevant offenders that are retrieved [10, 17]. The recall is calculated as:

R@R = \frac{|\text{relevant documents} \cap \text{retrieved documents}|}{|\text{relevant documents}|} \quad (3.6)

The F-score or F-measure calculates the test's accuracy using both precision and recall, which relate the correct offender to the correctly retrieved offenders of the different methods [10, 17]. The F-measure is calculated as:

F_1 = \frac{2PR}{P + R} \quad (3.7)

P and R in the calculation of F1 are the P@R and R@R values.
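P@R, R@R and F1 can be sketched as follows. Note that with the cutoff fixed at R, P@R and R@R share the same numerator and denominator and therefore coincide, which is consistent with the identical metric columns in most of the result tables:

```python
def precision_at_r(ranking, relevant):
    """P@R: fraction of the top-R ranked documents that are relevant."""
    R = len(relevant)
    hits = sum(1 for doc in ranking[:R] if doc in relevant)
    return hits / R if R else 0.0

def recall_at_r(ranking, relevant):
    """R@R (Equation 3.6): fraction of relevant documents found in the
    top R; equals P@R when the cutoff is R."""
    R = len(relevant)
    hits = sum(1 for doc in ranking[:R] if doc in relevant)
    return hits / R if R else 0.0

def f_score(p, r):
    """F1 = 2PR / (P + R) (Equation 3.7)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0
```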

3.3.1 Choosing metrics

For this research, the primary metric for all the experiments was normalised Discounted Cumulative Gain (nDCG), and it was used to compare the different dependent variables. The reason for not using the other metrics, even though they were calculated, was that R-precision (P@R) and Binary Preference (Bpref) perform better when more than two relevant documents exist for each ranking. In our ranking system, each crime is directly related to one offender, so there is only one relevant document per test crime, which reduces the benefits of P@R and Bpref significantly, even though they take the positions in the ranking into consideration. The remaining metrics, Recall at R (R@R) and F-measure (F1), were included due to their frequent use in information retrieval. nDCG was the primary metric, and the other metrics were not used in the comparisons in the experiments. However, they were still calculated, because they provide points of comparison for other research that wants to compare against our numbers.

3.4 Statistical test

3.4.1 Friedman test

A Friedman test can be used when there is a need for a comparison over k algorithms and n tests. There exists a ranking between the k algorithms, with rank one being the best performance and rank k the worst [10]. The Friedman test has two underlying assumptions [7]:

• Results of each row are independent of each other, i.e. the result from the first test does not affect the result of the second test and so on.


The Friedman test statistic can be calculated using the following Equation 3.8 [7]:

T_1 = \frac{12}{nK(K + 1)} \sum_{k=1}^{K} R_k^2 - 3n(K + 1) \quad (3.8)

In Equation 3.8, n is the number of tests performed, K is the number of algorithms, and R_k is the sum of the ranks over all tests for the k-th algorithm [7].

After calculating T1, the statistic is compared with a critical value cv for the given k and n at a chosen level α; the comparison then helps to accept or dismiss the following hypotheses [10]:

• (T1 < cv) H0: there exists no significant difference between the algorithms

• (T1 ≥ cv) H1: there exists a significant difference between the algorithms
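Equation 3.8 translates directly into code; `ranks` below is an assumed input format, an n × K matrix of per-test ranks:

```python
def friedman_statistic(ranks):
    """Friedman T1 (Equation 3.8). ranks[i][k] is the rank (1 = best)
    of algorithm k on test i; n tests over K algorithms."""
    n = len(ranks)
    K = len(ranks[0])
    # R_k: sum of algorithm k's ranks over all n tests.
    rank_sums = [sum(row[k] for row in ranks) for k in range(K)]
    return (12.0 / (n * K * (K + 1))) * sum(Rk ** 2 for Rk in rank_sums) \
        - 3.0 * n * (K + 1)
```

When all tests rank the algorithms identically, T1 reaches its maximum n(K − 1); when the ranks cancel out, T1 is 0 and H0 cannot be dismissed.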

3.4.2 Nemenyi test

In cases where there exists a significant difference between the algorithms, a post hoc test such as the Nemenyi test can be used to determine between which algorithms the significant differences lie [10]. The critical difference CD can be calculated as [10]:

CD = q_\alpha \sqrt{\frac{k(k + 1)}{6n}} \quad (3.9)

In Equation 3.9, q_α depends on the significance level α as well as on k. The difference can be determined by comparing with CD [10]; here R_i and R_j are the average rankings over all tests of algorithms i and j:

• (|R_i − R_j| < CD) there exists no critical difference between the algorithms (R_i, R_j)

• (|R_i − R_j| ≥ CD) there exists a critical difference between the algorithms (R_i, R_j)

3.4.3 Choosing parameters for the statistical test

In this research, the statistical test was only performed in Experiment 4, where we focused on comparing the different methods against each other. In Experiment 4, we had four different methods, and the number of tests was 92, which makes k = 4 and n = 92. This gives a degree of freedom d.f = k − 1 = 3, and with an alpha level α = 0.05 and the Chi-square distribution Table A.1, we obtained a critical value cv = 7.81.

Given α = 0.05 and k = 4, q_α = 2.569 for the Nemenyi test, as Table A.2 shows. With Equation 3.9 and the given values, our critical difference is CD = 0.489.


Variables                Values
n                        92
k                        4
d.f                      3
α                        0.05
qα                       2.569
critical value cv        7.81
critical difference CD   0.489

Table 3.10: Summary of the chosen parameters for statistical analysis
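With the parameters in Table 3.10, Equation 3.9 reproduces the reported critical difference; a small sketch (the function name is an assumption):

```python
import math

def nemenyi_cd(q_alpha, k, n):
    """Critical difference for the Nemenyi test (Equation 3.9)."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))

# Parameters from Table 3.10: alpha = 0.05, k = 4 methods, n = 92 tests.
cd = nemenyi_cd(2.569, 4, 92)
```

Two methods differ significantly whenever the absolute difference of their average rankings is at least this CD of roughly 0.489.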

3.5 Experiment 1: Analysis of the specific MO-categories

In this experiment, we tried to identify how each category in the MO affects the results. The goal was to answer RQ1 by testing the different MO-categories and how they affect the accuracy of the method. Experiment 2 also depended on Experiment 1 to find which categories would be combined and compared. The selection method included the categories with the highest median nDCG in Experiment 1. The reason for not choosing all categories is that the computing time for all combinations would not have been feasible.

The different MO-categories represented the independent variables used in this experiment, with 25 different levels, one for each category. Spatial-Temporal (SpaTemp) served as the baseline, and Category N (CatN) were the categories being tested; for example, Category 1 (Cat1) may include tools used (hammer, drill, screwdriver), and Category 2 (Cat2) may include stolen goods (money, hardware, jewellery). A more detailed explanation of the contents of each category is given in Table 3.1.

(SpaTemp), (SpaTemp + Cat1), (SpaTemp + Cat2), (SpaTemp + Cat3), ..., (SpaTemp + CatN)

For the dependent variables, we had the metrics normalised Discounted Cumulative Gain (nDCG), Binary Preference (Bpref), R-precision (P@R), Recall at R (R@R) and F-measure (F1); the explanation of the metrics is in Section 3.3.

3.6 Experiment 2: Identifying the preferred combination of MO-categories

In this experiment, the goals were to find the preferred combination of MO-categories and to answer the quantity part of RQ1. The focus was on how the quantity of MO-categories affects the results of the method. Once the preferred combination had been chosen, it was used in Experiments 3 and 4. As in the first experiment, the combination with the highest median was chosen, while it was also required to have a high third and first quartile.


The independent variables were the different category combinations, with Spatial-Temporal (SpaTemp) as the baseline, while Category N (CatN) were the categories being tested. Examples of the combinations are shown as:

(SpaTemp), (SpaTemp + Cat1 + Cat2), (SpaTemp + Cat2 + Cat3), ..., (SpaTemp + Cat1 + Cat2 + Cat3), (SpaTemp + Cat1 + ... + CatN)

For the dependent variables, we had the metrics nDCG, Bpref, P@R, R@R and F1.

3.7 Experiment 3: Identifying the amount of cases related to each offender

The focus of this experiment was the number of existing cases in the database related to each offender in question. Through this experiment, we could understand whether a different number of existing cases affects the results of our Random Choice with Spatial-Temporal method. This experiment also answers RQ2 about the number of existing cases in the database.

The independent variables were the number of reference cases existing in the database for a given offender. For this experiment, there exist ten different levels, from one to ten reference cases:

(1 case), (2 cases), (3 cases), (4 cases), (5 cases), (6 cases), (7 cases), (8 cases), (9 cases), (10 cases)

For the dependent variables, we had the metrics nDCG, Bpref, P@R, R@R and F1.

3.8 Experiment 4: Identifying the preferred weight for Random Choice with Spatial-Temporal method

Before running Experiment 4.1, we performed another test to find the preferred weighting option for the Random Choice with Spatial-Temporal method. This experiment told us which weighting gives the best result before running Experiment 4.1, which compares the different methods against each other. To identify the preferred weight, we chose the weighting with the highest median, while the first and third quartiles also remained the highest.

In Experiment 4, the independent variables were the different weightings for the Random Choice with Spatial-Temporal method. We had a total of 21 different levels, with w1 starting at 0 and incremented by 0.05 until w1 reaches 1. For a more detailed explanation of these weightings, see Section 2.4.

0 ≤ w1 ≤ 1
w2 = 1 − w1


3.9 Experiment 4.1: Identifying the methods' performances

By making a comparison between the different methods, we were able to answer RQ4, which focuses on the accuracy of each method itself. This comparison allowed us to see whether our improved method is better than the Spatial-Temporal and Random Choice methods. To make this comparison, we utilised the statistical Friedman test and Nemenyi test to determine whether there exists a significant difference between the methods and, if so, between which methods.

With the focus on comparing the different methods, the independent variables in this experiment were the types of methods. There were four different levels: Random Choice, Spatial-Temporal, Random Choice with Spatial-Temporal, and Pure Chance.

RC, SpaTemp, RC + SpaTemp, PC

For the dependent variables, we had the metrics nDCG, Bpref, P@R, R@R and F1.

3.10 Hardware requirement

The hardware used in these experiments is listed as follows:

• Intel Core i7, 8th Gen
• NVIDIA GeForce GTX 1060
• Windows 10 Pro
• 16GB RAM

With those specifications, the longest run time for an experiment was around 1 hour and 40 minutes. The run time may also depend on factors that are not necessarily hardware-related, for example, the programming language and the database.

List of software used in the experiments:

• Python 3.8
• MySQL 8.0.20


Chapter 4

Results and Analysis

This chapter displays the results from the experiments and analyses them with the help of our primary metric.

4.1 Experiment 1: Analysis of the specific MO-categories

Figure 4.1 displays the nDCG results from experiment 1, comparing the different categories with the SpaTemp method as a baseline. A higher nDCG value indicates higher accuracy in the offender ranking.

Categories        nDCG    Bpref   P@R     R@R     F1
SpaTemp           0.8682  0.6766  0.7064  0.7366  0.7196
SpaTemp + Cat 21  0.8146  0.5978  0.5978  0.5978  0.5978
SpaTemp + Cat 3   0.7779  0.5000  0.5000  0.5000  0.5000
SpaTemp + Cat 16  0.7753  0.5543  0.5543  0.5543  0.5543
SpaTemp + Cat 6   0.7614  0.5000  0.5000  0.5000  0.5000
SpaTemp + Cat 5   0.7503  0.4457  0.4457  0.4457  0.4457
SpaTemp + Cat 7   0.7465  0.5000  0.5000  0.5000  0.5000
SpaTemp + Cat 8   0.7169  0.4130  0.4130  0.4130  0.4130

Table 4.1: Results from experiment 1 displaying average results for all the metrics
Table 4.1: Results from experiment 1 displaying average results for all the metrics The nDCG boxplot, from Figure 4.1 showcase that Categories 3, 5, 6, 7, 8, 16 and 21 with our method had the highest median nDCG out of all Categories with Spatial-Temporal method; thus, those categories were included in the next experiment. Table 4.1 is displaying average values for different metrics for the seven categories that were included in the following experiment, the average nDCG in Table 4.1 also show similar results there Spatial-Temporal results was the best when only single category are included.


4.2 Experiment 2: Identifying the preferred combination of MO-categories

With the different combinations of categories 3, 5, 6, 7, 8, 16 and 21, the boxplot in Figure 4.2 displays the 24 combinations with the highest median nDCG. Here too, a higher nDCG value indicates higher accuracy in the offender ranking.

Categories                    nDCG    Bpref   P@R     R@R     F1
Spt-Tem + Cat 6,7,16,21       0.8709  0.7283  0.7283  0.7283  0.7283
Spt-Tem                       0.8682  0.6766  0.7064  0.7366  0.7196
Spt-Tem + Cat 3,6,7,8,16,21   0.8605  0.7174  0.7174  0.7174  0.7174
Spt-Tem + Cat 3,5,6,7,16      0.8597  0.7174  0.7174  0.7174  0.7174
Spt-Tem + Cat 5,6,7,8,16,21   0.8577  0.7174  0.7174  0.7174  0.7174
Spt-Tem + Cat 3,6,7,8,21      0.8561  0.6957  0.6957  0.6957  0.6957
Spt-Tem + Cat 3,5,6,7,8       0.8541  0.6957  0.6957  0.6957  0.6957
Spt-Tem + Cat 5,6,7,16        0.8505  0.6957  0.6957  0.6957  0.6957
Spt-Tem + Cat 6,7,16          0.8416  0.6739  0.6739  0.6739  0.6739
Spt-Tem + Cat 3,6,7,16        0.8383  0.6630  0.6630  0.6630  0.6630

Table 4.2: Results from experiment 2 displaying average results for all the metrics


4.3 Experiment 3: Identifying the amount of cases related to each offender

In experiment 3, the focus was on identifying whether the number of cases related to an offender affects the ranking results. The chosen levels are 1 to 10 cases; the higher the nDCG value, the better the performance.

The Offenders column shows the number of offenders that have at least as many reference crimes in the database as the number of cases in the Cases column, while N is the number of generated fictive offenders derived from the real offenders; the fictive offenders help to increase the number of cases in the experiment. To better understand how the fictive profiles were generated and chosen, take three cases as an example:

For three cases, we have six actual offenders with three or more cases in the database. From those six offenders, we generated all possible fictive profiles, where each profile has exactly three cases. When the generation of profiles was done, 150 profiles were randomly chosen for the experiments.

# Cases  # Offenders  N    nDCG    Bpref   P@R     R@R     F1
1        27           150  0.6609  0.3709  0.3709  0.3709  0.3709
2        11           150  0.7992  0.5762  0.5762  0.5762  0.5762
3        6            150  0.8867  0.7417  0.7417  0.7417  0.7417
4        4            150  0.9069  0.7616  0.7616  0.7616  0.7616
5        2            150  0.9266  0.8079  0.8079  0.8079  0.8079
6        2            150  0.9331  0.8212  0.8212  0.8212  0.8212
7        2            150  0.9253  0.8013  0.8013  0.8013  0.8013
8        2            150  0.9438  0.8609  0.8609  0.8609  0.8609
9        2            150  0.9445  0.8543  0.8543  0.8543  0.8543
10       2            150  0.9522  0.8742  0.8742  0.8742  0.8742

4.4 Experiment 4: Preferred weight for Random Choice with Spatial-Temporal method

In Experiment 4, the focus was on finding the best weighting for the RC + SpaTemp method, with w1 starting from 0.00 and incremented by 0.05, and the corresponding weight w2 = 1 − w1.

Weights                        nDCG    Bpref   P@R     R@R     F1
SpaTemp w1 0.50 / RC w2 0.50   0.8605  0.7174  0.7174  0.7174  0.7174
SpaTemp w1 0.55 / RC w2 0.45   0.8955  0.7935  0.7935  0.7935  0.7935
SpaTemp w1 0.60 / RC w2 0.40   0.9021  0.8043  0.8043  0.8043  0.8043
SpaTemp w1 0.65 / RC w2 0.35   0.9147  0.8370  0.8370  0.8370  0.8370
SpaTemp w1 0.70 / RC w2 0.30   0.9107  0.8261  0.8261  0.8261  0.8261
SpaTemp w1 0.75 / RC w2 0.25   0.9111  0.8261  0.8261  0.8261  0.8261
SpaTemp w1 0.80 / RC w2 0.20   0.9092  0.8261  0.8261  0.8261  0.8261
SpaTemp w1 0.85 / RC w2 0.15   0.9070  0.8261  0.8261  0.8261  0.8261
SpaTemp w1 0.90 / RC w2 0.10   0.9073  0.8261  0.8261  0.8261  0.8261
SpaTemp w1 0.95 / RC w2 0.05   0.9061  0.8261  0.8261  0.8261  0.8261
SpaTemp w1 1.00 / RC w2 0.00   0.8891  0.8261  0.8261  0.8261  0.8261

4.5 Experiment 4.1: Identifying the methods' performances

Experiment 4.1 compared the performance of the different methods against each other: Spatial-Temporal, Random Choice, Random Choice with Spatial-Temporal with the weighting SpaTemp 0.65 / RC 0.35, and the Pure Chance method.

Method        nDCG    Bpref   P@R     R@R     F1
RC + SpaTemp  0.9147  0.8370  0.8370  0.8370  0.8370
SpaTemp       0.8673  0.6702  0.6968  0.7274  0.7101
RC            0.5523  0.2609  0.2609  0.2609  0.2609
PC            0.1734  0.0262  0.0200  0.0490  0.0264

Table 4.5: Results from experiment 4.1 displaying average results for all the metrics

Experiment 4.1 showed that the Random Choice with Spatial-Temporal method with the weighting SpaTemp 0.65 / RC 0.35 performed best according to the average metrics in Table 4.5. As the boxplot in Figure 4.6 shows, the offender ranking results were more consistent and accurate when using the Random Choice with Spatial-Temporal method compared to the other methods, with nDCG as the metric.

4.5.1 Statistical analysis

The chosen parameters for the statistical analysis were presented in Table 3.10. After calculating the T1 value with the help of the Friedman test, the results are shown in Table 4.6.

Metrics  T1 value
nDCG     181.06
Bpref    142.94
P@R      141.85
R@R      138.21
F1       141.85

Table 4.6: T1 values of all the metrics

If a T1 result was higher than cv, the null hypothesis H0 was dismissed and H1 was accepted, since there existed a significant difference between the methods; here, all T1 values were larger than cv.

Since there existed a significant difference between the methods, the Nemenyi test was performed, where the critical difference CD was calculated with Equation 3.9. The focus was on the Random Choice with Spatial-Temporal method for this test.


nDCG
Methods comparison        Difference in average ranking
RC + SpaTemp vs SpaTemp   1.2500
RC + SpaTemp vs RC        1.7391
RC + SpaTemp vs PC        2.4891

Bpref
Methods comparison        Difference in average ranking
RC + SpaTemp vs SpaTemp   0.9783
RC + SpaTemp vs RC        1.6630
RC + SpaTemp vs PC        2.1413

P@R
Methods comparison        Difference in average ranking
RC + SpaTemp vs SpaTemp   0.9783
RC + SpaTemp vs RC        1.6848
RC + SpaTemp vs PC        2.1196

R@R
Methods comparison        Difference in average ranking
RC + SpaTemp vs SpaTemp   0.9783
RC + SpaTemp vs RC        1.7826
RC + SpaTemp vs PC        2.0217

F1
Methods comparison        Difference in average ranking
RC + SpaTemp vs SpaTemp   0.9783
RC + SpaTemp vs RC        1.6848
RC + SpaTemp vs PC        2.1196


Chapter 5

Discussion

5.1 Experiment 1: Analysis of the specific MO-categories

In experiment 1, we found that every category on its own performed worse than the Spatial-Temporal method. This was something we expected, as a single category is not distinctive enough for the Random Choice method to differentiate between multiple offenders.

From this experiment, we chose Categories 3, 5, 6, 7, 8, 16 and 21 because they were the best-performing categories. The reason for those specific seven categories, and not some other number of categories, was their performance. As we analysed the list of categories, category 8 seemed the right place to stop, because the next category in the list, category 2, provides information that is not reflected in the offenders' behaviour. Since the remaining categories performed worse than a category that does not reflect the offenders' behaviour, we decided not to include them either.

To answer the first part of the first research question, regarding how different categories affect the accuracy of the ranking: we found that the specific categories do affect the accuracy of the offender ranking, but individual categories on their own were not as accurate, since they were not distinctive enough.

5.2

Experiment 2: Identifying the preferred combination of MO-categories

After running Experiment 2, we found nine combinations of Categories 3, 5, 6, 7, 8, 16 and 21 with a higher median nDCG than the Spatial-Temporal method, and all combinations showed an increase in performance compared to the categories on their own. This result was also expected, since combining the categories allows for a better distinction between offenders.

Of the nine combinations with a median nDCG higher than the Spatial-Temporal method, the combination Spt-Temp + Cat 3, 6, 7, 8, 16 and 21 performed best according to our criteria: it has the highest median and third quartile, while its first quartile was in the same range as the Spatial-Temporal method. As it best fit our criteria, we chose it as the combination for the Random Choice with Spatial-Temporal method and used it for the rest of the experiments in this research.
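The selection criterion just described can be sketched as a small comparison over the per-combination nDCG distributions. This is assumed logic for illustration, not the thesis implementation; the combination names and score lists are hypothetical.

```python
import statistics

# Assumed selection criterion: pick the combination with the highest
# median nDCG, breaking ties by the third quartile.
def third_quartile(scores):
    # statistics.quantiles with n=4 returns [Q1, median, Q3].
    return statistics.quantiles(scores, n=4)[2]

def best_combination(ndcg_by_combo):
    """ndcg_by_combo maps a combination name to its list of nDCG scores."""
    return max(ndcg_by_combo,
               key=lambda c: (statistics.median(ndcg_by_combo[c]),
                              third_quartile(ndcg_by_combo[c])))

# Hypothetical scores for two candidates:
scores = {
    "SpaTemp":                    [0.60, 0.65, 0.70, 0.75, 0.80],
    "SpaTemp + Cat 3,6,7,8,16,21": [0.62, 0.70, 0.78, 0.80, 0.85],
}
print(best_combination(scores))  # the candidate with the higher median wins
```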


Regarding the number of combinations displayed in the boxplot: only 24 out of 120 combinations were shown, because we were only interested in the best-performing combinations, and showing all of them would have been unnecessary and confusing.

For the second part of the first research question: there was an increase in performance when combining multiple categories, so the number of MO-categories does affect the accuracy of the ranking. Nevertheless, this also depends on the categories themselves; if a combination already contains the right categories, it does not need to include more.

5.3

Experiment 3: Identifying the number of cases related to each offender

Results from Experiment 3 showed an increase in accuracy as the number of cases related to an offender goes up. We reasoned that these results were a product of consistency: the more crimes related to an offender, the better the Random Choice method could estimate the probability of that offender committing the crime of interest. As the Random Choice method's performance affects our method, the Random Choice with Spatial-Temporal method also increased in accuracy, where consistent offenders with an MO similar to the crime of interest rose in the ranking.
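The consistency effect can be illustrated with a toy frequency estimate. This is not the thesis's Random Choice implementation; the MO feature name and the add-one smoothing are assumptions made for the example, which only shows that more known crimes make a behavioural estimate more confident.

```python
from collections import Counter

# Toy illustration (not the thesis method): estimate how likely an offender
# is to exhibit a given MO feature, from their known crimes, with add-one
# (Laplace) smoothing so unseen features keep a small probability mass.
def mo_feature_probability(known_crimes, feature):
    counts = Counter(f for crime in known_crimes for f in crime)
    total = len(known_crimes)
    return (counts[feature] + 1) / (total + 2)

# One known crime vs four known crimes with the same habit:
few = [{"window entry"}]
many = [{"window entry"}] * 4
print(round(mo_feature_probability(few, "window entry"), 3))   # 0.667
print(round(mo_feature_probability(many, "window entry"), 3))  # 0.833
```

With more related cases, the estimate moves further from the uninformed 0.5 prior, mirroring how consistent offenders rise in the ranking.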

The number of offenders with 1, 2, 3, ..., N cases was limited because we were working with a limited dataset. To increase the number of data points for Experiment 3, we generated additional fictive offender profiles. As a consequence, the results do not fully reflect the actual situation in ordinary cases. Still, a similar trend appears in Yokota and Watanabe's article: an increase in the number of cases does increase the ranking score [14].

Figure 4.4 showed a similar trend: an increased number of related cases for the correct offender also improved the ranking position. Already at four related cases, the correct offender was ranked number 1 out of 27 offenders, with some outliers. The results of the Random Choice with Spatial-Temporal method indicate that, already at four cases, an investigator can allocate resources to investigate the number 1 ranked offender, with the exception of some outliers at rank 2.

The results of Experiment 3 showed that the number of cases related to an offender does affect the accuracy of the offender ranking: offenders with more known offences rose in the ranking, while those with fewer than four related cases did not perform as well. This explanation and discussion of the results answers the second research question.

5.4

Experiment 4: Preferred weight for Random Choice with Spatial-Temporal method

