
Data-Driven Emptying Detection for Smart Recycling Containers

David Rutqvist

Computer Science and Engineering, master's level 2018

Luleå University of Technology

Department of Computer Science, Electrical and Space Engineering


ABSTRACT

Waste management is one of the biggest challenges for modern cities, caused by urbanisation and an increasing population. Smart Waste Management tries to address this challenge with the help of techniques such as the Internet of Things, machine learning and cloud computing. By utilising smart algorithms, the time when a recycling container is going to be full can be predicted. By continuously measuring the filling level of containers and then partitioning the filling level data between consecutive emptyings, a regression model can be used for prediction. In order to do this, an accurate emptying detection is a requirement.

This thesis investigates different data-driven approaches to the problem of accurate emptying detection in a setting where the majority of the data are non-emptyings, i.e. suspected emptyings which by manual examination have been concluded not to be actual emptyings. This is done by starting with the currently deployed legacy solution and step by step increasing the performance through optimisation and machine learning models. The final solution achieves a classification accuracy of 99.1 % and a recall of 98.2 % by using a random forest classifier on a set of features based on the filling level at different given time spans, to be compared with the 50 % recall of the legacy solution.

In the end, it is concluded that the final solution, with a few minor practical modifications, is feasible for deployment in the next release of the system.


PREFACE

This master's thesis was conducted between January 2018 and May 2018 at BnearIT AB in Luleå, the company responsible for the software behind the Smart Recycling® system. I would like to thank BnearIT for giving me the opportunity not only to do my master's thesis at the company but also to work there during my studies, and for the immense responsibility and trust put in me.

This project was carried out in close collaboration with my supervisors and I would especially like to thank them. Fredrik Blomstedt has been my external supervisor at BnearIT and has provided daily feedback, an industry point of view and insights from his experience. Denis Kleyko has been my internal supervisor, representing Luleå University of Technology, and has shown indispensable commitment to this project through our weekly recurring meetings and by providing knowledge and best practice about working on a machine learning project.

Finally, I would like to thank those, apart from the supervisors, who have helped with the review of this thesis: Kent Eneris, Stig-Oscar Nilsson and Juha Rajala.

David Rutqvist


CONTENTS

Chapter 1 – Introduction
1.1 Background
1.2 Motivation
1.3 Problem Definition
1.4 Delimitations
1.5 Thesis Structure

Chapter 2 – Related Work
2.1 Smart Waste Management
2.1.1 Commercial Products
2.2 Machine Learning in Industry Applications

Chapter 3 – Theory and Methodology
3.1 Technical Problem
3.2 Data Analysis
3.2.1 Distribution and Balance of Dataset
3.2.2 Performance and Accuracy Measurements
3.3 Collecting a Container
3.4 Legacy Solution
3.5 Dismissed Solutions to the Problem of Emptying Detection
3.6 Optimising Parameters in Existing Model
3.6.1 Grid Search
3.7 Examining New Types of Classifiers
3.7.1 Logistic Regression
3.7.2 Support Vector Machines
3.7.3 Decision Trees
3.7.4 Random Forests
3.8 Considering More Features
3.9 Hyperparameter Optimisation

Chapter 4 – Implementation
4.1 Obtaining a Ground Truth Dataset
4.2 Determining Performance of Legacy Implementation
4.3 Grid Search Optimisation
4.4 Classification Algorithms
4.5 Exploring and Eliminating Further Features

Chapter 5 – Evaluation
5.1 Performance of Legacy Solution
5.2 Optimised Solution
5.3 Classification Algorithms
5.4 Extended Features
5.5 Summarised Results

Chapter 6 – Discussion
6.1 Fulfilment of Technical Problems
6.1.1 Strengths and Weaknesses of Legacy Solution
6.1.2 Performance of the Legacy Solution
6.1.3 Study Different Types of Techniques
6.1.4 Implement and Test Different Solutions
6.1.5 Compare Results to the Legacy Solution
6.2 Iterations
6.3 Applicability of Final Model

Chapter 7 – Conclusions

Chapter 8 – Future Work
8.1 Using Results in Production
8.2 Areas for Continued Research

Appendix A – Considered Solutions to the Problem of Detecting Emptyings
Appendix B – Container Types in Dataset
Appendix C – Scatter Plot for Features in Dataset
Appendix D – Decision Bounds for Different Classifiers


LIST OF FIGURES

3.1 Container ultrasound profile with interference
3.2 Measurements before and after a real emptying
3.3 Missed emptying causing an incorrect emptying prediction
3.4 Dataset balance between classes
3.5 Balance between classes in the dataset per container type
3.6 Distribution of container types in the dataset
3.7 Precision and Recall
3.8 Emptying of a container
3.9 Decision tree for legacy solution
3.10 Hyperplane separation by SVM
4.1 Data labelling tool
5.1 MCC score heat-map for grid search optimisation
5.2 Decision boundary, filling level after versus filling level change
5.3 Recursive feature elimination results
5.4 Hyperparameter optimisation for final classifier
6.1 Previously missed emptying being correctly predicted by final solution
A.1 Resulting mind-map from brainstorm
C.1 Distribution and density of features in dataset
D.1 Decision boundary, filling level change versus vibration strength
D.2 Decision boundary, filling level after versus vibration strength


LIST OF TABLES

3.1 Truth table of the legacy solution
5.1 Confusion matrix of legacy solution
5.2 Optimised parameters
5.3 Confusion matrix of the optimised legacy solution
5.4 Comparison of scores from different classifiers
5.5 Selected features
5.6 Final confusion matrix
5.7 Comparison of all results
5.8 Performance per container type
B.1 Table of container types in the dataset


CHAPTER 1 – Introduction

Modern cities face many new challenges due to increased population and urbanisation. More people move to the cities, and everyone has an increased energy demand along with high expectations of the services provided by municipalities, states and private companies.

Waste management is one of the biggest challenges imposed by the rapid growth of the urban population. For example, each person in Europe is expected to produce six tonnes of waste material each year[1], material which may be recycled or used for energy recovery such as heating.

An efficient strategy for facing the challenge of waste management includes several steps. First, we should consume materials rationally in order to avoid unnecessary waste. Next, the process of waste disposal should be done in a structured way. Finally, the recycling of the waste should be maximised. When implementing these steps, economic and environmental aspects should be taken into account.

Waste transportation greatly affects both aspects, and its optimisation can significantly increase the positive effects. At the same time, there is a clear requirement that, in order to keep recycling stations clean, they should be emptied at the right time. It is non-trivial to fulfil this requirement in a scenario with several hundred recycling stations (each with several containers) spread over a large geographical area.

The Internet of Things is an enabling technology for waste transportation optimisation. It allows each container to report its filling level or even predict the expected emptying time. Such predictions make it possible to avoid redundant transportation without violating the overfilling requirement.

The quality of the predictions will determine the efficiency of the system. There are several technical challenges in achieving a high quality of predictions. One of them is accurate detection of a container being emptied. Therefore, the purpose of this project is to identify and investigate several solutions in the existing waste management system for determining when a container was emptied. Solutions may suggest changes to the existing sensors and technologies as well as completely new approaches to the problem.


In this thesis, different solutions have been studied and evaluated, starting with the solution currently used (called the legacy solution), continuing by optimising the legacy solution, and then turning the attention towards other machine learning solutions such as support vector machines, decision trees and random forests. In the end, a random forest algorithm was selected and further optimised.

1.1 Background

We are currently experiencing a wave of digitalisation where everything from our bodies to our homes is going to be connected to a network and will possibly have a digital twin. The industry is currently working towards the fourth industrial revolution, where everything is connected and cooperates in order to streamline the production process and improve customer service[2]. Despite these trends, the waste transportation business has not yet evolved much. When it comes to collecting waste from recycling stations, fixed schedules have traditionally been used and are still in use today. The schedules do not take into account whether a container is full or empty. They are also very susceptible to seasonal variations, e.g., at holiday resorts, where the filling rate is strongly connected to the weather and the season of the year.

It is clear that the developments within the areas of the Internet of Things, cloud computing and machine learning can be used to digitalise the conventional waste transportation business. It is obvious that from both economic and environmental aspects it is important to optimise the usage of lorries. For example, it is preferable to drive one completely full lorry rather than two lorries which are half full. Even in this simple example both aspects are clear. First, making additional trips increases the environmental pollution caused by lorries, an effect which is a subject of public debate[3, 4]. Second, for a transportation company to be economically efficient it is important to optimise its fleet and personnel costs.

The Smart Recycling® system was created to address the challenge of efficient waste transportation. It is currently being deployed at recycling stations throughout the whole of Sweden, and the first 1,500 sensors have been running for over a year in the southern part of the country. During the first year, the focus of the system has shifted towards a more process-oriented system where the main information provided is a prediction of the time when a container should be emptied. With the production deployment, new challenges related to this goal have been discovered. For example, different types of containers behave differently with respect to the sensitivity of sensor measurements. A few key challenges for the continued development of the system have been identified, one of which is the ability to determine, with certainty, when an emptying has occurred.


1.2 Motivation

The European Union's Seventh Environment Action Programme specifically sets an objective on a resource-efficient economy, and one of the top priorities for waste policy in the EU is to maximise recycling and re-use[1]. One solution is to optimise the collection of recyclables by monitoring and predicting the filling level of each container.

There is no doubt that this is important for the environment, the satisfaction of citizens and for the involved companies. However, one might wonder why detecting an emptying which has already occurred is so important for the functionality of the whole system.

There are two main reasons for this.

First, prediction of the time when a container is going to be full can be done mathematically by fitting a simple polynomial curve to the filling level measurements. This is done using the method of least squares, which requires the data to be continuous. To achieve this, the measurement history must be partitioned by occurred emptyings.

Second, determining the filling level at each emptying for historical reference requires an accurate emptying detection. Historical reference is important both from a gamification point-of-view and as a basis for decisions about continued purchases of the system. The suggested way to do this is to provide the customers with a list of the filling level at each emptying per container and also provide an aggregated view where each customer’s average filling level at emptying is displayed in a graph for the last year.

1.3 Problem Definition

This section gives an overview of the problem investigated in this project. A more detailed technical specification of the problem is available in Chapter 3. Currently, the Smart Recycling® system is used in production for scheduling the routes of the lorry drivers who collect recyclable materials. The collection routes are entirely based on the predictions of when the containers are going to be full. Thus, an incorrect prediction yields either unnecessary transports (when the prediction is too early) or poor customer satisfaction (when the prediction is too late). Both situations negatively affect the transportation companies' economy. As stated in Section 1.2, some perception of historical emptyings is a prerequisite for an accurate prediction. The main goal of this project is therefore to investigate and explore a solution for determining when an emptying has occurred. This problem can be divided into the following sub-problems:

P1 Analyse and determine the strengths and weaknesses of the legacy solution;

P2 Gather statistics to determine the confidence of the legacy solution to be used as a baseline for the alternative solutions;

P3 Conduct a broad study of different types of techniques applicable to the problem;


P4 Implement and test one or several of the most promising solutions discovered during the study phase;

P5 Analyse results, compare them with the legacy solution and make conclusions for continued improvements.

The goal of the project is to design a new solution which improves the emptying detection performance. If successful, the results should show an improvement over the detection performance of the legacy solution. The intention is, if a sufficient detection performance is reached, to use the results practically in the new version of the system. The aim of the project is to improve the quality of the emptying time predictions, and the first and most important goal towards this aim is to improve the emptying detection performance.

When selecting methods and libraries for implementation, particular attention must be paid to a project's lifetime and continued support, since an alternative solution should, if possible, be deployed, used and maintained in the future.

1.4 Delimitations

The area of accurate emptying time prediction is simply too large a research area to cover in a single master's thesis project. Therefore, the following delimitations were set for the project.

• The project focuses exclusively on the emptying detection part, as an accurate emptying detection is perhaps the most important prerequisite for an accurate emptying time prediction. Also, the legacy solution for emptying time prediction will improve significantly if an accurate emptying detection is achieved.

• The end result of the project should be a description of an alternative solution to the problem of emptying detection, along with any configurable parameters to use. Thus, the end result should not be an actual production-ready implementation but rather guidance on what to implement. Therefore, computational performance is not in focus; instead, the emptying detection performance is prioritised. This also means that no live data is needed; historical data can be loaded during the initial phase of the solution. Finally, since the solution does not need to be production ready, it can be implemented in a run-once-and-exit fashion.

1.5 Thesis Structure

This section outlines the structure of this thesis. The thesis starts with a chapter introducing related work, both in the area of Smart Waste Management and in the area of machine learning in industrial applications. In Chapter 3 the methodology of this project is described, as well as the theoretical background behind the methods used. Together, this introductory chapter, the related work chapter and the theory and methodology chapter make up the study that was conducted during the early stages of the project.

Chapter 4 begins the experimental part of the thesis with details on how each tested solution was implemented. The results for each explored solution and their comparison are presented in Chapter 5. The results, the comparisons and the theoretical background are then used for the analysis, starting with the discussion in Chapter 6, continuing through the conclusions in Chapter 7 and finally introducing directions for future work in Chapter 8.


CHAPTER 2 – Related Work

The problem of emptying detection is a rather specific problem which arises within the area of Smart Waste Management when specific measurement techniques are used. However, there is still related work to be studied. In general, the related work for the studied problem can be divided into two parts. First, similar applications in the clean-tech and Smart Waste Management areas are reviewed. Second, the broader area of modern industrial applications using machine learning is considered.

2.1 Smart Waste Management

The topic of Smart Waste Management covers the whole life-cycle of a consumer product. Since containers are typically collected by lorries, topics such as fuel economy are also related. This section focuses on work regarding the containers, the collection of containers and measuring filling levels in bins.

As concluded previously, the Internet of Things is an enabling technology in Smart Waste Management applications. Several works have suggested various technical implementations to help solve the waste management problem in larger cities [5, 6, 7, 8, 9, 10]. In [5] a flexible and scalable platform was suggested for information sharing between heterogeneous devices. The goal of the platform was to monitor the fullness level of bins in order to avoid collecting semi-empty bins. The data could then be exported to other decision algorithms for determining the optimal number of lorries or bins in an area. Compared to the Smart Recycling® system, the platform targeted smaller residential bins while Smart Recycling® targets larger containers. In other words, we make a distinction between bins and containers, where bins are most often used by residents and in public spaces in cities, and containers are larger (2 m³ and larger) waste receptacles commonly collected by large lorries equipped with a crane.

Another work [6] goes through the need and use-cases for Smart Waste Management and suggests a solution with ultrasound sensors uploading their level to the cloud for big data processing, which strongly resembles the Smart Recycling® system. However, the article only describes the system and its goals, not emptying detection in particular.

A literature review of different analysis and optimisation techniques for urban solid waste management suggested that information technology will aid in designing a smart and green urban waste collection system [7]. The literature review identified both RFID and sonar technologies as techniques of interest, both of which have been or are currently used in the Smart Recycling® system.

Finally, two previous case studies of Smart Recycling® have been conducted and published, focusing on the impact the system has had on its stakeholders [11] and on the development process behind the system [12].

2.1.1 Commercial Products

Since clean-tech, the Internet of Things and environmental issues are important topics, it is no surprise that there are several products competing in similar commercial niches. The commercial products can be divided into two types: retrofitted sensors and smart containers. Smart containers are special-made containers, e.g., urban dust bins or containers for cardboard which measure the level while mechanically compressing the cardboard. The containers or dustbins are typically smaller in size and target restaurants, industries and cities. Examples of such products are Big Belly [13] and CleanCUBE by Ecube [14].

Retrofitted sensors are the same type of system as Smart Recycling®, and examples of such products are Enevo [15], CleanFLEX by Ecube [14], Sensoneo [16], Onsense [17], Smart Waste by Citibrain [18] and Smart Bin [19]. Out of these solutions, CleanFLEX seems to be the most relevant to this project since they also offer predictive analysis of the filling level.

2.2 Machine Learning in Industry Applications

It is not only in research that machine learning is an area of increased interest. A report from 2017 by McKinsey identified several applications for machine learning in the industry, such as predictive maintenance, process and quality optimisation and automated quality testing [20]. The combination of the Industrial Internet of Things, known as IIoT, and machine learning has been studied thoroughly for predictive maintenance [21], i.e., knowing beforehand when a machine or product needs maintenance without it necessarily breaking down before being fixed.

Another study [22] compared different machine learning algorithms, similarly to this project, in the application of finding defective mild steel coils, i.e., in the automated quality testing scenario. Examples of algorithms studied included decision trees, neural networks, support vector machines, and ensemble techniques such as boosting and random forest. Out of these, random forest performed the best and achieved 95 % accuracy.


Two master's theses [23, 24] at the Swedish lorry manufacturer Scania have studied fuel economy and how, e.g., platooning can improve it. Both theses used machine learning methods to model fuel consumption based on several factors, e.g., the slope of the road, the weight of the lorry and the outside temperature. Finally, a work from the University of Nottingham [25] studied different machine learning algorithms to predict the fuel consumption of lorries based on data regarding the characteristics of the road. The study found random forest to be the best performing algorithm, and the results were compared with actual consumption to validate the feasibility of predicting fuel consumption.


CHAPTER 3 – Theory and Methodology

This chapter describes the different parts of the project, the prerequisites for each part and in what order they were done during the project. The main parts of the chapter consist of the theory behind the solutions studied during the study phase of the project, some of which were later evaluated during the implementation phase.

The project started with a study phase which included the following parts. First, the problem was expressed in non-technical wording and thus had to be reformulated into a more technical context as a starting point for the search for solutions. Since the author of this project is also the system architect of the Smart Recycling® product, prior knowledge about the system, its applications and its challenges had to be summarised both in the report and for the supervisors.

The actual studies started with a brainstorming session about possible solutions at all different levels of the system architecture. The complete results of the brainstorming can be found in Appendix A, and some dismissed solutions are addressed in Section 3.5. In parallel, the legacy solution was examined with respect to its advantages and disadvantages. Methods for optimising the legacy solution as well as completely new methods were studied. However, the evaluation of every solution suffered from the lack of a ground truth dataset to compare against.

The next phase was therefore the implementation phase, which started by developing a system for labelling historical data to use for comparison and, potentially, data-driven decisions. Once the implementation was done, two weeks of labelling followed, while the study continued with a sharper focus on data-driven optimisation methods and algorithms.

During the next iteration of the implementation phase, the performance of the legacy solution was determined using the ground truth dataset. The thresholds in the legacy solution were also optimised. Each implementation phase was followed by an evaluation phase, which checked the achieved performance against the goal and determined the way forward. Because the goal had not been achieved, new methods and models were studied during the last iteration.

Finally, during the wrap-up phase of the project, the report was finalised, a presentation was prepared and the results were presented both at the company and at the university.

3.1 Technical Problem

The Smart Recycling® product is a complex ecosystem of sensors, services and systems with the aim of optimising waste management, where a particular goal is to predict when a container is going to be full. The process starts at the sensor level and continues through mobile wireless communication to the server, which processes the incoming data; the extracted information is then delivered to a user interface.

The sensors used in the ecosystem are a customised hardware solution provided by one of the project partners. The hardware is equipped with an ultrasonic range sensor, an accelerometer and a GSM module. The sensor measures the filling level of a container regularly throughout the day at a configurable interval, and the measurements are uploaded to the server. The measuring is done by processing an ultrasound profile, as illustrated in Figure 3.1, and then extracting only the distance with millimetre accuracy. The sensor is retrofitted to the ceiling of its container and, thus, the measured range is the distance from the ceiling to the waste, which is then converted at the server to a filling level in percent.
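As a simple illustration of this server-side conversion, a minimal sketch is given below. The function name and the container-depth parameter are assumptions made for the example; the exact processing used in the system is not detailed here.

```python
def filling_level(distance_mm: float, container_depth_mm: float) -> float:
    """Convert a ceiling-to-waste distance into a filling level in percent.

    Assumes the sensor sits at the ceiling (distance 0 = completely full)
    and that container_depth_mm is the distance to the bottom when empty.
    """
    level = 1.0 - distance_mm / container_depth_mm
    return max(0.0, min(1.0, level)) * 100.0

# Example: a 2000 mm deep container with the waste 1100 mm from the sensor.
print(filling_level(1100, 2000))  # 45.0 (% full)
```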

Due to the physical characteristics of the ultrasound sensor, objects other than the actual waste level could be measured. For example, a container usually has supporting structures or other parts related to the emptying mechanism which may interfere with the ultrasound pulses. To avoid this interference, the placement of the sensor is crucial. This challenge can be addressed either by adjusting the ultrasound cone or by a filter cancelling the echo. The filter approach is the one currently in use.

The process is exemplified in Figure 3.1, where the ultrasound pulses are reflected from the emptying mechanism inside the container at a distance of 660 mm from the sensor. The actual waste level is 1100 mm from the sensor. The difference between the true level and the false echo can be clearly seen in the figure, where the true level echoes continuously whilst the voltage of the false echo returns to the base voltage directly afterwards.

The example illustrated in Figure 3.1 is a near perfect situation with a clear false echo. This is not always the case; instead, there may be a false echo at the bottom of the container which is very difficult to filter out. The effect is that for some containers the level never reaches zero even when the container is empty. This makes it impossible to use the absence of waste as a basis for emptying detection.

In the considered situation, additional information is needed. Therefore, an accelerometer has been added to the sensor. It continuously measures the acceleration of the sensor (and thereby of the container). If the acceleration exceeds a configurable threshold on any of the axes, the sensor wakes the main processor with an interrupt. The processor then measures how many interrupts it receives within a given time period and reports this as a vibration score to the server. Since containers are emptied by lifting them with a crane, this event should trigger a single distinct vibration sample. This solution has provided satisfactory results and has therefore been used during the last years. However, with the introduction of new container types and new fractions in the southern parts of Sweden, as well as a new accelerometer model, this solution has become insufficient.

Figure 3.1: A typical ultrasound profile of a container where the emptying mechanism reflects ultrasound pulses at around 660 mm from the sensor.

Instead of a single vibration during emptying, extra vibrations are registered when citizens throw in their waste. Using the legacy solution, several emptyings per day could be detected. The extra vibration samples are clearly illustrated (orange dots) in Figure 3.2, which contains a single true emptying on the first of February. To overcome this issue, each vibration sample should be classified as either a true emptying or a false emptying. The solution currently in use, known as the legacy solution, processes each sample using a set of static rules, as described further in Section 3.4. The rules define whether the current vibration sample corresponds to an emptying or not.

The knowledge about the occurrence of an emptying provides an important historical reference to the users but, more importantly, it makes it possible to predict when the container is going to be full again. The data gathered from the live deployments have shown that the filling rate mostly follows either a line or a simple polynomial function.

Figure 3.2: Distance (blue) and vibrations (orange) measured by a sensor from the 14th of January to the 23rd of February 2018. The distance samples correspond to the distance between the sensor and the waste level, i.e. the inverted filling level.

Since a line is just a special case of a polynomial function, a polynomial function can be used in the general case. The filling level can, therefore, be predicted using a least squares fit of the data between two adjacent emptyings. The predicted emptying date can then be obtained by extrapolating the estimated polynomial function to the point where the filling level reaches 90 %. The target filling level was initially 100 % but was later changed to 90 % at the request of the customers, since they typically need a day to schedule a container for collection.

Formally, let $f(t)$ denote the function obtained by the least squares fit, which describes the container's filling level at a given time stamp $t$. Then the predicted emptying time, $\hat{t}$, is obtained by solving the following equation:

$$f(\hat{t}) = 0.9. \tag{3.1}$$

Currently, the function $f(t)$ is approximated as a polynomial function of degree ten. However, judging by the historical data, a second or third degree polynomial should probably be sufficient. The Newton-Raphson method is used to numerically solve Equation (3.1).
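A minimal sketch of this prediction step is given below, assuming the timestamps and filling levels between two adjacent emptyings are already available as arrays. The sample data and variable names are purely illustrative, and a low polynomial degree is used for the short example series.

```python
import numpy as np
from scipy.optimize import newton

# Filling level samples (fraction of full) between two adjacent emptyings,
# with t expressed as days since the last detected emptying.
t = np.array([0.0, 2.0, 5.0, 9.0, 14.0, 20.0])
level = np.array([0.02, 0.10, 0.24, 0.41, 0.60, 0.78])

# Least squares polynomial fit (the system uses degree ten; degree 2 is
# enough for this short example series).
coeffs = np.polyfit(t, level, deg=2)
f = np.poly1d(coeffs)

# Solve f(t_hat) = 0.9 with Newton-Raphson, starting from the last sample.
t_hat = newton(lambda x: f(x) - 0.9, x0=t[-1])
print(f"Predicted to reach 90 % full at t = {t_hat:.1f} days")
```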

This method works well for most of the currently deployed containers; however, there are two special cases in which the method fails. First, it fails if the container's filling rate is very slow, i.e. flat or, even worse, decreasing due to compression of the waste. In this situation the prediction may either be several years in the future, or Equation (3.1) may have no real solutions. Second, the least squares fit is built on the partitioning of data between two adjacent emptyings. If an emptying is not detected, the fitting will be done on discontinuous filling level data. This is depicted in Figure 3.3, where an emptying was missed around November 13. The orange curve is the fitted function for each partition and follows the blue curve closely for all partitions but the one between October 23 and December 2.

Figure 3.3: Filling level of a container from the 13th of October to the 22nd of December. Filling levels in blue are a simple moving average of distance samples. The orange curve is a polynomial function fitted between adjacent emptyings. A missed emptying, and the effect it has on the function, can be clearly seen around November 13. A detected emptying is displayed as an interruption in the blue curve.

Addressing the first case is a matter of changing the prediction method for slow-filling containers, but the second case requires robust detection of emptyings. This project is primarily devoted to that problem. After all, an inaccurate prediction may lead to a container being overfilled, which in turn leads to complaints from users and economic losses. This problem has, therefore, been identified as one of the key challenges for the continued development of the system.

3.2 Data Analysis

Knowing your data is key to success when it comes to data-driven decisions, models and solutions. The legacy solution was entirely based on personal suggestions and beliefs about what the data would look like and what defines an emptying. Intuitively, an emptying should result in a filling level of zero percent and, to distinguish it from a compression, there should be a distinctive decrease of waste. However, as mentioned, the characteristics of the measuring device may result in data that is far from these intuitive expectations. Therefore, a data-driven model could give better results than a model described by domain expert intuition.

Once data has been collected and labelled as either an emptying or a non-emptying, i.e. a sample which by manual examination has been concluded not to be an actual emptying, it is important to get an idea of the distribution and balance of the data. This is needed in order to choose the correct methods and accuracy measurements as well as to set a sufficient goal. For example, if the dataset contains 99 % non-emptyings and a classifier classifies 95 % of the samples correctly as non-emptyings, it is still not a good result[26]. On the other hand, if the dataset contains 50 % non-emptyings, then 90 % is probably a good result. To avoid this pitfall, the distribution and characteristics of the dataset must be known so that the correct type of accuracy measurement can be selected. Examples of such performance measurements include the F1 score[26] and the Matthews Correlation Coefficient[27].

3.2.1 Distribution and Balance of Dataset

The dataset used in this project was derived from actual measurements during 2017, parts of which have been labelled. For more information on how the dataset was obtained, see Section 4.1. The dataset contains vibration samples which either correspond to an emptying or not. In other words, it has positive (emptyings) and negative (non-emptyings) examples.

It contains emptyings from different container types, emptyings from different filling levels and emptyings from two different fractions, i.e., types of recyclable materials such as glass, cardboard and paper. It is important to have a rough estimate of the distribution of features to know whether the dataset is representative of the real-world application, i.e. to what extent the samples are representative of the true population.

As illustrated in Figure 3.4, the dataset contains far more non-emptyings than emptyings. This was anticipated based on previous experience and can also be seen in Figure 3.2. Since the share of non-emptyings is almost 80 %, a classifier which always guesses non-emptying would achieve an accuracy of about 80 % as well. The percentage of correctly classified samples is, therefore, not the best accuracy measurement to use. Instead, a measurement which accounts for unbalanced datasets should be used. This is discussed further in Section 3.2.2.
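To make the point concrete, a small sketch of the majority-class baseline is shown below on synthetic labels with roughly the same balance as the dataset; the counts and the random generation are purely illustrative.

```python
import numpy as np

# Hypothetical labels with roughly the balance described in the text:
# ~80 % non-emptyings (0) and ~20 % emptyings (1).
rng = np.random.default_rng(0)
y_true = rng.choice([0, 1], size=1000, p=[0.8, 0.2])

# A "classifier" that always guesses the majority class (non-emptying).
y_pred = np.zeros_like(y_true)

accuracy = (y_true == y_pred).mean()
recall = ((y_true == 1) & (y_pred == 1)).sum() / (y_true == 1).sum()
print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")  # ~0.80 accuracy, 0.00 recall
```

High accuracy with zero recall is exactly the failure mode that motivates the alternative measurements discussed in Section 3.2.2.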

It remains to be justified whether the dataset is a reasonable representation of the true population. Figure 3.5 shows the balance of emptyings and non-emptyings for each container type along with the average for the whole dataset. One can notice that some container types deviate a lot from the dataset mean. There can be several reasons for this. First, some container types are not as common as others and may therefore have less data. For example, "Dragspel" only has 108 samples, as seen in Appendix B. Second, container types have different physical properties and may therefore generate more or fewer false emptyings, resulting in a balance which may differ from the dataset mean without necessarily being incorrect.

Figure 3.4: Balance between emptyings and non-emptyings in the whole dataset used in the project. The dataset is heavily unbalanced, which must be considered when selecting an accuracy measurement.

There are several ways to select data for a dataset: either a general random sample picked from known data, or a dataset which targets some specifically challenging part of the problem. For example, a large dataset can be divided into an easy and a difficult part in order to test models more quickly and get an indication of which models to continue running on the whole dataset. The dataset in this project aims to be more challenging while still being reasonably representative of the true population.

About half of the samples in the dataset were chosen from time stamps where the legacy solution classified samples as non-emptyings but the hauliers reported that they did in fact empty a container at the station. These time stamps have then been labelled manually, as described in Section 4.1. Figure 3.6 shows the distribution of container types in the dataset compared to the total distribution in the municipalities present in the dataset. For most container types the distribution is roughly the same, but "3.6 kubik" is over-represented in the dataset. This may indicate that the legacy solution struggles with that container type in particular.

Figure 3.5: Balance between emptyings and non-emptyings for each container type. The dataset mean is the share of emptyings in the whole dataset.

3.2.2 Performance and Accuracy Measurements

We now turn our attention to different methods of determining accuracy and to the performance goal of this project. In order to set a goal and check whether it was achieved or not, the goal must be specific and measurable. Initially, before analysing data, one could state that 90 % classification accuracy would be sufficient for the system. However, since the dataset used is unbalanced, this could almost be achieved by always guessing non-emptying, which would defeat the bigger aim of this project of improving the prediction of emptyings. Thus, we conclude that another accuracy measurement is needed.

In order to reason about and decide on which accuracy measurement to use, a few concepts and their meaning in this particular context need to be introduced. The problem we are dealing with is called dichotomous binary classification, since we want to take a distinct sample and label it as either emptying or non-emptying. In the binary case the result can be divided into four subcategories, namely true positives, true negatives, false positives and false negatives.

True positives refer to samples correctly classified as an emptying, i.e. an emptying was in fact classified as an emptying. Similarly, true negatives refer to non-emptyings correctly classified as non-emptyings. On the other hand, false positives and false negatives are samples which are incorrectly classified. In other words, a false negative refers to an emptying incorrectly classified as a non-emptying[28].

Figure 3.6: Distribution of container types in the dataset used versus in the whole system.

Before describing the different performance measurements considered, it is important to elaborate on the impact of false positives and false negatives in the context of emptying prediction. As mentioned in Section 3.1, the emptying prediction is based primarily on partitioning the data between the latest emptying and the present time. In the case of false positives there will be an extra partition, and since only the most recent partition is used, the result is a decreased amount of data used in the prediction. This will worsen the prediction for the time directly after, but in the long run it will not have any noticeable effect. In the case of false negatives the system will, as described and illustrated in Section 3.1, not be able to make a correct prediction until a new emptying is registered. Thus, we conclude that it is better for the prediction result to favour false positives over false negatives.

It has already been illustrated why simple classification accuracy, i.e. the percentage correctly classified, is an insufficient measurement due to the unbalanced dataset used. However, it remains to describe the other options considered. An accuracy or performance measurement is a way of summarising a confusion matrix into a single value, from here on referred to as a score or metric. This makes it possible to rank different models, algorithms and optimisation results and in the end pick the best one.

We will consider the F-measure, the Matthews Correlation Coefficient and a cost-sensitive method. First of all, recall and precision have to be explained. Recall, or sensitivity, is in this context a measure of the proportion of real emptyings that are actually predicted as emptyings[28]. Equation (3.2) describes how the recall is calculated given the number of true positives, $TP$, and the number of false negatives, $FN$:

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3.2}$$

On the other hand, precision, or confidence, is the proportion of predicted emptyings that are in fact real emptyings[28]. Similarly to recall, the precision can be calculated using the true positives and the false positives, $FP$, as described in Equation (3.3):

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3.3}$$

To summarise, recall is a measure of completeness and indicates that the classifier does not miss any real emptyings, whilst precision is a measure of exactness and indicates how certain one can be that a classified emptying is in fact a real emptying[26]. A visual explanation of all these terms can be found in Figure 3.7.

F-Measure

The F-measure combines precision and recall in a ratio whose weighting between the two is determined by the coefficient $\beta$, as described in Equation (3.4)[26]:

$$\text{F-Measure} = \frac{(1 + \beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}} \tag{3.4}$$

A larger value of $\beta$ corresponds to a score which favours recall over precision[29]. The most commonly used value is $\beta = 1$, known as the F1 score, which makes Equation (3.4) correspond to the harmonic mean and puts the same weight on both recall and precision.

Matthews Correlation Coefficient

The Matthews Correlation Coefficient, MCC for short, is a measurement which is only applicable to binary classification. It is a correlation coefficient in the sense that it ranges from −1 to 1, where 1 indicates perfect agreement between prediction and observation, −1 total disagreement, and 0 indicates that the classifier performs no better than random[27]. MCC is calculated using Equation (3.5):

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{3.5}$$

An immediate difference compared to both the F-measure and classical accuracy can be noted, namely that MCC also takes true negatives into account and is therefore composed of all cells in the confusion matrix. The effect is that MCC is large only if the classifier does well on both positive and negative samples, whereas the F-measure and accuracy may be large even though the classifier performs poorly on negative samples.

Figure 3.7: A visual explanation of true and false positives and negatives as well as precision and recall. Source: Walber, https://commons.wikimedia.org/wiki/File:Precisionrecall.svg

The MCC can be used as a score during training but can, more importantly, act as an alarm when evaluating each test performance, since a zero or null value may indicate that the classifier is performing poorly[30].
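As an illustrative sketch, Equations (3.2) to (3.5) can be computed directly from the confusion matrix counts. The function below is a direct transcription of the formulas; scikit-learn provides equivalent implementations (recall_score, precision_score, fbeta_score and matthews_corrcoef). The example counts are hypothetical.

```python
import math

def binary_metrics(tp: int, tn: int, fp: int, fn: int, beta: float = 1.0):
    """Compute recall, precision, F-beta and MCC from confusion matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_beta = ((1 + beta**2) * precision * recall) / (beta**2 * precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return recall, precision, f_beta, mcc

# Hypothetical confusion matrix: 180 TP, 790 TN, 10 FP, 20 FN.
print(binary_metrics(tp=180, tn=790, fp=10, fn=20))
```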

Cost-Sensitive Learning

Classification accuracy, the F-measure and MCC all put equal weight on correct and incorrect predictions for both positive and negative samples. However, sometimes it is more important that a classifier gets the positive samples right than that it necessarily classifies all negative samples correctly. To achieve this, a cost-sensitive model can be used.

Let $C$ denote the cost matrix ($2 \times 2$ in the binary case), where $C(i, j)$ denotes the cost of classifying a sample of class $j$ as class $i$. These costs are either given by domain experts or learned using some other method[31, Cost-Sensitive Learning]. For example, a simple model is $C(0, 0) = C(1, 1) = 0$ and $C(0, 1) = C(1, 0) = 1$, which corresponds to counting the number of incorrectly predicted samples. The end goal of the learning is to minimise the total cost, unlike the other measurements, whose goal was to maximise the score. The idea behind this is that sometimes it is better to act as if one class is true, even though another one might be more probable, by adjusting the weights in the cost matrix[32].

It should be noted that cost-sensitive learning is a research area of its own, but its metrics can be used in this context as well.
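As a small illustration (not taken from this project), the total cost under an assumed cost matrix can be computed as below. The specific costs are hypothetical and would in practice be set by a domain expert; here a missed emptying is made more expensive than a false alarm, matching the earlier conclusion that false positives are preferable to false negatives.

```python
# Rows i = predicted class, columns j = true class (0 = non-emptying, 1 = emptying).
# C[i][j] is the cost of predicting i when the truth is j. Hypothetical values:
# a missed emptying (predict 0, truth 1) costs 5, a false alarm costs 1.
C = [[0, 5],
     [1, 0]]

def total_cost(y_true, y_pred, cost):
    """Sum the cost matrix entries over all predictions; lower is better."""
    return sum(cost[p][t] for p, t in zip(y_pred, y_true))

y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(total_cost(y_true, y_pred, C))  # 5 (missed emptying) + 1 (false alarm) = 6
```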

Metrics Chosen and Goal

It remains to decide which metrics should be used and to set a performance goal for a solution. We have already concluded that it is better for a classifier to favour false positives over false negatives, i.e. we put more importance on recall than on precision. However, we have not concluded whether we put enough importance on recall that it should be reflected in the metrics used.

All metrics considered except classification accuracy take the imbalanced dataset into account, which is a requirement for the metric used. The F-measure, more specifically the F1 score, is one of the most, if not the most, commonly used scores and fulfils the requirement set on a metric. MCC also fulfils the requirements and additionally takes the true negatives into account, which no other metric does. MCC is also the metric recommended by D. Chicco[30]. Hence, MCC is the metric used; however, we still calculate the F1 score for all classifiers as an additional comparison. The cost-sensitive metric was also considered, but MCC was found to provide a sufficient balance between correct and incorrect predictions and is, as addressed, the recommended metric for binary classification.

When it comes to a reasonable goal, traditional classification accuracy with a minor adjustment is used, since the goal must relate to the system as a whole. Since a high accuracy on positive classifications is most important, the goal is made up of two parts. First, the classifier should have an overall accuracy of at least 90 %; second, at least 95 % of the positive samples should be classified correctly, i.e. the recall should be at least 95 %. Each of these goals individually suffers from the problem of a classifier always guessing one class, but together they make up a reasonable goal even for an imbalanced dataset such as the one used.

3.3 Collecting a Container

Recycling containers come in several different dimensions and shapes and with different emptying mechanisms. This section explains some characteristics of the containers currently measured and specifically how they are emptied. Knowing how the containers are collected may provide useful knowledge about what type of data is needed in order to decide whether a container has been emptied or not.

As described in Section 2.1.1, there are several commercial products currently on the market. The Smart Recycling® product targets larger containers such as those used in residential areas in Sweden. These containers typically range from 2 m³ to 6 m³ in volume and are made of either steel or fibreglass. The larger size, compared to the curbside containers commonly used in Sweden for household waste, reduces the ultrasound interference from the container's walls. However, the emptying mechanism used in many containers still causes interference.

These large containers have two different types of emptying mechanisms, namely bottom-emptying and tilt-emptying. Bottom-emptying containers are lifted by a crane and, when the container is in position above the lorry, a steel wire is pulled which opens the bottom of the container. A bottom-emptying container during emptying is shown in Figure 3.8.

A tilt-emptying container is emptied by a special front-loading dustbin lorry equipped with a forklift at the front. The container is lifted backwards above the lorry, which opens a hatch on top of the container. Thus, both emptying mechanisms require the container to be lifted.

3.4 Legacy Solution

The legacy solution to the problem of identifying vibrations as emptyings uses a set of static rules. This set is applied every second hour to check for new vibration samples. The legacy model uses only the vibration samples and the ultrasound samples directly preceding and succeeding the vibration sample in time. This makes the rules susceptible to minor measurement errors in the immediate vicinity of an emptying.

Let us denote the filling level before a suspected emptying as $FL_1$ and the filling level after as $FL_2$. The filling level change, $\Delta FL$, is then $\Delta FL = FL_2 - FL_1$. Finally, the vibration strength is denoted by $V_{str}$. In order for a vibration sample to be classified as an emptying, the sample must first have a strength of at least six, i.e., more than five interrupts must have been registered. Then the filling level must either drop more than 60 %, or the filling level must drop more than 10 % and the filling level after must be below 10 %. The rules with the established notation can be summarised as a truth table, illustrated in Table 3.1 (excluding vibration strength), or as a decision tree, illustrated in Figure 3.9.

Figure 3.8: A typical Swedish recycling container being emptied. The container is emptied by pulling the steel wire attached to the crane on the right side of the hook. Pulling the wire opens the bottom of the container and the recyclable waste falls into the lorry. Source: Förpacknings- och Tidningsinsamlingen AB (FTI).

Table 3.1: Truth table for the static rules used in the legacy solution, excluding the initial filtering of vibration samples with fewer than six interrupts. The two rules jointly cover the upper three cells, and the top right cell is covered by both rules.

Level After      ΔFL ≥ −10 %    −10 % > ΔFL ≥ −60 %    −60 % > ΔFL
FL₂ < 10 %       False          True                   True
FL₂ ≥ 10 %       False          False                  True


Figure 3.9: Decision tree for the legacy solution. The tree is traversed from the root until a class (coloured) is found. Note the negative sign on the filling level change, hence, a decrease.
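The legacy rule set can also be written compactly as code. A minimal sketch follows, with the thresholds taken from the text above; the function and parameter names are illustrative, not the actual implementation.

```python
def is_emptying(v_str: int, fl_before: float, fl_after: float) -> bool:
    """Legacy rule set: classify a vibration sample as emptying (True) or not.

    Filling levels are given in percent; v_str is the vibration strength
    (number of accelerometer interrupts in the measurement window).
    """
    if v_str < 6:               # fewer than six interrupts: never an emptying
        return False
    delta_fl = fl_after - fl_before
    if delta_fl < -60:          # large drop: emptying regardless of end level
        return True
    # A moderate drop that also ends near empty counts as an emptying.
    return delta_fl < -10 and fl_after < 10

print(is_emptying(v_str=8, fl_before=85, fl_after=5))   # True
print(is_emptying(v_str=8, fl_before=85, fl_after=40))  # False: -45 %, not near empty
```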

3.5 Dismissed Solutions to the Problem of Emptying Detection

Before introducing the steps taken towards the final solution, the solutions considered but dismissed during the study phase should be addressed. Since the main problem with the legacy solution is the interference caused by reflecting structures inside the container, one obvious solution would be to avoid the reflections by adjusting how the distance is measured. This could be achieved either by adjusting the width of the cone of the ultrasonic range sensor or by switching to, for example, laser technology. Both possibilities have been examined thoroughly by the hardware partner in the project, and both have been shown to increase the quality of the measurements in a laboratory environment.

However, both solutions suffer from increased complexity when it comes to the installation location of the sensor, i.e. they require more precise placement and alignment inside the container. This is unfeasible since the installation is done by the end-customers, who have limited knowledge of how to place the sensor at an optimal location. The sensor also becomes less general in terms of applicability to different container types and fractions. For example, newspaper containers tend to have a slope of newspapers from the inlet to the back of the container, resulting in an indirect reflection from the waste level. Finally, field testing of laser technology revealed that the dusty environment inside the containers makes it necessary to keep the lens of the sensor clean. Cleaning the lens of each sensor is unacceptable from a maintainability perspective with an increasing number of sensors.

Another proposed solution to the problem of emptying detection is to detect the actual emptying rather than the resulting empty level. This is the reason why the accelerometer was added, but it has proven insufficient. Adding a gyroscope to the sensor unit could help distinguish emptyings from non-emptyings, since during an emptying the container is physically moved, which would be registered by the combination of gyroscope and accelerometer data. This solution suffers from two major flaws.

First, the sensor is battery powered and should sleep most of the time, waking up only for range measurements or suspected emptyings. The existing accelerometer wakes the main processor once the acceleration exceeds a predefined threshold. The hardware partner in the project has not been able to find a similar low-power autonomous gyroscope.

Second, even if such a sensor were available on the market, it would require all currently installed sensors to be sent to service in order to add the new hardware. This is simply not economically feasible with 1,500 sensors installed over a large geographical area. However, it could be considered for the next generation of sensors.

Finally, an obvious solution to the problem of emptying detection would be for the hauliers themselves to report when they emptied the containers. This could be achieved by using an application for manual reporting, by measuring the presence of the vehicle or by using an RFID solution. Using an application has been tried for some fractions, as addressed in Section 4.1, but this suffers from inaccurate time stamps. The detected emptyings are also used for payments to the hauliers, and it is therefore not suitable for the hauliers to have any influence on the detection. The last two solutions require additional hardware to be installed on the vehicles, which is currently not acceptable since the system should require only one installation per container in order to be cost-efficient and simple to use.

3.6 Optimising Parameters in Existing Model

Having a ground truth dataset makes it possible both to evaluate the performance of the legacy solution and to use the dataset for optimising the parameters of the legacy solution. The legacy thresholds are based on observations of how a container is physically emptied and what reasonably should happen to the filling level before and after a suspected emptying, i.e., the thresholds have been set by a domain expert.

However, before a ground truth dataset existed, it was impossible to empirically evaluate these thresholds in order to improve the model.

In order to evaluate the performance of the legacy solution, optimise it and compare the optimised version with the legacy solution in a fair way, the dataset was partitioned into two subsets: a training set, used during optimisation to find a better set of thresholds, and a test set, on which both the legacy solution and the optimised solution were evaluated.
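As an illustration, a minimal sketch of such a partitioning with scikit-learn is shown below. The feature matrix X and label vector y are hypothetical names for the labelled suspected emptyings, and the split ratio and seed are assumptions for the purpose of the example.

    # Minimal sketch of the train/test partitioning, assuming the ground
    # truth dataset is available as a feature matrix X and a label vector y
    # (hypothetical names; data loading is not shown).
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, y,
        test_size=0.3,     # assumed split ratio
        stratify=y,        # preserve the emptying/non-emptying proportions
        random_state=42)   # fixed seed so the comparison is repeatable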

3.6.1 Grid Search

One way of evaluating such thresholds is simply to try all possible combinations and rank each combination of thresholds using a performance metric. This is called a grid search.

Suppose a model has three thresholds, a, b and c, corresponding to three rules. For each threshold, define a finite set of possible values; if a threshold is continuous, select a number of possible values of particular interest. Formally:

\[ a \in \{a_1, a_2, \ldots, a_n\}, \tag{3.6} \]
\[ b \in \{b_1, b_2, \ldots, b_m\}, \tag{3.7} \]
\[ c \in \{c_1, c_2, \ldots, c_l\}. \tag{3.8} \]

Let \(s_{i,j,k}\) denote the score obtained by running the model on the training set for the combination of thresholds \((a_i, b_j, c_k)\). \(S\) is then the \(n \times m \times l\) score array where each cell corresponds to a combination of thresholds and holds the score for that particular combination. The method is valid for any number of thresholds, e.g. for two thresholds the score array is a matrix. However, since the number of combinations to test is \(n \cdot m \cdot l\), the method is unfeasible for a large number of thresholds or for thresholds with many possible values.

The goal of the grid search is to find the cell with the largest score, hence the optimal set of thresholds from the finite set of possibilities. If the legacy thresholds are part of the finite set, one cell will correspond to the legacy solution's score on the training set; thus, if the legacy combination was already optimal, the grid search will return the legacy solution.
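The procedure can be sketched as follows, assuming a hypothetical function evaluate(a, b, c) that runs the rule set with the given thresholds on the training set and returns a score; the candidate threshold values are likewise assumptions for illustration.

    # Exhaustive grid search over three finite threshold sets, filling the
    # score array S described above.
    import itertools
    import numpy as np

    a_values = [0.1, 0.2, 0.3]        # assumed candidates for threshold a
    b_values = [5, 10, 15]            # assumed candidates for threshold b
    c_values = [0.5, 0.6, 0.7, 0.8]   # assumed candidates for threshold c

    S = np.zeros((len(a_values), len(b_values), len(c_values)))
    for (i, a), (j, b), (k, c) in itertools.product(
            enumerate(a_values), enumerate(b_values), enumerate(c_values)):
        S[i, j, k] = evaluate(a, b, c)   # hypothetical scoring function

    # The optimal thresholds correspond to the cell with the largest score.
    i, j, k = np.unravel_index(np.argmax(S), S.shape)
    best_thresholds = (a_values[i], b_values[j], c_values[k])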

3.7 Examining New Types of Classifiers

As described in Section 3.4, the legacy model is based on a fixed set of rules. Since there is only one set of rules, the dataset must be linearly separable for the model to achieve sufficient performance. If there is a more complex relation between the features used in the model, a fixed set of rules might be too simple.

Such complex relations are sometimes not obvious to a domain expert; therefore, another type of classifier may be needed, one which finds these relations based on the dataset.

There is a huge number of classifiers to consider from different groups of algorithms. In order to narrow the search it is important to know which type of problem we are trying to solve. As described in Section 3.1, the problem studied in this project is a dichotomous, i.e. binary, classification problem.

There are then a couple of different types of algorithms to consider, mainly distinguished by how the algorithm learns about reality. Supervised learning takes an input of features together with their true output and learns patterns in order to predict not-yet-seen input in a similar way[33]. Unsupervised learning takes input and finds its own categories in the dataset[33]. Semi-supervised learning is used when there is no complete dataset of labelled data; instead, parts of the data may have labels while the rest only have the input[33]. Lastly, reinforcement learning learns continuously from its past experiences by receiving feedback as rewards or reinforcement from the environment[33].

In this project a ground truth dataset is available, and therefore algorithms implementing supervised learning were studied further. This section presents and elaborates on the algorithms studied.

3.7.1 Logistic Regression

The first method studied is logistic regression, an extension of linear regression applied to a classification task[31]. Regression is the task of predicting a continuous output from a given set of input features, whereas classification is the task of predicting one of a finite set of values (labels). Logistic regression uses linear regression to predict the probability that a set of inputs belongs to a class. By introducing probability, the problem of classifying input into a finite set of labels can be viewed as a problem of predicting a continuous value, i.e. the probability[34].

Logistic regression supports multiclass classification; however, in this project only binary classification is needed. For the binary case the classes are referred to as either 0 or 1, which in this context corresponds to a non-emptying and an emptying, respectively. Let Y be the response variable and X the input vector, i.e. the set of features. The probability that a sample is an emptying given the input x is then

\[ \Pr(Y = 1 \mid X = x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)} \tag{3.9} \]

where the constants \(\beta_0\) and \(\beta_1\) are learned during the training phase using the Newton–Raphson method[34].

When predicting new values, the probabilities for each class are calculated and the class with the highest probability is chosen. In the binary case only one calculation is needed, since the probabilities sum to 1[34] and thus

\[ \Pr(Y = 0 \mid X = x) = 1 - \Pr(Y = 1 \mid X = x). \tag{3.10} \]
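As a minimal sketch, a logistic regression classifier for this binary task could be fitted with scikit-learn as below, reusing the hypothetical training and test arrays from Section 3.6. Note that scikit-learn's implementation uses a regularised solver rather than plain Newton–Raphson.

    # Logistic regression on the emptying detection task (sketch).
    from sklearn.linear_model import LogisticRegression

    clf = LogisticRegression()   # learns the beta coefficients from data
    clf.fit(X_train, y_train)

    # predict_proba returns Pr(Y=0|X=x) and Pr(Y=1|X=x) for each sample;
    # predict picks the class with the highest probability, as in (3.10).
    probabilities = clf.predict_proba(X_test)
    predicted_labels = clf.predict(X_test)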

3.7.2 Support Vector Machines

The second algorithm studied is the support vector machine or, when used for classification, the support vector classifier. The support vector machine is a linear algorithm which, in the case of binary classification, finds the hyperplane that separates the two classes with the maximum margin[31], as illustrated in Figure 3.10.

Figure 3.10: Maximum hyperplane separation by a support vector machine with a linear kernel on a binary dataset. The dashed lines mark the support vectors for this separation. Source: http://scikit-learn.org/stable/auto_examples/svm/plot_separating_hyperplane.html

In Figure 3.10 a linear kernel is used, i.e. the dataset is separated by a line. However, support vector machines support several different types of kernels, such as a polynomial of a selected degree, a radial basis function or a neural network[34]. Using these kernel functions in the maximum margin model results in a nonlinear support vector machine, which is important in many real-world applications[31].

During the training phase the model finds the support vectors, which are the points that separate the two sets, illustrated in Figure 3.10 by the dashed lines. However, in real-world applications it is common that the sets are not entirely separable, e.g. due to noise. To overcome this issue, the concept of a soft margin is introduced. A soft margin allows some points to be on the wrong side of the support vector by introducing a distance-based penalty[31]. During training the margin is then maximised as before, but points are allowed on the wrong side up to a predefined constant budget[34].
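A minimal sketch of a soft-margin support vector classifier in scikit-learn is given below; the kernel choice and the value of the penalty parameter C, which controls how strongly margin violations are penalised, are assumptions.

    # Soft-margin support vector classifier (sketch). A smaller C allows a
    # larger budget of points on the wrong side of the margin; the kernel
    # can be 'linear', 'poly' or 'rbf' (radial basis function), among others.
    from sklearn.svm import SVC

    clf = SVC(kernel='rbf', C=1.0)   # assumed kernel and penalty
    clf.fit(X_train, y_train)
    predicted_labels = clf.predict(X_test)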


3.7.3 Decision Trees

The decision tree is a completely different type of classifier and slightly resembles the legacy model. It is made up of a set of rules where the answer to one rule decides which rule to evaluate next. This can be represented graphically, like the legacy model illustrated in Figure 3.9.

The difference between the legacy model and decision trees is how they are constructed. The legacy model was constructed by a domain expert, whereas decision trees are constructed from data during the training phase of the algorithm. Decision trees partition the feature space into rectangles and then find the thresholds, similarly to the optimisation phase of the legacy solution. Compared to other algorithms, tree-based solutions are conceptually simple yet still powerful[34].
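A sketch of training a decision tree with scikit-learn follows; the depth limit is an assumption. One appeal of trees in this setting is that the learned thresholds can be printed and compared with the hand-crafted legacy rules.

    # Decision tree classifier (sketch).
    from sklearn.tree import DecisionTreeClassifier, export_text

    clf = DecisionTreeClassifier(max_depth=4)   # assumed depth limit
    clf.fit(X_train, y_train)
    print(export_text(clf))   # prints the learned rules as nested if/else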

3.7.4 Random Forests

The last algorithm studied is the random forest, which is an ensemble learning method[31]. Ensemble learning means that the method trains multiple classifiers and combines their respective outputs like a committee of decision makers[31]. Random forest uses voting for classification[34], i.e. it runs its constituent classifiers and then decides on the class which the majority of them voted for.

Random forest is closely related to the decision trees described in Section 3.7.3, since its ensemble consists of multiple decision trees, hence the correspondence in names between tree and forest. A random forest is built by growing a number of random decision trees from different training sets. The training sets are obtained by sampling with replacement from the original training set, known as bootstrap samples[34]. When constructing a normal decision tree, all features are considered for a split at each node in the tree; for a random tree, however, only a random subset of the features is considered at each node[34].
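A sketch with scikit-learn's RandomForestClassifier is shown below; the number of trees is an assumption. The bootstrap and max_features settings correspond to the bootstrap sampling and random feature subsets described above.

    # Random forest classifier (sketch).
    from sklearn.ensemble import RandomForestClassifier

    clf = RandomForestClassifier(
        n_estimators=100,      # number of trees in the forest (assumed)
        max_features='sqrt',   # random subset of features at each split
        bootstrap=True)        # each tree is grown on a bootstrap sample
    clf.fit(X_train, y_train)
    predicted_labels = clf.predict(X_test)   # majority vote over the trees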

3.8 Considering More Features

To further improve the performance of the best performing classifier, more features can be added to the model. More features will not necessarily improve the score, since the legacy features may already be the optimal ones; however, if that is the case it must at least be verified.

Feature selection is the process of first establishing feature candidates and then selecting which ones to include in a classifier. For small datasets, more features may result in overfitting[35]; on the other hand, more features may decrease the weight put on each one, which can make the classifier more fault-tolerant.

There are basically two ways to select features for a classifier: either a domain expert constructs and selects which features to use, or all available features are fed to an algorithm which selects them based on data. However, the second method also requires a domain expert to construct all the features to be considered by the feature selection process.

The legacy features are one example of feature selection done by a domain expert, and since the scope of this section is to check whether there are additional features to consider, data-driven feature selection is the preferred method. When selecting features to investigate, a few strategies can be used.

The first strategy is to think about what characteristics define an emptying. This is the strategy used for the legacy features. Features to consider include, for example, the filling level before and after, the acceleration of the unit (vibration) and the filling level change. Another strategy is to think about what characteristics or faults may cause the legacy model to incorrectly classify an emptying as a non-emptying, for example an erroneous measurement caused by interference as described in Section 3.1. Since interference is uncommon, this could perhaps be handled by considering features such as the filling level three hours before and after. Finally, meta features could be considered as well, such as fraction, container type, geographical location, temperature and more. Of course, all features must correspond to a label and a time stamp to be of any use. A sketch of how such feature candidates could be constructed is given below.
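As an illustration of the first two strategies, the following sketch constructs filling-level features around a suspected emptying; all column and function names are hypothetical.

    # Sketch of constructing feature candidates around a suspected emptying
    # at time t, given a pandas Series `levels` of filling levels indexed by
    # timestamp.
    import pandas as pd

    def build_features(levels, t):
        before = levels.asof(t)                    # last level at or before t
        after = levels[levels.index > t].iloc[0]   # first level after t
        return {
            'level_before': before,
            'level_after': after,
            'level_change': before - after,
            'level_3h_before': levels.asof(t - pd.Timedelta(hours=3)),
            'level_3h_after': levels.asof(t + pd.Timedelta(hours=3)),
        }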

When all feature candidates have been identified, the ones to use have to be selected. This procedure is known as feature selection and can be done using a method called recursive feature elimination (RFE). The procedure is especially important for classifiers such as logistic regression or support vector machines, since they have non-zero weights for all features in the model[34]. Other classifiers may assign a feature zero weight, which corresponds to not using the feature without the need to remove it completely. Even so, removing unnecessary features makes the model simpler and more memory efficient.

Recursive feature elimination is, despite its name, an iterative procedure[35] applied to a predefined classifier, e.g. a support vector machine. The first step is to fit the model on the training set using all available features. The features are then ranked using a selected ranking criterion, the feature with the smallest ranking criterion is removed, and the procedure is repeated, each iteration yielding a new subset of features[35]. To select the optimal subset, the subsets must be scored; this can be done using either a validation set or cross-validation as described in Section 3.9.
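A sketch of RFE with cross-validated subset scoring, as provided by scikit-learn's RFECV, is given below; the choice of a linear support vector classifier and the recall scoring metric are assumptions.

    # Recursive feature elimination with cross-validation (sketch). The
    # wrapped estimator must expose a ranking criterion; a linear SVC ranks
    # features by the magnitude of its coefficients.
    from sklearn.feature_selection import RFECV
    from sklearn.svm import SVC

    selector = RFECV(SVC(kernel='linear'), step=1, cv=5, scoring='recall')
    selector.fit(X_train, y_train)
    print(selector.support_)   # boolean mask over the candidate features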

3.9 Hyperparameter Optimisation

Once all features have been selected and the algorithm to use has been chosen, it is time to make the final adjustments to the model. This can be compared to fine-tuning a physical machine or instrument to achieve optimal performance. Most, if not all, machine learning algorithms have special parameters which decide how the algorithm behaves. For example, a multilayer perceptron neural network has a variable number of hidden layers and a learning rate, a support vector machine has a kernel function and a random forest has a number of trees in the forest. These parameters are often set by a
