
UPTEC STS 20032

Degree project 30 credits (Examensarbete 30 hp), August 2020

A Scalable Approach for Detecting Dumpsites using Automatic Target Recognition with

Feature Selection and SVM through Satellite Imagery

Markus Skogsmo



Abstract

Detecting Dumpsites using ATR with Feature Selection and SVM through Satellite Imagery

Markus Skogsmo

Throughout the world, there is great demand to map out the increasing environmental changes and life habitats on Earth. The vast majority of Earth Observations today are collected using satellites. The Global Watch Center (GWC) initiative was started with the purpose of producing global situational awareness of the premises for all life on Earth. By collecting, studying and analyzing vast amounts of data in an automatic, scalable and transparent way, GWC aims to support the work towards reaching the United Nations (UN) Sustainable Development Goals (SDG). The GWC vision is to make use of qualified, accessible data together with leading organizations in order to lay the foundation for the important decisions that have the biggest potential to make an actual difference for our common future. As a show-case for the initiative, the UN strategic department has recommended a specific use-case involving the mapping of large accumulations of waste in greatly affected areas, which they believe will benefit the initiative considerably. The aim of this master thesis is to detect and classify dumpsites in Kampala, the capital of Uganda, in an automatic and scalable way using available satellite imagery. The hope is that showing technical feasibility and presenting interesting remarks will spur further interest in coming closer to a realization of the initiative. The technical approach is a lightweight version of Automatic Target Recognition, conventionally used in military applications but here applied to detect and classify features of large accumulations of solid waste using techniques from the fields of Image Analysis and Data Mining. The choice of data source, the study's area of interest, the methodology for Feature Extraction and the choice of the Machine Learning algorithm Support Vector Machine are all described and implemented. Technical results, with a classification precision of 95 percent, are presented with the ambition to promote further work and contribute valuable information to the GWC initiative for a later realization.

Examiner: Elísabet Andrésdóttir

Subject reader: Ingela Nyström

Supervisor: Björn Åkesson


Popular Science Summary (Populärvetenskaplig sammanfattning)

There is a growing global interest in examining and understanding the actual state of our planet.

To do so, information needs to be extracted from the very large volumes of data collected about the Earth. A large share of these data volumes comes in the form of satellite data. The amount is so large that only a very small fraction is ever used for its intended purposes. The Global Watch Center (GWC) initiative was started with the purpose of being a watchful eye over the health of the planet and of supporting the work towards the UN's global sustainability goals.

By collecting, studying and analyzing data in an automatic way, the vision is to find information that, together with the world's leading organizations, can lay the foundation for the important decisions that have the greatest potential to make an actual difference for our common future.

As an illustrative example, GWC has discussed a specific use-case with UN Global Pulse. In this dialogue, the UN has confirmed that the use-case can both help promote the vision of the initiative and spur further interest in a realization of GWC. The use-case aims to map large accumulations of waste in regions that are vulnerable, in terms of both health and climate, due to inadequate waste management.

This master thesis aims to show technical feasibility for the initiative, and the hope is to present interesting results and lessons learned in order to come closer to a realization of GWC. Uganda's capital Kampala today faces major problems caused by inadequate waste management systems, as much dumping is done illegally, which also makes it difficult to keep track of where these dumpsites are or when they appear. As a first step, automated detection and classification of dumpsites would help in locating where such dumpsites may exist. Since the purpose is to show feasibility for potentially performing similar analyses over the entire planet at a later stage, the analysis needs to be resource-efficient, and much focus is therefore placed on efficient methods for analyzing data. Given the non-commercial background of the initiative, the idea is to achieve this using freely available tools and data, which is challenging due to the limited quality of the data.

This study introduces a technical solution, Automatic Target Recognition, which is conventionally used for military applications. The solution uses techniques based on image analysis and data mining. To succeed in analyzing large amounts of data efficiently, this study assumes that 20 percent of all data in the analysis process provides 80 percent of all information. The results show that it is possible to classify individual pixels in a satellite image that are, or may be, dumpsites by using information about the image-analytical characteristics of dumpsites. By packaging this solution in a tool that is easy for anyone to use, regardless of technical background, the hope is that the collected data can provide information that everyone can understand and, beyond supporting users, that further potential and interesting use-cases can be found for the methodology and the techniques used.


Acknowledgements

This study is the result of a master thesis at Uppsala University. I would like to thank my reviewer, professor Ingela Nyström (UU) for valuable remarks and insights.

I am thankful to Erik Gustafsson and Anders Karlsson for their work towards a cleaner Kenya and contribution to this study by sharing their expertise.

Also, I am grateful to Johan Tenstam (AFRY), who has given me the opportunity to write this thesis, as well as to Ruben Cubo (AFRY) and Carl Bondesson (AFRY) for successfully assisting this study through valuable discussions and good pointers. I am most grateful to my supervisor Björn Åkesson (AFRY), who has helped keep my motivation at its maximum throughout the thesis and who has shown devoted passion for contributing to the important vision that motivates this study.

Finally, I would also like to give acknowledgements to Alexander Engberg, Fabian Persson, Michael Fjellander, Sol Skärdin, Anton Andrée and Linnea Lindström for their notes.

Markus Skogsmo Uppsala, June 2020


List of Abbreviations

ATR – Automatic Target Recognition
BVTO – Bias-Variance Trade-Off
CNN – Convolutional Neural Network
EO – Earth Observations
ESA – European Space Agency
ESIA – Guidelines for Environmental and Social Impact Assessment
FIS – Feature Information Service
FN – False Negatives
FP – False Positives
GIS – Geographical Information System
GWC – Global Watch Center
HLC – High-Level Classifier
IC – Information Class
ICD – Information Class of Dumpsites
IC¬D – Information Class of Not Dumpsites
KDE – Kernel Density Estimate
KLD – Kullback-Leibler Divergence
LLC – Low-Level Classifier
NDVI – Normalized Difference Vegetation Index
NEMA – Kenya's National Environment Management Authority
NIR – Near Infra-Red
OCSVM – One-Class Support Vector Machine
OOI – Object of Interest
PDF – Probability Density Function
RBF – Radial Basis Function
RGB – Red, Green and Blue
ROI – Region of Interest
SDG – Sustainable Development Goals
SVA – Stacked Vector Approach
SVM – Support Vector Machine
SWIR – Short-Wave Infra-Red
TN – True Negatives
TP – True Positives
TTRB – Top Three Rated Bands
UN – United Nations
WCS – Web Coverage Service
YOLO – You Only Look Once (Model)


Contents

1 Introduction  1
1.1 Background  1
1.1.1 The Initiative  1
1.1.2 The Local Situation  2
1.1.3 Motivation and Early Hypotheses  3
1.2 Research Definition  4
1.3 Disposition  4
1.4 Related Work  5
1.4.1 Study Reports  5
1.4.2 Geographical Information System (GIS)  5
1.4.3 Earth Observations (EO) Analysis  5
1.4.4 Machine Learning  6
1.4.5 Automatic Target Recognition (ATR)  6
2 Theory  7
2.1 Digital Image Analysis  7
2.1.1 Different Resolutions of Image Data  7
2.1.2 Multi-spectral Space, Spectral Classes and Information Classes (IC)  8
2.1.3 Features and Feature Selection  8
2.1.4 Data Representations and Separability  10
2.2 Classification Model  11
2.2.1 Automatic Target Recognition (ATR)  11
2.2.2 Choosing a Classifier  12
2.2.3 Support Vector Machines (SVM)  13
2.2.4 SVM Performance, Kernels and Evaluation  16
2.3 Presenting Data  18
3 Data  19
3.1 Constructing Training Data Set  19
3.2 Enhancement and Correction Image Processing  19
3.3 Satellite Data  20
3.4 Choice of Region of Interest (ROI)  23
3.5 Choice of Data Source  23
4 Method  25
4.1 Tools  25
4.2 Subtask 0 – From API to Feature Extraction  26
4.2.1 Sentinel Hub API and its Data  26
4.2.2 Collecting Ground Truth  27
4.2.3 Data Pre-processing  27
4.2.4 Feature Selection by Divergence Metrics  28
4.3 Subtask 1 – Classification and Model Evaluation  29
4.3.1 Constructing a Training Data Set  29
4.3.2 Choice of Classification Model  29
4.3.3 Model Evaluation  30
4.4 Subtask 2 – Model Testing and Visualizing Knowledge  30
4.4.1 Model Testing  31
4.4.2 Creation of a Demo to Promote Future Work  31
5 Results  32
5.1 Collection of Ground Truth Data  32
5.2 Statistical Data Collection  33
5.3 Ground Truth Profile  33
5.4 Kullback-Leibler Divergence (KLD)  35
5.5 Determination of Training Data  36
5.6 Hyper-tuning  36
5.7 Model Complexity Measures  37
5.8 Testing the Feature-based Model  38
5.9 Data to Information  39
6 Discussion  41
6.1 Feature Selection  41
6.2 Pixel Classification  42
6.3 Presenting the Results  43
7 Conclusions and Future Work  45
References  47


1 Introduction

This master thesis aims to detect and classify dumpsites in Kampala, the capital of Uganda, in an automatic and scalable way using freely available satellite imagery. The technical approach is based on a Feature-based classification model, similar to what Automatic Target Recognition (ATR) does, using the image analysis technique Feature Selection together with Machine Learning classification. The motivation of this thesis is to spur interest in coming closer to a realization of the Global Watch Center (GWC) initiative by showing technical feasibility through existing techniques and available tools in a non-commercial way. GWC is driven by the vision of helping all life on Earth by supporting the work of the United Nations' (UN) Sustainable Development Goals (SDG).

1.1 Background

Since the industrial revolution, mankind has evolved and the societies of the world have prospered. As a consequence of mankind's rapid development, Earth's climate is transforming faster now than it has in the past million years. If humans continue to live as we do now, we will not only accelerate Earth's rapid transformation but also make all life on Earth difficult. Admittedly, how we choose to live – whether we regard or disregard nature, Earth or climate change – does not affect Earth's chance of survival, but our own. In addition to this accelerating development, we have – with new technologies – greater opportunities to collect data on how humans are affecting the environment. By being put into context, this data can be transformed into information. This information is believed to further help demonstrate which concrete actions need to be taken in order to increase our chances of keeping Earth as our common home.

1.1.1 The Initiative

Overall, there is a great demand, with divergent interests, to map out Earth's ongoing environmental changes. GWC was initiated with the purpose of enhancing global situational awareness of the premises for all life on Earth. The vast majority of Earth Observations (EO) are collected using satellites in low orbit. By collecting, studying and analyzing EO together with accessible in-situ data, the hope is to be able to follow up the work towards reaching the UN SDGs. The hope is also to look for and identify examples of illegal exploitation of natural resources, monitor sustainable development and obtain useful information about natural disasters. The GWC vision (see Figure 1) is to make use of qualified, accessible data together with the world's leading organizations in order to lay the foundations for the important decisions that have the biggest potential to make an actual difference for our awaited future [1].

GWC is currently in a pre-study phase. Realizing the initiative will demand collaboration between many organizations and actors. To spur general interest in a realization and to reach out with the potential of GWC, it is at this stage important to show what could be accomplished by transforming data into concrete information. The GWC pre-study has defined several use-cases [1].

As a show-case, the UN Global Pulse – the UN Secretary-General's initiative on big data and artificial intelligence for development, humanitarian action, and peace1 – has discussed a specific use-case with GWC [2], involving the mapping of large accumulations of waste in areas greatly affected by non-existent solid waste management systems. According to the Africa Solid Waste Management Data Book published in 2019 [3], the population has grown faster in parts of Sub-Saharan Africa than in any other part of the world, increasing by as much as 150 percent between the years 2000 and 2015. At the same time, increased urbanization has led to a 170 percent increase in the demand for solid waste management. In spite of this, some governments are still unable to provide collection of solid waste, which leads to many negative consequences [3].

1 https://www.unglobalpulse.org/

Figure 1 Illustration of the GWC's overall mission [1]

1.1.2 The Local Situation

Guidelines for Environmental and Social Impact Assessment (ESIA) from 2018, involving among other organizations Kenya's National Environment Management Authority (NEMA), emphasize that because of rapid urbanization and high populations, solid waste has posed severe social, economic and environmental challenges [4].

According to Gustafsson [5], who has been working with Kenya's government towards better handling of solid waste, there exist many locations where the situation is particularly critical due to the extensive accumulation of waste. This results in dumpsites being burnt in order to save the nearby villages from drowning in waste. The smoke spreads methane, toxins and other hazardous matter by air and water to nearby local areas. This negatively affects not only the health of people living close to dumpsites, but also the local and global environment. Gustafsson stresses that there is a need to know where the sites are located, what their sizes are, how many people living close by are directly affected, and how the hazards are spread by winds and moving water. With this knowledge, it would be possible to allocate limited resources and prioritize in an informed way to relieve as many directly suffering people as possible. Although many successful efforts have been made when it comes to infrastructure like roads and sewage systems, this is not the case for solid waste management. At this point, the situation is a national concern in Kenya, and the aim of the government's work is to increase awareness of the importance of approaching a solution to the problems that large accumulations of waste lead to [5].

When it comes to solid waste management, structured data and information are important in order to make decisions. Some cities (22) that responded to a web questionnaire state that data gathering is performed at least once a year. The data collected mainly concerns the volumes of waste generated, collected and disposed of, measured by weight scales. Interestingly, the collected data indicate that only half of the responding cities actually have weight scales. Interviews have concluded that data in some cases are collected by estimating the volumes from an observed number of collection trucks, which makes the data somewhat unreliable. According to the Databook [3], data management will contribute to proper management of solid waste and raise awareness about the situation in hope of future solutions. The Databook also states that information technology, and other tools already effective in developed countries, can help enable countries in need to organize solid waste management rapidly and at a relatively low cost [3].

1.1.3 Motivation and Early Hypotheses

The consequences of insufficient solid waste management affect both people locally and the environment. In spite of the demand to map out Earth's environmental changes, mapping out where large accumulations of solid waste are located does not seem to spur much commercial interest [6]. By using existing and freely available sources of data, techniques and tools, it is believed to be possible to help affected regions know more precisely how their lack of solid waste management affects people and the environment. Since many dumpsites are illegal and unauthorized [7], being able to locate them would serve as a stepping stone towards allocating resources in an informed way. Importantly, the aim of this show-case is to raise awareness, attract interest and show the initiative's feasibility. Therefore, this use-case is a perfect show-case to communicate the vision of GWC [6].

For this study, some hypotheses regarding characteristics of dumpsites have been collected. These are:

1. rapid urbanization growth or a greater population generally means that more waste is generated [3],

2. larger dumpsites are often located in urban areas or on the outskirts of cities because of provisional waste management,

3. dumpsites with greater surface areas spread more hazardous material into streams during rainfalls,

4. dumpsites with greater depth do not soak as easily and spread less during rainfalls compared to shallow ones,

5. dumpsites with a greater volume of turnover are burnt more frequently, resulting in a corresponding spread of hazardous materials,

6. the humidity and water content decrease because of burning,

7. the surface temperature is increased because of the burning and lack of vegetation,

8. transmission of methane gas is increased because of decomposition,

9. the color is lighter because of the amount of ash as a by-product of burning [5], and

10. the dumpsites consist mostly of organic materials [7].


1.2 Research Definition

The purpose is to demonstrate the GWC initiative's technical feasibility through a non-commercial proof-of-concept of how to map out large-scale accumulations of solid waste in a scalable and automatic way, using existing and modern techniques of data handling together with satellite data. Although many sources of satellite data exist, the project's non-commercial nature motivates using only freely available data.

Naturally, the amount of geographical data is vast, which means the data must be limited to a geographical area, a Region of Interest (ROI). Therefore, the aim is to show feasibility for one ROI. In order to promote the future potential of the technical solution, scalability is an important factor once the geographical limitations are lifted. For that reason, the technical solution will not have scalability constraints but will aim to limit the computational cost.

The first part of this thesis aims to retrieve and pre-process data and obtain results for Feature Selection. The idea is to limit the data to what actually provides the model with information.

Secondly, a classification based on the chosen features will be accounted for and validated.

This thesis will provide and contribute technical remarks and insights. In addition, GWC wishes that information and knowledge from a successful proof-of-concept be presented in an accessible and easily understandable way. Therefore, there will also be a section on how the information in this use-case is chosen to be presented. The objective is to visualize this so that, regardless of a person's technical background, it will be possible to draw the same conclusions as this study does.

In summary, this study’s objectives will aim to answer:

1. What is the image signature of solid waste from an aerial perspective in available satellite data?

2. How are features to be classified in an automatic and scalable way?

3. How are the results and conclusions made comprehensible, regardless of technical background?

1.3 Disposition

Firstly, related work will be presented covering what has been performed in relation to this use-case, followed by a description of some relevant techniques.

Analogous with the study's objectives, the work is divided into three distinct subtasks, indexed in the same fashion as Python, starting with zero. Subtask 0, Subtask 1 and Subtask 2 will be examined in the method and in the results in the same chronological order.

The proposed methodology combines techniques from the fields of image analysis and data mining. These two fields will have separate sections in the theory.

In order to contribute to the GWC initiative, there will also be a complementary study of how to bring the insights forward to support the realization – in other words, to shine a light on how the technical feasibility will be digitally presented, which together with testing will make up Subtask 2.


1.4 Related Work

Firstly, this section will go into some of the work that is closely related to this particular use-case of modelling and mapping out solid waste in the local region. Secondly, the section will continue with techniques more closely related to the experiments conducted in this study.

1.4.1 Study Reports

In some parts of Sub-Saharan Africa, ESIA are currently being used in order to understand waste flows. ESIA states in their report [4][8] that the first step towards achieving solid waste management is to understand the flows of solid waste. With data it is possible to estimate demands for containers, vehicles, facilities and sites of disposal for handling the waste. Main sources of data involve surveys of households, analysis of resources and consultation with authorities and experts [4]. Another ESIA report [8] carries out environmental and social screening based on observations collected at the sites, interviews with households and informants, mappings and photographs. The amount of municipal waste from the sources is estimated rather than registered by direct measures. These estimates are typically calculated from survey data on the amount that every household generates. Questionnaire responses from households and data from weighbridges at the existing facilities are then used to create models of the solid waste flow and, by extension, to estimate the total amount of waste [4][8].

The Databook [3] highlights that governments in developed countries estimate their workflows and total amount of waste accurately because these countries have quantitative data available to a much greater extent. As a consequence of massive amounts of waste never entering formal waste service flows, because of burning or illegal dumping, no data can be used for an accurate estimate. Furthermore, at most dumpsites no weighbridges are installed. Surveying households is an important quantitative way of estimating the amount of waste generated; nevertheless, these surveys are conducted neither systematically nor continuously.

1.4.2 Geographical Information System (GIS)

Geographical Information System (GIS) is a framework that aims to retrieve, present and draw conclusions from spatial data. Today, many applications exist, and many studies in this field have been published. A study by Kinobe et al. [7] from 2015 presents a mapping of the waste in Kampala using GIS mapping software. The data used were records from people and companies that had observed the sites and knew the locations, as well as secondary sources of data found in journals, documents, books and reports.

1.4.3 Earth Observations (EO) Analysis

EO analysis, Remote Sensing and monitoring of land use and change have been around since the 1970s. The technologies used for identifying different land cover types have gone from simple unsupervised approaches to various much more complex supervised classification algorithms. The studies range from regional to continental [9]. Aerial and satellite imagery have become omnipresent, with many sources and diverse applications.

According to Itti et al. [10] there are so many sources of satellite imagery accessible today that it is impossible for image analysts to examine them all.

Most sources of satellite data come with a hefty cost, although some provide freely accessible images for research in the form of collections available via APIs or similar methods. A couple of examples with freely accessible data are Google Earth Engine2, Planet3, Airbus4, the European Space Agency (ESA)5 and Copernicus6. What usage or access a researcher is granted depends on many factors. Available sources and how they differ will be discussed further in Section 3.3.

There is a general interest in what is happening on our Earth. When browsing publications related to image analysis and information extraction from EO, it does not take long to realize that a vast amount of research is being performed in this field. Because of the vast amount of data and the interest in knowing what happens on – as well as understanding – our Earth, there is a need to automatically process and extract information. Since this is the case for this study, the next part of this section will briefly present the models that are considered applicable for modelling Remote Sensing applications.

1.4.4 Machine Learning

Today, Machine Learning is widely used for automatically learning features from training sets. Convolutional Neural Networks (CNN) are one type that does significantly well for applications using image data. Wu et al. [11] propose an object detection framework partly based on CNNs; more precisely, after detection the candidate object proposals are sent to a CNN for Feature Extraction and classification. In practice, there exist both many applications and varying types of CNNs. CNNs are in general computationally demanding for training and inference. This has resulted in types that distinguish themselves by being less computationally demanding while still having about the same accuracy; examples are Fast R-CNN, R-CNN and YOLO. Because these complex models need to be trained, a large training data set is required [12].

1.4.5 Automatic Target Recognition (ATR)

ATR is a solution that tackles the time complexity of analysing large amounts of data, using both image analysis and classification techniques in different segments of the analysis. According to El-Darymli et al. [12], the method originates from military applications but is today also a technique of importance for consumer and civilian applications. The different components consist of detection, segmentation, classification and tracking.

Itti et al. [10] aim to automatically detect, from satellite images, different types of targets with large variability, using so-called Saliency and Gist Feature Extraction to detect and recognize ships. This is performed using high-resolution satellite imagery which is divided into many small grids of images that are then analysed for features and interactions across the respective grid spaces. The analysis combines statistical signatures of the targets, and the model is used to classify whether a potential target is a ship or not. Support Vector Machines (SVM) are used for large image data sets, with a proposed algorithm that outperforms State-of-the-Art solutions like HMAX or SIFT [10].

2 https://earthengine.google.com/ Accessed 2020-05-20
3 https://www.planet.com/ Accessed 2020-05-21
4 https://www.airbus.com/space/earth-observation.html Accessed 2020-05-21
5 https://www.esa.int/ Accessed 2020-05-22
6 https://www.copernicus.eu/en/services/land Accessed 2020-05-20


2 Theory

The theory section aims to justify the methods being used, in three parts following the chronological order of the workflow. First, there is a section on analyzing features with techniques and theories from the area of image analysis; the theory is mainly discussed from the perspective of Remote Sensing applications, and the scope stays close to what will be applied. The second section motivates and explains the task of classification, the SVM model, the factors and parameters its function depends on, as well as some methods and metrics for evaluation. Lastly, the final section lays the initial theoretical ground for the study's complementary goal of promoting possible future work on the solution.

2.1 Digital Image Analysis

Computerized image analysis is a field in computer science that aims to extract information from digital images by using image processing techniques [13]. Feature Selection is an important technique when dealing with large amounts of data in Remote Sensing applications and will be explained here.

Remote Sensing from satellite imagery uses sensors mounted on satellites to measure the energy emanating from Earth's surface. By doing so it is possible to construct an image of the landscape from a great distance. As an example, the energy can be sunlight reflected from the surface. In addition to the wavelengths humans can see, satellites can often measure outside the human range of vision. Exactly which span of wavelengths is available depends on which instruments or sensors a satellite is equipped with. Image data is defined by the number and location of the spectral measurements, referred to as spectral bands [13]. Image data acquired by satellite sensors is in digital format, composed spatially of discrete pixels [14].

When, for example, dealing with greyscale images, an image on a grid L is defined as having points (x, y), which are pixels. Each pixel (x, y) has an intensity value I(x, y) [13]. In the example, a low value can correspond to a light grey color and a high value to a dark grey or black color. An image can contain many Objects of Interest (OOI); when an image contains one or more OOI, it is referred to as a scene [15][13].

2.1.1 Different Resolutions of Image Data

The resolution of image data can be divided into four different types: the spatial, the spectral, the temporal and the radiometric. The spatial resolution indicates the size of the smallest object that can be resolved by the sensor; usually, this is thought of as the linear measure of how much of the ground is represented by one pixel. The spectral resolution is a measure of how wide the sensor's bands are in wavelength. A sensor can be panchromatic, which means that the sensor has a single wide band covering the wavelengths visible to humans [9].

Multi-spectral bands signify that a sensor can access bands not visible to humans, like Near Infra-Red (NIR) or the thermal IR spectrum. Systems with hyper-spectral bands in most cases have multiple narrow spectral bands. Temporal resolution signifies the cycle in which the sensor revisits the same surface on Earth.

Lastly, the radiometric resolution indicates the number of different output values in each band of the data. This is determined by the number of bits the data is recorded in. As an example, if the data is 8-bit, then the output value can range between 0 and 255 for every pixel, since 2^8 = 256. The greater the number of bits, the higher the accuracy of the radiometric resolution [9].


Figure 2 Pattern recognition in two dimensions. On the axes, spectral reflectance or band characteristics can be observed. The labelling corresponds to Information Classes (IC) [14].

2.1.2 Multi-spectral Space, Spectral Classes and Information Classes (IC)

One way to represent multi-spectral data is to plot the intensities in a multi-spectral vector space. For every measured pixel there will be a data point in the space corresponding to that pixel [14][13].

It can be helpful to know which data signifies what is being analysed and which information class (IC) is to be found. Figure 2 shows how, in an intuitive way, information can be empirically deduced about the objects that are to be examined and how they differ. The groups of dots form what are referred to as clusters, corresponding to different ground cover types. Systematic noise and topographic effects are two sources of variation in the data. A good discrimination can be constructed like the one to the right of Figure 2. However, when using this technique of pattern recognition in practice, one cannot expect the result to be as depicted in the figure: the same type of ground cover can be found in multiple clusters, or clusters can overlap. With a greater number of spectral bands available, there is a greater chance of finding clear clusters of spectral classes. With several bands or channels available, it is necessary to determine the appropriate subset of bands that represents the information classes effectively. When working with statistical models and classification, spectral classes can be seen as having unique probability distributions, while information classes will be seen as having multi-modal probability distributions [14][13].

2.1.3 Features and Feature Selection

According to Liu and Motoda [16], features can be called characteristics, attributes or properties. A feature can be real or complex, and continuous or discrete. A discrete feature has a limited number of values and can further be divided into the types ordinal and nominal.

Ordinal values can be ordered in one or two directions, while a nominal value does not have an order [16].

When it comes to the field of data mining, features can be described as relevant, irrelevant or redundant [16]. According to Li et al. [17], data can contain many redundant, relevant, irrelevant and noisy features. As an example, observing Figure 3 to the right, f1 can be seen as relevant since it is possible to discriminate the two classes by this feature. On the contrary, f3 cannot be used for discrimination. To the left, it is possible to see that f2 is correlated to f1, meaning that the feature is redundant [17].


Figure 3 To the right, feature f1 is relevant and f3 irrelevant. To the left, f2 is a feature redundant to f1 [17].

Analysis of great amounts of data can be computationally expensive. Therefore, Feature Selection is included as a means of reducing the complexity or computational load. In brief, Feature Selection is about reducing the number of bands used for analysis, which can lower the amount of needed computations significantly. In this process, bands that do not aid the purposed analysis are removed. This cannot be performed indiscriminately and must be assessed in a rigorous and quantitative way. The most common way to do so is to determine the separability between the classes mathematically, by calculating whether the separability of the classes remains when the features are reduced. In other words, if the removal of a feature does not affect the separability of the classes, then the feature is considered not to aid the separability in the analysis [14]. An example of this could be to remove feature f3 in Figure 3: the removal will not affect the discrimination of the two classes, but spares computational cost and mitigates the curse of dimensionality [17].
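As a minimal sketch of this idea (not the pipeline used in the thesis; the data is synthetic and the separability score and threshold are illustrative assumptions), each band can be given a simple per-band separability score, and bands that do not contribute to discriminating the classes can be dropped:

```python
# Hypothetical illustration of feature selection by per-band separability.
# Bands whose removal does not reduce class separability are dropped.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pixels for two information classes, three "bands" (features):
# band 0 separates the classes, band 1 is a noisy copy of band 0 (redundant),
# band 2 is pure noise (irrelevant).
class_a = np.column_stack([rng.normal(0.2, 0.05, 500),
                           rng.normal(0.2, 0.10, 500),
                           rng.normal(0.5, 0.20, 500)])
class_b = np.column_stack([rng.normal(0.6, 0.05, 500),
                           rng.normal(0.6, 0.10, 500),
                           rng.normal(0.5, 0.20, 500)])

def separability(a, b):
    """Standardized distance between the class means, computed per band."""
    pooled_std = np.sqrt(0.5 * (a.var(axis=0) + b.var(axis=0)))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled_std

scores = separability(class_a, class_b)
keep = scores > 1.0          # arbitrary threshold for this toy example
print("separability per band:", np.round(scores, 2))
print("bands kept:", np.where(keep)[0])
```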

According to Li et al. [17], hundreds of feature selection algorithms of different types have been proposed since the mid-1990s. These methods can generally be categorized as information-theoretical-, similarity-, sparse-learning- and statistical-based methods. The information-theoretical-based methods are many, but generally distinguish themselves as methods that exploit different heuristic filters and their respective importance. Their performance, in terms of how important a single feature is, is based on maximizing the feature's relevance and minimizing the feature's redundancy. Central to this is the concept of entropy: the maximum entropy of the distribution of data is the best representation of the state of knowledge or, in other words, possibly the most important feature of the data. For a discrete random variable $X$, the entropy is defined as [17]:

$$H(X) = -\sum_{x_i \in X} P(x_i)\,\log\!\big(P(x_i)\big), \qquad (1)$$

where $x_i$ is a value of the random variable $X$.

The entropy can help in retrieving statistical information about data distributions, which will be examined further in Section 2.1.4.
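As a quick numerical illustration of Equation 1 (a sketch only, not code from the thesis), the entropy of a discrete distribution can be computed directly from its probabilities:

```python
# Entropy of a discrete distribution, following Equation 1.
import numpy as np

def entropy(p):
    """H(X) = -sum p_i * log(p_i), ignoring zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

print(entropy([0.5, 0.5]))   # maximum entropy for two outcomes (~0.693 nats)
print(entropy([0.9, 0.1]))   # a more peaked distribution has lower entropy
```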


2.1.4 Data Representations and Separability

When dealing with large amounts of data, it can naturally be difficult to get an overview of the data representations or to know a priori which features offer the best description of a data set [14]. Studying data trends or distributions can help in extracting information. Creating histograms or estimating a Probability Density Function (PDF) can be useful for estimating the general distributions of data [13][18].

According to Rudemo [18], with a set of samples $X = x_1, x_2, \ldots, x_n$ that are identically distributed real variables, $\hat{f}$ denoting the estimator of the probability density, $I = I_k$ being a partition into disjoint intervals, $h_k$ the length of $I_k$ and $N_k$ the number of observations in $I_k$, the histogram can be defined as:

$$\hat{f}(x) = \hat{f}_I(x, X) = \frac{N_k}{n\,h_k}, \qquad x \in I_k \qquad (2)$$

Distributions can also be estimated with different types of interpolation. Successful methods are kernel methods; these types of algorithms are a standard tool in pattern recognition and machine learning today. Kernel methods are considered to be solid mathematical frameworks that fit the uses of Remote Sensing data processing: they are easy to use, fast to compute, and robust both to higher dimensions and to issues with low numbers of labelled samples. Since this often is the case for Remote Sensing applications, kernel methods are useful frameworks [19].

A non-parametric way to estimate a distribution, in order to obtain a PDF, is the method Kernel Density Estimation (KDE). Fundamentally this method is used as a solution for data smoothing based on a finite data sample [20][21].

The KDE is a fairly simple way to compute the average density of training data. According to Rudemo [18], with the same notation as for the definition of the histogram, a real non-negative kernel function $K$ with $\int K(x)\,dx = 1$, $h > 0$ and $\alpha = (K, h)$, the kernel estimator is:

$$\hat{f}(x) = \hat{f}(x, X) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right), \qquad (3)$$

The parameter $h$ is more commonly referred to as the bandwidth parameter, and the most common kernel is the Gaussian [14], which is defined as:

$$k(z) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\tfrac{1}{2} z^2\right). \qquad (4)$$
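To make Equations 2 to 4 concrete, the sketch below (assuming a one-dimensional synthetic sample; not code from the thesis) compares a histogram estimate with a Gaussian KDE computed with SciPy:

```python
# Histogram (Eq. 2) versus Gaussian kernel density estimate (Eqs. 3-4) of a 1-D sample.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
samples = rng.normal(loc=0.3, scale=0.1, size=1000)   # e.g. reflectance values of one band

# Histogram density: N_k / (n * h_k) on each bin I_k.
counts, edges = np.histogram(samples, bins=30, density=True)

# Gaussian KDE: an average of Gaussian kernels centred on the samples,
# with the bandwidth h chosen automatically (Scott's rule by default).
kde = gaussian_kde(samples)
grid = np.linspace(samples.min(), samples.max(), 200)
density = kde(grid)

print("histogram peak:", counts.max())
print("KDE peak      :", density.max())
```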

When looking at how to differentiate two IC from each other, a useful empirical method is to measure how similar the data of the two classes are. By measuring the separability of two subsets belonging to different IC, using single-dimensional distributions, it is possible to observe how well a feature can help in discriminating the classes from each other. Observing Figure 4, if there is a large overlap between the two subsets' distributions, then it can be assumed that the feature does not help in the classification analysis. On the contrary, if the overlap is small, using that single dimension or feature is less likely to lead to an error in discrimination [14].

Figure 4 Two-dimensional multi-spectral space showing a hypothetical degree of separation possible in a single dimensional subspace [14].

When having two hyper-spectral data sets, there are different ways of quantifying the separation. The difference between means can be insufficient, since the standard deviation of the distributions, or the type of distribution itself, can vary. In order to obtain a good approximation of the separability, a vector-based measure is useful, and several such measures exist [14]. The same goes for measuring the distance or similarity between two distributions, with examples like the Kullback-Leibler Divergence (KLD), the Bhattacharyya Distance, or estimating various types of correlation coefficients. The KLD is robust and has the advantage of being applicable to any type of distribution, since data distributions usually are too complex to be assumed to be modelled by a single Gaussian [22]. KLD is also called relative entropy and is asymmetric, measuring how much one probability distribution differs from a second probability distribution. KLD does not measure the variation of information but rather how alike the distributions are [23][24], and is defined as:

$$D_{KL}(P\,\|\,Q) = \sum_{x \in X} P(x)\,\log\!\left(\frac{P(x)}{Q(x)}\right) \qquad (5)$$

where $P$ and $Q$ are defined on the same probability space $X$.

The measure is non-negative. If the measure is close to zero, the distributions are nearly identical; as the measure increases, the distributions are increasingly different from each other.
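A minimal numerical sketch of Equation 5 (not the thesis implementation; the two band samples are synthetic placeholders): estimate the two class distributions on a common support and compute the divergence, for example with scipy.stats.entropy, which returns the KLD when given two distributions.

```python
# Kullback-Leibler divergence between two empirical band distributions (Eq. 5).
import numpy as np
from scipy.stats import entropy   # entropy(p, q) returns sum p*log(p/q)

rng = np.random.default_rng(2)
band_dump = rng.normal(0.55, 0.08, 2000)    # hypothetical "dumpsite" pixels in one band
band_other = rng.normal(0.35, 0.10, 2000)   # hypothetical "not dumpsite" pixels

# Bin both samples on a common support so P and Q are defined on the same space X.
edges = np.linspace(0.0, 1.0, 51)
p, _ = np.histogram(band_dump, bins=edges, density=True)
q, _ = np.histogram(band_other, bins=edges, density=True)
eps = 1e-12                                  # avoid division by zero in log(P/Q)
p, q = p + eps, q + eps
p, q = p / p.sum(), q / q.sum()

print("D_KL(P || Q) =", entropy(p, q))       # large value -> the band separates the classes well
print("D_KL(Q || P) =", entropy(q, p))       # note the asymmetry of the measure
```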

2.2 Classification Model

A classification model generally aims to draw conclusions from values or predict possible outcomes. There are many possible techniques and models to choose from. In this section, the focus is on how to classify image data with a classification model, in a way similar to what the ATR technique does, and on choosing a specific model.

2.2.1 Automatic Target Recognition (ATR)

According to Amit [15], detection can be viewed as a classification between two classes: object or not object. In the ATR terminology used for military applications, the target is what the OOI is referred to as in the scene, while clutter denotes the other objects that the model is not interested in. Noise is the result of imperfections in the classification process, such as measurement inaccuracies from sensors [15].

In the process of reducing a scene into clutter and potential targets, the detector of the OOI is comprised of two processes of analysis [15]. The first one can be referred to as a Low-Level Classifier (LLC) that functions as an intermediate pre-screener, handling the input data in a computationally efficient and effective way to eliminate obvious clutter and pass forward potential OOI in an ROI. In this stage, the dimensionality of the data can be reduced, and the LLC ought to be designed to balance the trade-off between detection efficiency, computational complexity and rejection of outliers. This trade-off is important in order for a pre-trained model to work in or near real-time while having a low probability of false positives or false alarms. The second part of the process uses the potential or candidate targets for classification in a model that can be referred to as a High-Level Classifier (HLC).

Figure 5 The left figure represents an example of how models can be characterized by how much a model is Feature-based. The right figure shows the general approach for constructing a Feature-based model classifying SAR data [12].

Features can play an important part in ATR. In order to separate the target from the clutter, it can be important to know how the target's features differ from the clutter [12]. One way to differentiate models from each other is to emphasize how much the model is based on features. The LLC is generally seen as Feature-based and often implemented as a one-class classifier relying solely on feature vectors. An HLC is more complex, uses a knowledge-based approach and is seen as Model-based. As illustrated to the left in Figure 5, these types typically affect the computational complexity and recognition accuracy; to the right in the same figure, the proposed work cycle of the Feature-based model can be found [15].
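To illustrate the two-stage structure described above (a schematic sketch only, with hypothetical thresholds and a dummy rule standing in for the thesis's actual components), an LLC/HLC pipeline might look like:

```python
# Schematic two-stage ATR-style pipeline: a cheap feature-based pre-screener (LLC)
# followed by a more expensive classifier (HLC) on the surviving candidates.
import numpy as np

def low_level_classifier(pixels, band_index=0, threshold=0.5):
    """Cheap pre-screener: keep only pixels whose chosen band exceeds a threshold."""
    return pixels[:, band_index] > threshold

def high_level_classifier(candidates):
    """Placeholder for a trained model (e.g. an SVM); here a dummy rule on band 1."""
    return candidates[:, 1] > 0.4

rng = np.random.default_rng(3)
scene = rng.random((10_000, 3))              # 10 000 pixels, 3 features each

mask = low_level_classifier(scene)           # reject obvious clutter cheaply
candidates = scene[mask]
labels = high_level_classifier(candidates)   # run the expensive model only on candidates

print(f"pixels passed to HLC: {mask.sum()} of {len(scene)}")
print(f"pixels classified as targets: {labels.sum()}")
```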

2.2.2 Choosing a Classifier

Classification tasks can be solved in many different ways with many existing models or technical solutions. Different techniques have different advantages and disadvantages, and it is important to understand what the proposed task aims to gain when choosing a suitable technique. Nevertheless, the most important factor is the data being analysed [14].

Spatial classifiers recognize spectral properties in an image and the relationships between adjacent pixels, and are useful when the OOI are large compared to the spatial resolution or pixel size. As an example, when classifying agricultural areas, if a pixel is classified as wheat, there is a great chance that the next pixel should also be classified as wheat. These classifiers exploit a conventional image analysis technique called neighbourhood analysis. They can also be useful when the data is noisy, when labeling individual pixels is difficult, and when the aim of the classifier is to construct a map of thematic properties. How well the classification can perform depends greatly on the sensor's spatial resolution compared to the size of the OOI. A poor spatial resolution limits the possibility of analysing small OOI. In contrast to the example of classifying an agricultural OOI, the same sensor could have difficulty classifying urban areas. As long as the spatial resolution is arbitrarily good, using structural information like neighbourhood analysis or calculating texture could be useful [14].

Classification algorithms that use statistics in order to label data are the most common techniques in the area of Remote Sensing. The use of likelihoods is valuable when labelling a pixel with the most likely IC; most commonly, classification is made by selecting the label with maximum likelihood. The remaining possibilities can, depending on the case, give insight or possibly more information. Statistical discriminant functions allow a pixel to be classified depending on its position in multi-spectral space [14]. The development of advanced tools has contributed to Remote Sensing in the handling of larger volumes of multi-spectral and multi-temporal data. With higher-dimensional feature spaces, the candidates of appropriate models for the classification task are limited. Many State-of-the-Art Model-based techniques exist, like Neural Networks or Bayesian Classification, which perform very well for classification; these do in general need a high number of available training samples and are not great in high-dimensional feature spaces [25][26]. Some classifiers, like maximum likelihood, are insufficient for high-dimensional data where the data is assumed to be normally distributed [27]. Random Forest and kernel-based learning methods (like SVM) have been demonstrated to be suitable for operating in a larger feature space and with limited samples to train from [27]. Compared to multi-layer perceptrons, Radial Basis Function Neural Networks, Polynomial Networks or other existing algorithms, SVMs are considered to be highly efficient when it comes to performance in the inference stage (relative to the computation) [28].

2.2.3 Support Vector Machines (SVM)

SVM is a kernel method that has, through studies, proven itself to outperform other techniques regarding accuracy, regardless of the underlying data distributions and for large volumes of data [27]. SVMs are especially suitable for classification and regression tasks in a larger feature space. Greatly simplified, the approach of the method is to separate the feature space (see Figure 6) with a surface that best describes where classes are separated from each other [25]. The SVM lowers the expected error by learning and minimizes over-fitting. The approach results in a binary classifier with a maximized geometric margin that minimizes the empirical classification error. It is an efficient machine learning method that delivers generalization in a robust way for predictions [29].

Classification of images is in many cases used for analysing multiple classes [31]. Even so, sometimes the aim of a model is to classify just a few classes. In these scenarios, SVM has played an important role with its many extensions, one example being the one-class SVM (OCSVM) [19].

Figure 6 The effect of the transformation of the input data into feature space [30].

SVM is a statistical learning method that is trained from a number of samples and is able to recognize complex patterns [28]. It is a kernel method that has found wide applications in the fields of science and engineering. It is a supervised learning method that aims to separate the classes in a maximal way, which is why SVMs are sometimes known as Maximum Margin classifiers. The SVM finds the hyper-plane (a subspace whose dimension is one less than that of the feature space) with the most maximized margin for separating the classes in the feature space. After training the model, the hyper-plane is defined by points on the margin, expressed as vectors in the space; these are called support vectors. When it comes to Remote Sensing, the SVMs constitute a main classification algorithm. In general, SVMs perform excellently in high-dimensional feature spaces and relatively well in terms of robustness with low numbers of samples available for training. These model advantages are among the reasons why this model is widespread in the area of Remote Sensing [19].

Using SVMs consists of a process with two main parts: firstly, the training is conducted using an appropriate training data set, and secondly the model is tested using new input images [28]. When using SVM for image classification, one possibility is to assign the pixels in a scene to the classes; in this instance it is required to know which classes are in the scene. When using an OCSVM, knowledge about the one class of interest is sufficient in order to construct the classification task [31]. In short, the OCSVM maps the data into the high-dimensional feature space through the kernel function in order to iteratively find the hyper-plane that lies on the maximal margin separating the one class from the other, unknown classes [32]. The equation of the hyper-plane can be observed in Figure 7.
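A minimal OCSVM sketch with scikit-learn, assuming a feature matrix of pixels from the single class of interest (the data, features and parameter values are placeholders, not those used in this study):

```python
# One-class SVM trained only on pixels of the class of interest (e.g. dumpsite pixels).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(4)
X_train = rng.normal(loc=[0.55, 0.30], scale=0.05, size=(300, 2))   # known-class pixels, 2 features

# nu bounds the fraction of training errors / support vectors; gamma is the RBF width.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)

X_new = np.array([[0.56, 0.31],     # similar to the training pixels
                  [0.10, 0.90]])    # very different pixel
print(model.predict(X_new))         # +1 = inside the learned class, -1 = outlier
```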

In more detail, consider a training set of pixels given as $n$ vectors of the form $(\vec{x}_1, y_1), \ldots, (\vec{x}_n, y_n)$, where $w_k$ are the weighting coefficients; the two information classes ($IC_1$ and $IC_2$) are then defined by:

$$w \cdot x + w_{N+1} \ge 1, \quad \text{for } IC_1 \text{ pixels} \qquad (6)$$

$$w \cdot x + w_{N+1} \le -1, \quad \text{for } IC_2 \text{ pixels} \qquad (7)$$

for which the respective class label $y_i$ takes the values +1 and -1. Equations 6 and 7 can then be written as one expression valid for any pixel, regardless of its corresponding IC:

$$(w \cdot x + w_{N+1})\,y_i \ge 1, \quad \text{for pixel } i \text{ in its correct class,} \qquad (8)$$

which for later substitution can also be written as

$$f(x_i) = (w \cdot x + w_{N+1})\,y_i - 1. \qquad (9)$$

If all the data points in our instance are linearly separable, then the two marginal hyper-planes (which can be observed in Figure 7 and are defined by Equations 6 and 7) can be found. By using these margins it is possible to find the maximal margin, or the optimal hyper-plane. This is performed by using the normal form, i.e. the perpendicular distance between these margins (which results in $\frac{2}{\|w\|}$, where $\|w\|$ is the Euclidean length of the weight vector) [14].

Figure 7 The maximal margin can be observed as the optimal hyper-plane through the closest pixels or data points, and the equations of the respective classes are shown [14].

Since the interest is in maximizing the margin, and therefore minimizing $\|w\|$, the points of the training set ought to be on the correct side of the maximal margin. The minimization is therefore done with Lagrange multipliers, which subtract a proportion of each training point's constraint. In this instance, the Lagrange multipliers and the minimization of $\|w\|$ give the equation

$$L = \frac{1}{2}\|w\|^2 - \sum_{i} \lambda_i\, f(x_i), \qquad (10)$$

with $\lambda_i$ being the proportion of each constraint (the Lagrange multipliers) and $f(x_i) \ge 0$.

By minimizing $L$ and thereby finding the weights, it is possible to set up the equation of the hyper-plane. The function itself acts as a discriminant function, and whether a point or pixel belongs to either IC therefore depends on the output of

$$g(x) = \operatorname{sgn}(w \cdot x + w_{N+1}). \qquad (11)$$

As long as the training data is linearly separable, the previous walk-through is valid. Naturally, this is not always the case. In these instances, transforming the pixel vectors into a different feature space is beneficial for generating a valid discriminant function. By introducing a kernel $k(x_i, x_j)$ in Equation 11, the definition becomes:

$$g(x) = \operatorname{sgn}\Big\{\sum_{i} \lambda_i\, y_i\, k(x_i, x) + w_{N+1}\Big\}, \quad \text{given } k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j). \qquad (12)$$

The dot product of the feature space transformation of $x$, $\phi(x_i) \cdot \phi(x_j)$, is then used to maximize $L$ and to obtain values for $g(x)$ in order to construct a classifier.


2.2.4 SVM Performance, Kernels and Evaluation

The data used for training a model, and the model's hyper-parameters, affect the model's performance directly. In the process of constructing a model that aims to perform as well as possible, the bias-variance trade-off (BVTO) is important to take into account, as it profoundly influences how effectively a model generalizes. Since the model aims to fit the training data in a way that maximizes the accuracy on future data points to be classified, it is not (perhaps contrary to intuition) the appropriate approach to fit the model perfectly to every point of the set. The training set could, for example, contain data points that are to be considered noise or outliers. Fitting a model too closely to the training data is called over-fitting, while the opposite, missing trends in the training data, is referred to as under-fitting.

The BVTO is fundamental for model generalization, and the right balance is impossible to know a priori. A strong prior model can gain bias, while a weak prior model can gain variance. Therefore, the constructor of a model needs to guess, or empirically examine, how the model is best constructed for the classification task. Depending on the model, this can be accounted for in different ways. One aspect seen as critical when modulating the BVTO is the complexity. The complexity can be thought of as the degrees of freedom of a model, which is closely related to the setting of the hyper-parameters. The approach is to systematically, and in a scheduled way, obtain an optimal general accuracy [33].

Regularization is used to impose the priori information as a mathematical tool on the solution structure. It is an optimization task that help the model to fit the training data well by generalizing and avoiding over-fitting [34]. As mentioned, SVM are in general relatively good at generalization since the goal of the hyper-plane is to calculate a general curvature surface that separates the classes [26]. With that in mind, the SVM has a strong regulariza- tion properties. In other words, this means that it is important to set the parameters of the kernels properly in order to obtain a generalized model [35]. There exists several examples of different kernel functions. Out of the common examples are the linear, the polynomial and the Radial Basis Function (RBF) kernels [35]. There are no rule-of-thumbs available for which kernel is the most favorable to be used, although the most common kernel for Remote Sensing applications is the RBF [14] and there exists studies demonstrating the RBF kernel’s superiority for Remote Sensing applications [35]. Because of the functional- ity of the kernels, this also means – given well-chosen kernel parameters – that the model can guarantee performance and robustness even when the input to the model has some bias [26][36]. In short, aiming to explain the function of a kernel, the kernel introduces di- mensions for projecting data in higher dimensions to enable more complex classification of challenging data sets. When having a one-dimensional data set, the linear kernel perform a two-dimensional classification trick projecting the data in a two-dimensional feature space.

When using a polynomial kernel, the output depends on the directions of the input vectors through their dot products. The most widely used RBF kernel adds a Gaussian distribution to each input data point. The RBF kernel is $k(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}$, where the regularization parameter $\gamma$ is chosen. The polynomial kernel is respectively defined as $k(x_i, x_j) = [(x_i)\cdot(x_j) + 1]^p$, where the order of the polynomial $p$ is chosen [35].
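As an illustration of the two kernel definitions above, the snippet below is a minimal Python sketch (the parameter values are arbitrary examples, not taken from this work) that evaluates the RBF and polynomial kernels for a pair of input vectors.

```python
import numpy as np

def rbf_kernel(xi, xj, gamma=0.1):
    # k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
    return np.exp(-gamma * np.linalg.norm(xi - xj) ** 2)

def polynomial_kernel(xi, xj, p=3):
    # k(x_i, x_j) = (x_i . x_j + 1)^p
    return (np.dot(xi, xj) + 1) ** p

xi = np.array([0.2, 0.8, 0.5])   # e.g. band values of one pixel (placeholder)
xj = np.array([0.3, 0.7, 0.4])
print(rbf_kernel(xi, xj), polynomial_kernel(xi, xj))
```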

For all common kernels, the kernel function depends on the regularization parameter C. The C parameter (C stands for capacity) is a tuning parameter that controls the weighting of misclassification errors and therefore controls the SVM's ability to generalize. The value can be shown to be linked to the width of the maximal margin, permitting classification errors [36].


Figure 8 In a confusion matrix, the true classifications are on the diagonal [37].

If the value of C is high, the misclassified samples are given higher weights, which leads to a lower generalization of the model. In the instance of a high value of C, the model can perform very well in the training process but poorly on new data points [36]. Therefore, the consequences of over-fitting a model may appear for the first time when new data points are tested [34]. For the RBF kernel, the regularization also depends on the parameter $\gamma$, while for the polynomial kernel the degree of the polynomial $p$ is central [30].
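To illustrate how C and $\gamma$ are set in practice, the following is a minimal sketch using scikit-learn (an assumed library choice, not necessarily the one used in this work), where an RBF-kernel SVM is trained on synthetic data with two different C values; a very large C typically fits the training points more aggressively.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Synthetic two-class data as a stand-in for pixel feature vectors
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for C in (0.1, 1000.0):
    clf = SVC(kernel="rbf", C=C, gamma=0.1).fit(X, y)
    # A large C penalizes misclassified training samples harder,
    # which tends to reduce generalization (risk of over-fitting).
    print(C, clf.score(X, y))
```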

When training and validating the model, the training set is typically partitioned. How the partitions are made depends on the type of re-sampling method being used [12]. Two common examples are bootstrapping and cross-validation [37]. The two subsets of the training data are then referred to as the training set and the validation set. The training set is for the model to train on, and the validation set is used for evaluating the performance of the model. The motive for using partitions is to make sure that generalization is achieved at a good level in order to obtain an optimal classifier [12].
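A minimal sketch of such a partitioning, here using k-fold cross-validation from scikit-learn (assumed tooling; the data are synthetic placeholders):

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# 5-fold cross-validation: the training data are repeatedly split into a
# training part and a validation part, and the validation scores are averaged.
scores = cross_val_score(SVC(kernel="rbf", C=1.0, gamma="scale"), X, y, cv=5)
print(scores.mean(), scores.std())
```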

The performance of a model can sometimes be difficult to understand. Many modern techniques have limited transparency – unfortunately, this is also the case for SVMs. Since the hyper-plane has, by definition, one dimension less than the training data [30], it can be difficult to interpret results [26][36]. In these instances, there is a possibility of using linear approximations of a model's performance through graphical visualizations represented in bi-dimensional graphs [36].

For model evaluation, aiming for excellent accuracy, the constructor of a model ought to address which evaluation metrics are relevant for the model.

In other words, there exist different measures of how accurate a model is. Common ones are accuracy, sensitivity, specificity, precision and F1 – with different mathematical definitions that are calculated from the values in a confusion matrix. The confusion matrix is a tool used for analysing how well a classification is made. The true positives (TP) and false positives (FP) are the results for when the model classifies an object as positive.

The terms true and false indicate whether the object actually is a member of the IC labelled positive. The same explanation holds for the negatives: true negatives (TN) and false negatives (FN). These values are what the confusion matrix consists of and can be observed to the right in Figure 8 [37].

Ideally, out of all the classifications (the sum $TP + TN + FP + FN$), the aim is to have the sum $FP + FN$ equal or close to zero. By definition, the accuracy of a classifier is

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (13)$$

while precision is

$$\text{precision} = \frac{TP}{TP + FP} \qquad (14)$$

which together with recall is a widely used metric for classification. Precision can be thought of as a measure of exactness but is seldom used on its own, since it does not capture how much the model has mislabeled. Recall is used for understanding how many points are incorrectly labelled, has an inverse relationship with precision and is therefore in many cases used together with precision as their harmonic mean [37]. The harmonic mean is referred to as the F1-score and is defined by:

$$F1\text{-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \qquad (15)$$

where recall is defined by $\text{recall} = \frac{TP}{TP + FN}$.
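As a small worked sketch of equations (13)–(15) in Python, using made-up confusion-matrix counts rather than results from this study:

```python
# Hypothetical confusion-matrix counts
TP, TN, FP, FN = 40, 45, 5, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)                  # (13)
precision = TP / (TP + FP)                                   # (14)
recall    = TP / (TP + FN)
f1_score  = 2 * precision * recall / (precision + recall)    # (15)

print(accuracy, precision, recall, f1_score)
# 0.85  0.888...  0.8  0.842...
```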

2.3 Presenting Data

Retrieving information for Remote Sensing applications is most often done in a laborious way through trial-and-error. This can make it difficult for novice users of a system that uses complex data mining techniques to implement new features and interact for knowledge discovery [38].

Datcu and Seidel [38] propose that a system should help the user to incrementally build knowledge in a cumulative way, taking advantage of the knowledge that was learned before. It is important, as a designer of the interface of such a system, to separate the semantic abstraction into signal models and user ontology. The design should be presented to the user in terms of object components, contextual structures and scene components [38].

The illustrations of a system ought to make the user perceive the system in a sequential way and focus on the most important information that is rendered. Details that do not provide context or information should not overload the user's attention and are therefore de-emphasized or subdued [39]. The user should have easy, interactive and fast access to the information content of the illustrations and images. In this way – if a user wishes to add value and evaluate the results – the system can be interpreted and used in a good manner [38]. By enabling the application to accommodate new scenarios through a demonstrator, new classes to discover can be found in user experiments on large amounts of data [38].

Datcu and Seidel’s approach to an interactive detection interface is:

1. selection of the data sets that should be used for tuning the models,
2. validation of the sites to be used for the models and procedures,
3. selection of the data from the different available sensors, and
4. running the classification.

Visual Storytelling is an approach to visualizing spatio-temporal data for knowledge discovery in an interactive, web-available design for users with diverse backgrounds and expertise. The designer of the interface creates a series of discrete exploration tasks so that the user can select relevant regions of interest, indicators, color schemes, filtering conditions and temporal conditions with buttons and understandable keywords [40].

Since multi-spectral data contribute information at an elemental level, with energy in spatial and spectral measurements, the representation should offer a perspective of inter-pixel relationships in its visualizations [41]. In addition, according to Rhyne [42], changes in data values can be presented in analogous color schemes of blue, green and yellow. For visualizations, yellow is used as the dominant color, while green and blue are chosen for emphasizing numeric change [42].


3 Data

This section aims to give an understanding of the challenges of using freely available satellite imagery data with a supervised learning approach to classify dumpsites. This has, as far as is understood, not been performed before. In this section, some theory, together with methods and empirical aspects of accessible data extraction, will be introduced.

3.1 Constructing Training Data Set

Some key challenges exist for using supervised learning in the context of Remote Sensing. In order to train models or adopt classification algorithms for the proposed tasks, there is a need to collect large amounts of training data [19]. Accurately collecting ground truth in large quantities with a high level of quality is a difficult and expensive task [43]. The data can often be difficult to interpret due to heterogeneous or complex geographical areas seen from an aerial perspective [19].

When training a model with supervised learning, prior identification of training pixels is essential. In the process of choosing a training set, a significant number of data points from each desired class is needed to accurately estimate the model. The degree of generalization of a model can be greatly affected by the size of the training set. If there are too few training examples, the model may over-fit and will not generalize well to the test set [14]. According to John and Xiuping [14], when constructing a mean vector and covariance matrix – for an $N$-dimensional multi-spectral space – the number of elements in the training data set ought to be $0.5N(N+1)$. In other words, if the number of multi-spectral dimensions is 10, then the training data set needs at least $0.5 \cdot 10(10+1) = 55$ independent samples. Another recommended number of distinct training data points is $10N$ for the $N$ spectral bands used; if possible, the amount should be as high as $100N$ per IC [44]. When constructing the training set, it is important that the reference data are from the same point in time as the points to be classified, since temporal variations may arise [14].
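The rules of thumb above can be made concrete with a short Python sketch ($N = 10$ bands is only an example value):

```python
def training_set_sizes(n_bands):
    # 0.5 * N * (N + 1): minimum for estimating a mean vector and covariance matrix
    cov_minimum = 0.5 * n_bands * (n_bands + 1)
    # 10N distinct points as a recommendation, up to 100N per IC if possible
    return cov_minimum, 10 * n_bands, 100 * n_bands

print(training_set_sizes(10))   # (55.0, 100, 1000)
```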

Furthermore, there can be many unknown thematic classes in a data set used for training – these are seen as an unknown IC. As a result, this further increases the complexity of training the model. Usually, from an operational perspective, the training does not attempt to describe all land-cover typologies. Instead, an unknown class that the model does not aim to identify is introduced, and training data exist as a subset for the known classes [43].

3.2 Enhancement and Correction Image Processing

Image data, or a digital image, is an object composed of picture elements. In discrete and finite quantities, its intensity levels are represented as a $D$-dimensional input ($D$ depending on the number of bands) with positions along the $x$- and $y$-axes [13]. Several processing systems are used for applications; the most important types will be briefly explained.

Enhancement of the image quality is a process of correcting coherent noise or scan-correlated shifts. Systematic errors and detection failures are known to exist in the collection of satellite data. The enhancement is usually performed by using pass-filtering, transformations or simple along-line convolution. These enhancements can be divided into operations in the frequency or the spatial domain [9].
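As a small illustration of a spatial-domain enhancement, the sketch below applies a simple smoothing (low-pass) convolution to a single band with SciPy (an assumed tool choice; this work does not prescribe a specific library):

```python
import numpy as np
from scipy.ndimage import convolve

band = np.random.rand(100, 100)          # placeholder for one image band

# 3x3 mean filter: a basic low-pass (smoothing) kernel in the spatial domain
kernel = np.full((3, 3), 1.0 / 9.0)
smoothed = convolve(band, kernel, mode="nearest")
print(smoothed.shape)
```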

Radiometric Calibration is used to convert the voltages or digital numbers that the sensors collect.


Figure 9 As resolution increases, the accessibility is reduced. On the $x$-axis, the sampling frequency can be observed [1].

The numbers are converted to a scale of radiance or reflectance and are corrected because satellites' sensors degrade over time. Geometric processing aims to represent the spatial properties of Earth's surface. Variations in the platform's altitude and velocity, Earth's rotation, curvature or displacement are some of the many factors that can distort the geometric properties of the data [9]. Naturally, because of the sensor's aerial perspective, data collected from satellites contain both atmospheric and surface information. Reflectance values can therefore be removed or corrected by Atmospheric Correction. Clouds can make the data largely inaccurate, which is why cloud masks are today seen as a must and as a high-level atmospheric product [9].
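A common linear form of such a radiometric conversion is radiance = gain · DN + offset; the sketch below applies it with made-up gain and offset values (the actual coefficients are sensor- and product-specific and are not taken from this work):

```python
import numpy as np

def dn_to_radiance(dn, gain, offset):
    # Linear radiometric calibration: radiance = gain * DN + offset
    return gain * dn + offset

dn = np.array([[120, 135], [128, 140]], dtype=float)     # placeholder digital numbers
radiance = dn_to_radiance(dn, gain=0.037, offset=-0.5)   # hypothetical coefficients
print(radiance)
```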

Because of the influence of aerosols, clouds and other sources of inaccurate data, Multi-Temporal Compositing Techniques have been an important asset for enabling further analysis. Different algorithms exist; one example is the Normalized Difference Vegetation Index (NDVI), which is obtained from red and Near Infra-red (NIR) data by choosing the maximum vegetation value over a given range of time. This leads to an effective way of characterizing vegetated surfaces and has been widely adopted [9].
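A minimal NDVI and maximum-value compositing sketch in Python (the red and NIR arrays are placeholders for the corresponding bands at two acquisition dates):

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    # NDVI = (NIR - Red) / (NIR + Red); eps avoids division by zero
    return (nir - red) / (nir + red + eps)

# Two acquisition dates (placeholder band values)
nir_t1, red_t1 = np.array([[0.45, 0.60]]), np.array([[0.10, 0.12]])
nir_t2, red_t2 = np.array([[0.30, 0.55]]), np.array([[0.20, 0.11]])

# Maximum-value composite over the time range, pixel by pixel
composite = np.maximum(ndvi(nir_t1, red_t1), ndvi(nir_t2, red_t2))
print(composite)
```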

3.3 Satellite Data

There exists a vast amount of EO data. Currently, around 2000 active satellites are in orbit; out of those, 800 are used primarily for EO [1]. The satellites are constantly in orbit and gather information that is sent to their respective stations on Earth. Resolution, quality and frequency vary depending on the source, some of which will briefly be examined in the coming subsections.

According to the GWC pre-study [1], there are three tiers that signify data accessibility. These are classified as military, commercial and free. The general trend is that accessibility becomes more restrictive in proportion to improved quality, as can be observed in Figure 9.

References
