Feature extraction and cluster analysis of oil slicks using optical satellite data (Sentinel-2)


Feature extraction and cluster analysis of oil slicks using optical satellite data (Sentinel-2)

Martin Bergström

Master’s Thesis, 30 ECTS

Master of Science in Industrial Engineering and Management, 300 ECTS
Autumn term 2020


FEATURE EXTRACTION AND CLUSTER ANALYSIS OF OIL SLICKS USING OPTICAL SATELLITE IMAGERY (SENTINEL-2)

Department of Mathematics and Mathematical Statistics Umeå University

Umeå, Sweden

Supervisors:

Jun Yu, Umeå University

Øystein F. Skogvold, Kongsberg Satellite Services AS.

Examiner:

Oleg Seleznjev


Abstract

The accessibility and price of satellite data are moving in a favorable direction, which allows for and encourages new development. This thesis was written at, and together with, Kongsberg Satellite Services AS (KSAT) in Tromsø, with the aim of investigating the possibility of characterizing a known oil leakage on the sea surface through optical satellite data, specifically Sentinel-2. The current machine models are based upon synthetic aperture radar data, which generates a good detection basis but lacks the ability to describe the oil slick in terms other than spread. Successful inclusion of machine models on optical data would increase the frequency at which a specific body of water can be monitored for oil and, more importantly, enable a more detailed description of the slick.

The satellite data was normalized with respect to clear water values to enable comparisons across different scenes. Image manipulations were made to reduce the periodic reflectance from waves which otherwise tend to result in striped clustering.

Several clustering algorithms were tested to investigate the presence of separate features in the images: K-means, hierarchical clustering and the Gaussian Mixture Model (GMM). The hierarchical clustering algorithm turned out to be too computationally costly for the available capacity. K-means and particularly GMM produced valuable output whose spread was very similar to a manual output from an expert, which throughout the thesis was used as the reference point for the accuracy of the models. In addition, both models showed promising indications of the relative thickness of oil within the slick. There is no in situ knowledge of the analyzed oil slicks; it is thus very difficult to accurately assess the performance of the models in this regard.

The results from this thesis can be used as basis for further development in the field or aid to generate labeled images for predictive machine learning models.

Key words: Cluster analysis, oil slick, feature extraction, satellite imagery, Sentinel-2, Gaussian mixture model, sea surface oil.


Sammanfattning

The accessibility and price of satellite data are moving in a favorable direction, which stimulates new development in the field. This thesis is written at, and for, Kongsberg Satellite Services AS (KSAT) in Tromsø, with the goal of investigating the possibilities of describing a known oil slick on the sea surface through optical satellite data, more specifically Sentinel-2. Most current models are based on synthetic aperture radar data, which can generate reliable detections but lacks the ability to describe the oil slick in terms other than extent. Successful inclusion of models based on optical data would increase the frequency at which a given body of water can be examined for oil. More interestingly, it could enable a much more detailed description of the oil slick.

The satellite data was normalized against reflectance values of clear water near the oil slick to enable comparison between different scenes. Image manipulations were made to reduce the influence of waves on the reflectance, which otherwise results in striped clustering. To investigate the presence of separate features in an oil slick, several clustering algorithms were tested, namely K-means, hierarchical clustering and Gaussian mixture modelling (GMM). Given the size of the images, hierarchical clustering turned out to be too demanding to compute with the available computing power. K-means, and above all GMM, generated valuable results whose extent was very similar to that of a manual analysis by an expert, which was used as the reference for evaluating the precision of the models. In addition, both models generated promising indications of relative thickness within the oil slick, which could be used to plan an effective response. However, the analyzed oil slicks were not measured in situ, so it is difficult to evaluate the models' performance in detail.

The results of the thesis can be used for further research in the field and to assist in the work of generating input for an advanced predictive machine learning model.

Keywords: Cluster analysis, feature extraction, satellite imagery, Sentinel-2, oil slick, oil on the sea surface.


Acknowledgements

This master's thesis has been realized at Kongsberg Satellite Services AS as the final element of the Master of Science program in Industrial Engineering and Management at Umeå University. I would like to direct a huge thanks to KSAT for putting together the resources that enabled me to do this research. I have had the opportunity to learn more than just the scope of this thesis at KSAT and have been included from day one. For this thesis I want to thank the Applied Deep Learning team for the existing tools that helped me greatly. In particular I want to thank Øystein F. Skogvold for supervising and providing guidance in technical matters, and Hugo Isaksen for the manual analysis of oil slicks that I could use as a reference point when there were no correct answers. Finally, gratitude must be directed to Jun Yu, Professor in Mathematical Statistics, for invaluable methodological and theoretical contributions throughout the thesis process as well as feedback on the structure and content of the report.


Contents

1 Introduction 1

1.1 Company background . . . 1

1.2 Problem background . . . 1

1.3 Objective and Motivation . . . 3

1.4 Task description . . . 3

1.5 Limitations . . . 3

1.6 Confidentiality . . . 4

1.7 Outline of the thesis . . . 4

2 Data 5

2.1 Data Structure . . . 5

2.2 Data visualization . . . 6

3 Theory 7

3.1 Oil characteristics . . . 7

3.1.1 Oil slick segmentation and inference . . . 9

3.2 Cluster tendency of data set . . . 10

3.3 Dissimilarity measures . . . 10

3.4 Clustering methods . . . 11

3.4.1 K-Means clustering . . . 11

3.4.2 Hierarchical clustering . . . 12

3.4.3 Density-based clustering - Gaussian Mixture Model . . . 14

3.5 Cluster Validity Indices (CVI) . . . 16

3.5.1 Silhouette analysis . . . 17

3.5.2 Squared error analysis . . . 18

3.5.3 Davies Bouldin Index . . . 18

3.5.4 Calinski-Harabasz Score . . . 18

3.5.5 AIC & BIC . . . 19

3.6 Image manipulation . . . 19

3.6.1 Fourier Transformation . . . 19

3.6.2 Low pass Gaussian blur . . . 20

4 Methodology 21

4.1 Data retrieval . . . 21

4.2 Data Preprocessing . . . 21

4.2.1 Band selection and rationing . . . 22

4.2.2 Scaling and further manipulations . . . 22

4.3 Oil spill area pixel selection . . . 24

4.4 Scan lines . . . 24


4.5 Hopkins cluster tendency implementation . . . 25

4.6 Model implementations . . . 25

4.7 K-means clustering implementation . . . 25

4.8 Hierarchical clustering implementation . . . 26

4.9 GMM clustering implementation . . . 26

4.10 Model selection and inference . . . 27

4.11 General application and robustness . . . 27

4.12 Software usage . . . 28

5 Results 29

5.1 Image manipulation with FFT . . . 29

5.2 Feature reflectance characteristics . . . 30

5.3 Data pre-processing . . . 32

5.3.1 Cluster tendency . . . 32

5.3.2 Preliminary feature extraction comparison . . . 33

5.4 Hierarchical clustering . . . 33

5.5 K-means and Gaussian Mixture Model . . . 34

5.5.1 Product 1 . . . 35

5.5.2 Product 2 . . . 38

5.5.3 Product 3 . . . 43

5.5.4 Product 4 . . . 47

5.5.5 Product 5 . . . 51

5.6 SAR comparison . . . 55

5.7 Gaussian Mixture model on combined data . . . 58

6 Discussion 60

6.1 Time complexity and big data analysis . . . 60

6.2 Image manipulation . . . 61

6.3 Pre-processing and feature extraction . . . 61

6.4 Hierarchical clustering complexity limitation . . . 62

6.5 Optimal number of clusters . . . 62

6.6 Best clustering method . . . 63

6.7 SAR detection comparison . . . 63

6.8 Single image fit versus combined data fit . . . 64

7 Conclusions 64

7.1 Future development . . . 65


1 Introduction

1.1 Company background

Figure 1: Example optical satellite imagery [1]

Kongsberg Satellite Services (KSAT) is one of the world's leading commercial satellite ground station operators. It has a network of over 200 remotely controlled antennas worldwide, spread over 23 sites. These provide favorable locations for downlink for satellites in polar, inclined and equatorial orbits, and offer Earth Observation services with the fastest available delivery times [2]. KSAT and its partners perform major services in maritime monitoring, including oil slick detection, vessel detection and ice monitoring, as well as elevation modeling, ground monitoring, land cover mapping with change detection, and more.

1.2 Problem background

Figure 2: Example radar image showing an oil slick

(darker section) [3]

Rapid detection of floating oil is essential to deploy appropriate and effective emergency actions that mitigate negative effects (see the example in Figure 3), as spills can have serious environmental impact. Methods to detect oil spills through remote sensing (see Figure 1 for an example Sentinel-2 image) are well understood and play an increasingly important role in oil spill detection. But knowledge on extracting relevant and reliable features, such as relative thickness, emulsification etc. (called oil slick characterization), from these measurements is more limited. The most frequently used technique for detection is analysis of Synthetic Aperture Radar (SAR) imagery (see Figure 2). SAR satellites travel in a dusk-dawn orbit, providing imagery in the early morning and late afternoon. Most optical satellites travel in orbits with a constant sun angle of incidence, with overpasses close to noon. Enabling the use of more data sources, specifically optical satellite data, will increase the frequency at which a certain area can be analyzed. SAR data also provides limited information beyond a geometric description of a detected oil slick.


Optical satellite imagery, on the other hand, has shown promising results for oil slick characterization. Oil and oil/water mixes have been shown to have absorption features distinct from water and clouds [4]. Lee et al. [4] proposed algorithms to distinguish film-like oil from thick oil using Landsat Operational Land Imager and DubaiSat-2 imagery. Taravat and Del Fraite [5] performed ratio operations on LANDSAT ETM+ band reflectance before using it as input to a multi-layer perceptron neural network for pixel-based supervised classification, with promising results. The spectrum of wavelength measurements in optical satellite imagery provides a good basis for feature discrimination.

Optical remote sensing imagery is the type that will be covered in this study.

There are several systems that deliver this type of data, and the EU Copernicus Sentinel-2 mission, administered by the European Space Agency (ESA), provides it free of charge, which is a great starting point. Two satellites provide the data, resulting in a revisit time of 5 days at the equator and 1-2 days at mid-latitudes [6]. This project will utilize the Level-1C product. It provides orthorectified Top-Of-Atmosphere reflectance with sub-pixel multispectral registration [7]. The reflectance is measured in 13 different bands, each covering a smaller range of wavelengths, to capture and discriminate surface areas on Earth.

Figure 3: Example of oil cleaning crew at sea [8]

Knowing the true features requires in situ measurements, which are likely to be expensive, time consuming and unsafe, and thus not a very frequent procedure. Worth mentioning is that a lot of effort is currently being made to collect such measurements by the Norwegian Coastal Administration (NCA) and the Norwegian Clean Seas Association for Operating Companies (NOFO) [9], with additional personnel from SINTEF, CIRFA and more. In situ measurements provide not only characteristics but also the information that oil has in fact been spilled, which can make a detection on satellite imagery confirmable.

This lack of "ground truth" leaves no other option than an unsupervised analysis approach. This thesis will use clustering methods to reliably extract earlier mentioned features of the oil slick as a complement to already existing oil slick detection methods and manual description practice.


1.3 Objective and Motivation

The purpose of this study is to compare several methods of analyzing optical remote sensing images from the Sentinel 2 satellites in the Copernicus Sentinel-2 mission in an effort to characterize detected oil slicks on sea surface. Since there is only geometric information confirmed in this case, unsupervised classification methods, mainly clustering analysis, will be utilized for oil slick characterization.

Investigative work can help find previously unknown patterns in the data and strengthen classification confidence by combining the results with current methods - Synthetic-Aperture Radar (SAR).

1.4 Task description

As laid out in the problem background, KSAT wishes to explore the possibilities of extracting additional information from already detected oil spills. The work will focus on the geographical regions

. Optical data is less used, specifically in automated models, which might have great potential for characterizing oil slicks. The research seems promising, but most approaches require specific reflectance spectrum knowledge. The task for this work is to investigate the data and leave the interpretation to the last step.

1.5 Limitations

This study is conducted over approximately 5 months by one master's student in the Industrial Engineering and Management program at Umeå University. This limits the number of methods that can be investigated in full.

The data set is comprised of products from the Norwegian coast, which have certain characteristics. The methods analyzed in this thesis may perform differently in other environments and circumstances.

The Sentinel-2 L1C product provides 13 bands at specific wavelengths, and thus the characteristics of oil slicks on water can be challenging to measure, given that they would differentiate themselves more if measured at custom wavelengths. The bands also have different resolutions, 10, 20 and 60 m, which gives rise to sparse information in some wavelengths and obviously reduces clustering precision in detail. Finally, oil on water normally spreads in patches but can get torn and deformed as waves, wind and currents affect its formation. The reflectance values for a pixel are thus an average over that band's resolution area.


Lastly, computation capacity might be a limiting factor, as one tile of the L1C product contains over 120 million pixels.

1.6 Confidentiality

A confidentiality agreement has been signed between the participating parties, consisting of an undertaking not to reveal any information of a confidential or secret nature owned by KSAT. All information and results in this report have, before publishing, been approved by KSAT for public release.

1.7 Outline of the thesis

Section 1 provides the background to the problem and introduces the motivation for the suggested investigation. Further, it sets the aim and defines the scope of the thesis.

Section 2 will introduce and explain the structure of the data. Section 3 covers the major field-specific and mathematical theory used in the analysis. In Section 4 the methodology of the thesis is described, with explanations for the chosen approach. Further, in Sections 5, 6 and 7, the results, discussion and conclusions are presented, respectively. Thoughts and suggestions for future studies are also found in Section 7.


2 Data

2.1 Data Structure

The data source to be used is the Sentinel-2 L1C product from ESA, a free data source. It is downloaded straight from ESA servers using their own API. The Earth's surface is divided into granules (tiles) of ∼100×100 km² ortho-images in the cartographic reference frame UTM/WGS84 (Universal Transverse Mercator / World Geodetic System 1984) projection. Each tile is then divided into pixels whose size is in accordance with that band's resolution. For the L1C product this means that pixels are squares with sides of 10-60 m. Each pixel is assigned a value between 0 and 10 000 to represent the reflectance in that wavelength. The wavelength measurements are divided into 13 "bands" for Sentinel-2 and defined as in Figure 4:

Figure 4: Sentinel-2 L1C band specification table. Band resolution on the left-hand side and wavelength (nm) measured along the bottom. [10]

Each band value is a mean of the reflectance in the spectrum it measures. This translates to narrow bands being very specific, while bands with a broad measurement spectrum are significantly less specific. We can see that the product is not custom built to detect and analyse oil, as the text in the circles highlights the specific purpose of each band. The satellite is more geared toward land observations, but there is still a lot of information to gain in other environments. The y-axis shows the resolution of the bands.


If the tile is completely covered by the pass of the satellite for a specific orbit, we can expect the product to contain 10980 × 10980 ≈ 120 million individual pixels, which translates to around ∼800 MB of data. The oil spills are expected to be significantly smaller than that. But the size of the data set surely highlights the fact that computations can quickly become time consuming, and hardware availability may very well pose some limitations.

Reflectance accuracy for the Sentinel-2 satellites is above 95% [11] and there are no missing values.

2.2 Data visualization

To familiarize the reader with the data, one representative scene is chosen, and its pairwise plot and correlation matrix can be seen below in Figures 5 and 6. As each variable is the reflectance in a specific wavelength range, and these ranges aren't separated by much, strong correlations between neighboring bands are expected.

This gives rise to the question if all bands are necessary to include in models.

Figure 5: Pairwise plot for all bands with higher resolution than 60 meters.


Figure 6: Correlation matrix

3 Theory

3.1 Oil characteristics

Several papers have studied the reflectance characteristics of oil on water, but a majority were performed in a lab to enable controlled studies. Out on open water there is significant additional noise, as wind, water type, water depth, water vegetation, algae and more distort the already subtle characteristics of a thin oil film. Clark et al. [12] studied the spill in the Gulf of Mexico in 2010 and suggested that we can expect reflectance from oil to distinguish itself in three key sections of the reflectance spectrum, as presented in Figure 7:


Figure 7: Oil spill reflectance spectrum from a spill in the Gulf of Mexico, by Clark et al. [12]

Work by Lennon et al. [13] on airborne sensors showed that Thermal IR (8µm-12µm) and hyperspectral VNIR (400-1000nm) sensors are complementary as far as detection capabilities are concerned, but thin oil slicks cannot be detected by thermal IR sensors because of the sea surface thermal balance. Detecting thin oil is thus reserved to shorter wavelengths. Although these longer wavelengths struggle to discriminate thin oil from water, thick oil influences the signal strongly. The Fluo Lidar sensor, 300-700nm, was found to be valuable in thickness estimations for regions thinner than 100µm.

This work on the oil-on-water spectrum can serve as a basis for deciding which bands to include from the L1C product in our analysis.

Analysis of Landsat ETM+ satellite imagery indicates that oil and water show similar reflection between wavelengths 475-675 nm but differ at 675-800 nm [14]. These insights were achieved by releasing oil in a test tank with a wave generator, indicating that the results are highly reliable. Srivastava and Singh [15] showed that 480 nm tends to be "contaminated" by biogenic material, such as algae, and therefore recommend band ratios to be normalized. For Sentinel-2 imagery, B2 (458-523 nm) covers this spectrum and could act as normalizer. The study [5] with data from Landsat ETM+ showed good results by using ratios based on these principles for oil spill detection. The corresponding ratios for the Sentinel-2 product would be:


Band ratio 1 = B4/B2
Band ratio 2 = B7/B2

Figure 8: A visible photo of a small test slick. The sun glint on the left confounds the interpretation of slick edges. [16]

Even if measurements are expected to hold a sufficient level of reliability, there are some circumstances that can create situations where oil slick differentiation is challenging. Figure 8 is a great example of how sun glint can make oil slick edges very hard to detect in the visible spectrum.

3.1.1 Oil slick segmentation and inference

There is a convention for thickness estimation of oil slicks on water, the Bonn Agreement Oil Appearance Code (BAOAC) [17], based only on visible wavelengths (400 to 750 nm). Five different reflectance groups are identified based on reflectance characteristics. These five groups have thickness estimations as seen in Table 1 and might act as a basis for comparison with the results of this work.

Table 1: Thickness estimation chart from the Bonn Agreement Oil Appearance Code (BAOAC) [17].

Code  Description - Appearance        Layer Thickness Interval (µm)   Litres per km²
1     Sheen (silvery/grey)            0.04 to 0.30                    40 to 300
2     Rainbow                         0.30 to 5.0                     300 to 5 000
3     Metallic                        5.0 to 50                       5 000 to 50 000
4     Discontinuous True Oil Color    50 to 200                       50 000 to 200 000
5     Continuous True Oil Color       More than 200                   More than 200 000


3.2 Cluster tendency of data set

Cluster tendency assessment determines whether a given data set contains meaningful clusters. By measuring the spatial randomness in the data, one can conclude whether the data should be considered uniformly distributed. Comparing a random sample from the data set under investigation to a uniform data set with the same variance, the Hopkins statistic [18] can be calculated as:

H = 1 - \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i + \sum_{i=1}^{n} a_i}

where b_i is the distance between each point and its nearest neighbor in a random sample, with n points, from the given data set D, and a_i represents the distance from each point in a simulated data set D_random to its closest neighbor in D. D_random represents a data set, with n points, drawn from a random uniform distribution with identical variance as D.

The null and alternative hypotheses are stated as:

H_0: The investigated data set is uniformly distributed.

H_1: The investigated data set is NOT uniformly distributed.

The value of the statistic ranges from 0 to 1. A value of 0.5 indicates a completely random data set, as the sum of the nearest neighbor distances for the actual data set then equals that of the generated random data. The closer to 0, or 1, the stronger the indication that H_0 should be rejected in favor of H_1, which indicates that the investigated data contains meaningful clusters.

The Hopkins statistic is considered a good test for lower dimensional data and for smaller clusters [19], as in this case.

3.3 Dissimilarity measures

The unsupervised methods all relate to the grouping and segmentation of data points into subsets, or "clusters", such that data points in the same cluster have a closer relationship to each other than to data points in a different cluster [20]. The data points can be described by a set of variables or by their relations to other data points. Cluster analysis is performed to extract descriptive statistics from the data to ascertain whether the data consists of distinctive subgroups that all have substantially different properties.

The degree of similarity within subgroups is central in the evaluation of clustering method performance. Each data point is assigned to the group with which it has the highest similarity, or is "closest to". But this is usually measured as dissimilarity [20] between data points. We define a dissimilarity between values of the jth attribute as d_j(x_{ij}, x_{i'j}) and then define:

D(x_i, x_{i'}) = \sum_{j=1}^{p} d_j(x_{ij}, x_{i'j})

as the dissimilarity between the objects i and i', over their attributes j = 1, 2, ..., p.

The dissimilarity can be measured in several ways, but the most common is the squared Euclidean distance [20]:

d(x_i, x_{i'}) = (x_i - x_{i'})^2

Other choices are of course possible but are more relevant for non-quantitative attributes, e.g. categorical data.

Combining the p individual attributes into a single overall measure of dissimilarity is nearly always done with a weighted average, with weights w_j where \sum_{j=1}^{p} w_j = 1. Introducing this into the equation above, we get:

D(x_i, x_{i'}) = \sum_{j=1}^{p} w_j \, d_j(x_{ij}, x_{i'j})

But as highlighted in [20, p. 508], if the goal is to discover natural groupings, attributes that are more relevant in separating groups could be assigned a higher influence in defining object dissimilarity, meaning one deviates from equal weights, w_j ≠ 1/N, where N is the number of attributes in the model.
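To make the weighting concrete, the short sketch below (an illustration, not code from the thesis) computes a weighted squared-Euclidean dissimilarity between two observations; the pixel values and weights are hypothetical.

import numpy as np

def weighted_dissimilarity(x, x_prime, w):
    # D(x, x') = sum_j w_j * (x_j - x'_j)^2
    return np.sum(w * (x - x_prime) ** 2)

# Two hypothetical pixel vectors (e.g. three band ratios) and equal weights w_j = 1/N.
x = np.array([1.02, 0.98, 1.10])
x_prime = np.array([0.95, 1.01, 1.25])
w = np.full(3, 1.0 / 3.0)
print(weighted_dissimilarity(x, x_prime, w))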

3.4 Clustering methods

3.4.1 K-Means clustering

K-means clustering is done by randomly placing K "grouping" means in the data and then iterating over the following steps until convergence:

1. For a given cluster assignment C, the total cluster variance (sum of squared distances) is minimized with respect to the cluster means, m1,...,mK, yielding the means of the currently assigned clusters.

2. Given a current set of means m1,...,mK, cluster assignment of each observation is done by finding the cluster mean resulting in minimal distance. That is,

C(i) = \underset{1 \leq k \leq K}{\operatorname{argmin}} \; ||x_i - m_k||^2


3. Steps 1 and 2 are iterated until the assignments do not change.

The K-means method is a rather simple method that can serve as a good starting point for further investigation. The fuzzy c-means method calculates a membership value for each observation to each of the K group means. The partitioning into groups is thus soft, which results in a slightly different cluster assignment function. Observations that have similar group membership values for several groups can be left ungrouped, or their uncertainty can be highlighted.

The clustering algorithm is similar to the EM algorithm for estimating a certain Gaussian mixture model [20, p. 510]. The difference is the variance component; for a soft assignment σ² is used, but as σ → 0 the probabilities turn binary, which results in K-means clustering.

To find the optimal number of clusters in K-means, the standard practice is to repeat the process over a range of numbers of clusters and then analyze their performances [20, p. 512-513]. Finding an "elbow" in a plot of the sum of squared errors by number of clusters is perhaps the most popular method. The "elbow" appears since the cluster dissimilarity generally decreases with increasing K as naturally occurring clusters are separated. But as K exceeds the number of natural groupings, the decrease in cluster dissimilarity tapers off substantially [20, p. 518-519].
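A minimal sketch of the elbow heuristic using scikit-learn, which is the library used elsewhere in this work; the array X is a placeholder for the preprocessed pixel data and the range of K values is an assumption.

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(5000, 3)              # placeholder for (n_pixels, n_features) data

sse = []
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse.append(km.inertia_)              # within-cluster sum of squared distances

# The "elbow" is where the decrease in SSE tapers off.
for k, v in zip(range(2, 11), sse):
    print(k, round(v, 1))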

3.4.2 Hierarchical clustering

Agglomerative, or bottom-up, hierarchical clustering is done by finding the smallest dissimilarity between groupings, which initially are single observations, and grouping them together. The search for the smallest dissimilarity is then repeated until all observations are grouped and all groups are connected, effectively grouped together. This process results in a dendrogram, as in the example below:


Figure 9: Hierarchical clustering example [21]

In the divisive, top-down, approach the split producing the largest between-group dissimilarity is chosen, repeatedly, until single-observation groupings are achieved. Both methods result in N − 1 levels in the hierarchy, where N is the number of individual data points.

Each level represents a grouping of the data into a specific number of disjoint clusters.

There are three common ways of defining dissimilarity between groups of observations: single linkage, average linkage and complete linkage. Single linkage has the lowest threshold for grouping observations and complete linkage has the highest, which are both quite extreme.

Single linkage (SL) takes the inter-group dissimilarity as that of the closest pair, often called the nearest neighbor technique. Let G and H be two groupings in the data; then the dissimilarity, d, between the points i ∈ G and i' ∈ H is defined as:

d_{SL}(G, H) = \min_{i \in G, \, i' \in H} d_{ii'}

Complete linkage (CL) maximizes over the furthest neighbor pair:

d_{CL}(G, H) = \max_{i \in G, \, i' \in H} d_{ii'}

The Average linkage (GA), or group average clustering, uses the average dissimilarity between groups;


d_{GA}(G, H) = \frac{1}{N_G N_H} \sum_{i \in G} \sum_{i' \in H} d_{ii'}

Average linkage is normally the default choice, as it is a compromise between the other two in an attempt to produce relatively compact clusters that are relatively well separated [20, p. 524-525]. As the sample size, N, approaches infinity, GA becomes

\int \int d(x, x') \, p_G(x) \, p_H(x') \, dx \, dx'

But SL approaches zero and CL infinity as N → ∞, and it is thus unclear what aspects of the population densities, p_G(x) and p_H(x'), those methods really estimate.

All agglomerative methods, and some divisive ones, have a monotonic property of the dissimilarity of merged clusters. It increases monotonically with the level of merger, and the dendrogram is often plotted such that the height of each node is proportional to its daughters' dissimilarity, as seen in Figure 9 above. This very informative plotting property of hierarchical clustering is one large reason for the popularity of the method.

In contrast to K-means clustering, this method does not require specification of the number of clusters or a starting configuration. This also means it generates the same clustering solution every time. But the user must choose at which specific partitioning level "natural" grouping occurs, if there exists such a level [20].

Finally, computational complexity might limit this method, as it has, at best, a complexity of O(n²). This means that applying this method to large oil slicks might not be possible. In some cases it is even closer to O(n³) [22], but the method has the benefit of performing well on data with a hierarchical structure. Additionally, connectivity matrices can be passed to a model to account for physical relations in images, which is likely to be valuable. Another feature of the hierarchical clustering algorithm is that it requires O(n²) memory allocation during the computational process, which could turn out to be challenging when handling large data sets.

3.4.3 Density-based clustering - Gaussian Mixture Model

Density-based clustering refers to unsupervised learning methods that identify distinctive groups/clusters in the data, based on the idea that a cluster in a data space is a contiguous region of high point density, separated from other such clusters by contiguous regions of low point density. The data points in the separating regions of low point density can be considered noise/outliers [23]. The Gaussian mixture model (GMM) allows for different variances within clusters, creating a more flexible partitioning than a regular K-means approach.

The Gaussian Mixture Model has the following form,

f(x) = \sum_{m=1}^{M} \alpha_m \, \phi(x; \mu_m, \Sigma_m)

with mixing proportions \alpha_m, \sum_m \alpha_m = 1, and Gaussian density function \phi with mean \mu_m and covariance matrix \Sigma_m. These parameters are usually fitted by maximum likelihood estimation through the EM algorithm.

Let the observed data be denoted Z, having log-likelihood L(\theta; Z) depending on parameters \theta. The latent data is Z_m, making the complete data T = (Z, Z_m) with log-likelihood L_0(\theta; T), where L_0 is based on the complete density.

Then the EM algorithm is as follows:

1. Start with initial guesses for the parameters \hat{\theta}^{(0)}.

2. Expectation step: at the jth step, compute

Q(\theta', \hat{\theta}^{(j)}) = E\big(L_0(\theta'; T) \,\big|\, Z, \hat{\theta}^{(j)}\big)

as a function of the dummy argument \theta'.

3. Maximization step: determine the new estimate \hat{\theta}^{(j+1)} as the maximizer of Q(\theta', \hat{\theta}^{(j)}) over \theta'.

4. Iterate step 2 and 3 until convergence.

The covariance structure can be adjusted to reduce computational cost. The most common alternatives are spherical, tied, diagonal or individual (full) covariance matrices. Spherical means that each component in the model has its own single variance; tied means all components share the same general covariance matrix; diagonal means each component has its own diagonal covariance matrix; and full allows each component to have its own general covariance matrix.

The initial guess for the parameters \hat{\theta}^{(0)} can be chosen at random or generated by the output from a K-means model (often capped at some number of iterations) to decrease the time until convergence.
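A minimal sketch with scikit-learn's GaussianMixture, illustrating the covariance types and the K-means initialisation described above; the data and the number of components are placeholders.

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(5000, 3)              # placeholder pixel data

# 'full' gives every component its own general covariance matrix;
# init_params='kmeans' seeds the EM algorithm with a K-means solution.
gmm = GaussianMixture(n_components=4, covariance_type="full",
                      init_params="kmeans", random_state=0).fit(X)

hard_labels = gmm.predict(X)             # hard cluster assignment
soft_memberships = gmm.predict_proba(X)  # per-component membership probabilities
print(soft_memberships[:3].round(3))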


3.5 Cluster Validity Indices (CVI)

There are two widely accepted criteria for partitioning a dataset used in most CVIs: compactness and separability. Compactness represents the closeness of elements within the same cluster, and separability measures how different, or distant, each cluster is from the others. A clustering solution for a given data set attempts to create clusters where elements within each cluster are as similar as possible, but at the same time as different as possible from elements of other clusters. The most common CVIs for hard clustering, where partial cluster memberships aren't allowed, are the Calinski and Harabasz index (CH), the Davies and Bouldin index (DB) and the Silhouette Coefficient (Sil) [24].

Other methods can be applied for models where clustering are done through maximization of log-likelihood. Here, the well-known Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) will be used.

There are many approaches for deciding on the optimal number of clusters for a specific data set. In this study, the correct partition of the data is not known, nor is it possible to confirm the results in detail, which implies that internal cluster validation approaches have to be applied with caution. Arbelaitz et al. [24] published, in volume 46 of the Pattern Recognition journal, an extensive comparison of several scores and indices. The three best were, in order, the Silhouette index, DB and CH. But as a reminder, these are methods to help in the analysis work - not a final answer. When there are cluster overlaps in the data, the success rate of the best index was merely 30%. Other research has shown that there are significant differences between almost all comparison indices (tested on the k-means algorithm). Therefore it is wise to base decisions on more than one method of determining the optimal number of clusters [25].

A further indication of how important field knowledge is when setting model parameters is that recent studies examining tumour heterogeneity using GMM models have proposed using visual inspection of the data, or the knee of the log-likelihood function, as criteria for selecting the number of components [26].

The methods mentioned are good but not deterministic; thus more than one set of model parameters should be investigated.

There is no single right answer - any solution that exposes some interesting aspects of the data should be considered. [27]


3.5.1 Silhouette analysis

A good method to analyze the cluster groupings is silhouette analysis [28]. It compares the within-cluster dissimilarities with those to the next closest cluster and measures how well an object matches the ascribed clustering.

For each observation a silhouette score, s(i), is defined as:

Take the object, i, from the data set and let A denote the group it was assigned by the chosen clustering method.

a(i)= average dissimilarity of i to all other objects in A.

This is the average dissimilarity within the object’s cluster. Now let us consider any other cluster C which is different from A and compute:

d(i, C) = average dissimilarity of i to all other objects of C.

After computing d for all other clusters, choose the least dissimilar, as:

b(i) = \min_{C \neq A} d(i, C)

Then the silhouette score for the given object, i is computed as:

s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}

We note that −1 ≤ s(i) ≤ 1.

This calculation assumes a dissimilarity measure on a ratio scale, such that a value of 4 is considered twice as large as 2, for example. This is a characteristic that, for example, the Euclidean distance has. When s(i) is at its largest we can note that the within-cluster dissimilarity a(i) is significantly smaller than the dissimilarity to the closest cluster b(i). We can then consider i "well-clustered". The other extreme is when s(i) is close to zero, in which case the dissimilarity within the cluster and to the closest different cluster are close to equal. In such a situation it is unclear to which cluster the given object belongs. In order to get a good overview, all silhouette scores are plotted by cluster belonging, in sorted order. The average silhouette score per cluster is also a frequently used measurement, where one might aim for the clustering generating the highest overall silhouette score. Silhouette plots for all objects can also reveal the characteristics of a cluster. For example, all objects in a cluster can have very high scores, indicating a well-clustered group, while another cluster can have a majority of high scores but also some very low ones. Those low scores can then be further investigated as possibly being a separate cluster, or just objects that don't fit very well in any cluster. The heuristic reasoning implies that the silhouettes should look best for a "natural" value of k. The drawback of this method is its computational cost, which is O(n²).
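A short sketch of silhouette analysis with scikit-learn; the data and the choice of three clusters are placeholders.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, silhouette_samples

X = np.random.rand(3000, 3)              # placeholder pixel data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("mean silhouette:", silhouette_score(X, labels))
s = silhouette_samples(X, labels)        # one score s(i) per observation
for k in np.unique(labels):
    print("cluster", k, "mean:", s[labels == k].mean())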

3.5.2 Squared error analysis

Another common method is analyzing the sum of squared errors for different numbers of clusters, k, in the model. The sum will always decrease with increasing k; it decreases drastically for low k while for large k it barely does so, as fewer objects change cluster for an increase in k. What one is looking for is an "elbow" in the curve for the sum, which is considered to correspond to the "natural" number of clusters.

In "An Introduction to Statistical Learning" [27] the complexity of model choice, and choice of optimal numbers of clusters, were covered. They highlight that decisions in regarding choices will have a strong impact on the results obtained. In practice, several different choices should be analyzed to look for the one with the most useful, or interpretable, solution.

3.5.3 Davies Bouldin Index

The Davies-Bouldin (DB)[29] index is an internal evaluation method for cluster quality validation. It compares the inter-cluster distance with the intra-cluster distance through the following formula:

DB = \frac{1}{K} \sum_{i=1}^{K} \max_{i \neq j} \frac{\Delta(X_i) + \Delta(X_j)}{\delta(X_i, X_j)}

where δ(X_i, X_j) is the inter-cluster distance between clusters X_i and X_j, measured with the Minkowski distance, while ∆(X_i) is the intra-cluster distance of cluster i. The index results in a system-wide similarity measure of each cluster compared with its most similar neighbor. Davies and Bouldin also argue that both global and local minima of the index score should be investigated.

3.5.4 Calinski-Harabasz Score

The Calinski-Harabasz score is a ratio-type index where cohesion is estimated based on the distances from the points in a cluster to its centroid. The separation is based on the distances from the centroids to the global centroid. It can be defined as:

CH = \frac{N - K}{K - 1} \cdot \frac{\sum_{c_k \in C} |c_k| \, d_e(\bar{c}_k, \bar{X})}{\sum_{c_k \in C} \sum_{X_i \in c_k} d_e(X_i, \bar{c}_k)}

where \bar{c}_k denotes the centroid vector of a cluster, \bar{X} is the mean vector of the whole data set, d_e is the Euclidean distance between two objects, N is the number of data points and K is the number of clusters [30].
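Both indices are available in scikit-learn; the sketch below is illustrative and uses placeholder data. Note that sklearn's Davies-Bouldin implementation uses Euclidean rather than general Minkowski distances.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, calinski_harabasz_score

X = np.random.rand(3000, 3)              # placeholder pixel data
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Low DB and high CH both indicate compact, well-separated clusters.
    print(k, davies_bouldin_score(X, labels), calinski_harabasz_score(X, labels))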

3.5.5 AIC & BIC

AIC and BIC can be applied in cases where clustering is done by maximization of log-likelihood, like the case for a Gaussian mixture model.

AIC is defined as:

AIC = 2k - 2\ln(\hat{L})

where k is the number of estimated parameters in the model and \hat{L} is the maximum value of the likelihood function for the model.

Whereas BIC is defined as:

BIC = k\ln(n) - 2\ln(\hat{L})

where k and \hat{L} are identical to the AIC definition and n is the number of data points.

The difference, as one can notice, is the penalty term, which is considerably larger in BIC. But in the case where \hat{L} \gg \ln(n), the value is dictated by the negative term, making the two similar. The optimal model is selected based on low scores, since one wants to maximize the likelihood.

3.6 Image manipulation

This section is dedicated to image manipulation techniques. They are essential to the analysis but are not part of the clustering procedure itself. Additionally, the topic is somewhat out of scope for this thesis and will therefore not be covered in depth.

3.6.1 Fourier Transformation

In this particular case we have, for every channel, a signal that varies over two spatial dimensions. The transformation is done for every row and column independently, effectively replacing time with space in Figure 10. The Fast Fourier Transform (FFT) is an algorithm that generalizes the transformation calculations to reduce the complexity from O(N²) to O(N·log₂N). The result of this transformation can be used to uncover frequencies that are periodic, like waves. These frequencies can then be excluded, or suppressed, from an image by multiplying the transformed image matrix with a suppression matrix. This effectively excludes the features in an image that have a certain periodic appearance, such as waves.

Figure 10: Visualization of how Fourier decomposition works. [31]

3.6.2 Low pass Gaussian blur

The Gaussian filtering technique is also an image manipulation method, but one made to reduce general noise and detail. The method averages pixel values based on neighboring pixels in a given kernel size, where the kernel coefficients are assigned by a 2D Gaussian, thus smoothing the differences within the kernel. The convolution of the kernel over the image is called a low-pass filter [32].


4 Methodology

This chapter will detail how the methodology in this work was implemented. First, an introduction to the data structure and retrieval is given. It is then described how this data is processed before entering the modelling phase, where clustering setups and performance assessments are laid out. Critical design choices for clustering algorithms, validation and interpretation are stated throughout this chapter.

4.1 Data retrieval

Sentinel-2 images were downloaded from Copernicus Open Access Hub [33], for free, using their provided API [34].

Large oil findings from in-house operations on SAR data were suggested for investigation in this thesis. To find Sentinel-2 products containing the same oil spill, the image covering the area with the closest sensing time was chosen, given that there was one covering the spill area within 24 h of the SAR detection. If no Sentinel-2 image satisfied this criterion, the detection was excluded. Imagery with substantial cloud coverage was excluded due to obvious challenges in optical analysis. Finally, the images were searched manually to roughly locate the oil slick. Due to limited time, the first five products (P1-P5) will act as the basis for cluster modelling.

They have the following details:

Confidential Information

4.2 Data Preprocessing

Since the data comes in different resolutions, resampling the low resolution bands to the highest resolution (10 m) was decided on. The resampling was done with bi-linear calculations through the GDAL library in Python. This method highlights edges more clearly compared to the default cubic resampling method, but is "smoother" than a nearest neighbor method - the "mid way" method of the three.

As there are, at most, 12 variables for each pixel, dimension reduction is not really necessary. We do not have to consider missing data points either, since the L1C product incorporates calculations to guarantee completeness. All data are integers, as pixels are given a value from 0 to 10000.

To get a general feel for the reflectance characteristics of different features in the image, small sections of distinct features were extracted and reflectance means were calculated. This was done to get to know the data better before making decisions that require large implementation efforts.

4.2.1 Band selection and rationing

By comparing the oil slick reflectance in Figure 7 with the L1C measurements, we can exclude some bands where we expect no relevant information about the oil slick. Such bands are B1, B5, B6, B7, B8a, B9 and B10. The information in those bands is either irrelevant to oil slicks or mostly covered by other bands. If they are excluded, we are left with B2, B3, B4, B8, B11 and B12.

Based on the work of Taravat and Del Fraite [5], two band ratio calculations were done according to the results in that paper. With the indications from Section 3.1, another ratio was added to potentially catch the second spike for oil (∼1600 nm - 1750 nm). Sentinel-2 band 11 is not a perfect fit, but it could provide valuable information.

By adding this band ratio to the earlier mentioned, we get the following suggestion for ratio calculations.

Band ratio 1 = B4/B2
Band ratio 2 = B7/B2
Band ratio 3 = B11/B2

4.2.2 Scaling and further manipulations

The variation, range and mean of the different bands are only somewhat similar and thus, in their raw format, the bands do not contribute equally in the models. Standardizing the data would prevent this and might improve the results, and is thus a suggested approach for most models. But performing a standardization on the raw data is risky, as significant portions of the image could be covered by clouds not excluded by the cloud mask. Inclusion of such pixels would skew the standardization heavily, as cloud pixels can have extremely high reflectance values in all bands. Since this is a smaller project, a manual process of marking a portion of normal water and taking its mean acts as the basis for a normalization by which all pixels are divided. The process is: create a polygon covering normal water, calculate the mean values, and then divide all pixels in the image by these. This ensures that normal water has values around one, areas of interest diverge from that, and inter-product comparison is made possible.
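A sketch of the normalization and the band ratios from Section 4.2.1; the band arrays and the water-polygon mask are assumed inputs and the function name is hypothetical.

import numpy as np

def normalize_and_ratio(b2, b4, b7, b11, water_mask):
    # Divide every band by its clear-water mean so that normal water sits around 1.
    bands = {name: band / band[water_mask].mean()
             for name, band in {"B2": b2, "B4": b4, "B7": b7, "B11": b11}.items()}
    ratio1 = bands["B4"] / bands["B2"]
    ratio2 = bands["B7"] / bands["B2"]
    ratio3 = bands["B11"] / bands["B2"]
    return np.dstack([ratio1, ratio2, ratio3])   # (rows, cols, 3) feature image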

Figure 11: Example of an image where a gradient of lightness creates a problem for clustering. The brightest parts (yellow) are clouds.

As optical imagery is dependent on reflected sun rays, the angle at which the rays hit the surface can create problems. First, the angle to the sun can create a gradient of lightness such that one part of the image is slightly brighter than the opposite one (see Figure 11). A similar situation can also be created by a very thin cloud layer, fully possible to see through but still enough to affect the clustering possibilities. To account for this, a linear scaling in both dimensions is made to equalize potential differences by dimming the brighter parts of the image. The edges are compared, but the brightest 30% of the pixels are excluded such that clouds can appear in the image without affecting this scaling procedure. The 30% is arbitrary but showed the most robust results during testing.

Figure 12: Example of an image (band 2 plotted) where waves are prevalent and clearly visible, which causes issues for good and coherent clustering.

Secondly, waves in the image will create sun glint phenomena, resulting in a significantly stronger reflection at the top of a wave and, subsequently, weaker reflection between wave tops (see Figure 12). Since the resolution is 10 m, these effects are created by large subsurface waves, sometimes called swells, not the surface waves one sees at a beach. These waves are somewhat constant and linear in fashion.

This means that they appear with a certain frequency, amplitude and direction in the image. To clear an image from such "noise", one approach is to use an FFT (Section 3.6.1) and then exclude periodic frequencies from the image. Performing this process will result in some information loss, but it improves the interpretability of the clustering results through enhanced uniformity in the affected areas.

Since the waves aren’t perfectly periodic or completely linear, a circular Gaussian filter matrix were generated according to the FFT frequency plot and then multiplied with the image FFT itself. The size of the filter is estimated through inspection of the plot and was calibrated until satisfying exclusion of frequencies are achieved.

This was repeated if the region considered wave noise in the FFT frequency plot is not covered good enough by one circular Gaussian filter.

Finally, to further reduce a very grainy result, a weak Gaussian low pass blur was applied to the image. Oil slicks aren't expected to be grainy at the 10 m resolution and, additionally, such a level of detail is not necessary. This, too, is not to be overdone, as valuable detail can be lost. The standard deviation for the Gaussian kernel was set to 1 (σ = 1). This process dims the "outliers" (high and low values) in a kernel and thus reduces the risk of a very grainy clustering result. An easily interpretable result is preferred over exact pixel-to-pixel accuracy.
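The sketch below illustrates the described FFT suppression and low-pass blur for a single band with numpy and scipy; the notch positions and widths are assumptions that in practice were read off the FFT magnitude plot.

import numpy as np
from scipy.ndimage import gaussian_filter

def suppress_waves(band, centers, notch_sigma=6.0, blur_sigma=1.0):
    # Multiply the shifted FFT with Gaussian notches centered on the wave frequencies,
    # transform back, and finish with a weak Gaussian low-pass blur.
    F = np.fft.fftshift(np.fft.fft2(band))
    rows, cols = np.indices(band.shape)
    mask = np.ones(band.shape)
    for r0, c0 in centers:
        mask *= 1.0 - np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2) / (2 * notch_sigma ** 2))
    filtered = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
    return gaussian_filter(filtered, sigma=blur_sigma)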

4.3 Oil spill area pixel selection

As mentioned, products are 10980 × 10980 pixels, which would overwhelm the computer capacity at hand if analyzed as a whole. The oil slicks are normally significantly smaller; the biggest, P1, measures ∼123 km², which translates to 1 230 000 pixels with several dimensions. Formed as a perfect square it would be roughly 1110 × 1110 pixels. The computational limitations set the maximum image that can feasibly be handled to ∼1500×1500 pixels. The slicks obviously weren't perfectly square, which meant that only a portion of the biggest slicks could be analyzed at once. To further reduce the computational load, a polygon was drawn in the images to extract a selection of pixels for further analysis and modelling. Indices for extracted pixels were kept such that outputs from cluster models could later be mapped back onto the original image. This was done for all images, and these pixel values were stored to form the data set to which a large model could be fitted.

This extraction ended up resulting in ∼60 000 − 1 150 000 entries, with 3 dimensions, for individual products and ∼1 900 000 across all five products.
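A sketch of the pixel extraction and the later mapping of labels back onto the image grid; array and function names are placeholders.

import numpy as np

def extract_pixels(feature_image, polygon_mask):
    # Flatten the pixels inside the polygon to an (n_pixels, n_features) matrix
    # and remember their flat indices for later re-mapping.
    n_features = feature_image.shape[-1]
    idx = np.flatnonzero(polygon_mask.ravel())
    return feature_image.reshape(-1, n_features)[idx], idx

def labels_to_image(labels, idx, shape, fill=-1):
    # Place cluster labels back onto the original image grid for plotting.
    out = np.full(np.prod(shape), fill, dtype=int)
    out[idx] = labels
    return out.reshape(shape)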

4.4 Scan lines

The satellite has several sensors to cover the entire area it passes over. They aren't always perfectly calibrated, and "scan lines" with slightly different reflectance can appear in images (see Figure 13). There are methods that attempt to account for this. However, this is not a major issue for showing proof of concept for the aim of this study. Considering the time required to implement a solution, such methods are left for future studies. In images where the scan line passes through the oil slick of interest, the largest section which doesn't cross the scan line is selected.

4.5 Hopkins cluster tendency implementation

Figure 13: Showcase of a scan line, shown in band 2 of P4, located on the right side of the image, travelling from X ≈ 650 to X ≈ 850 and across the entire range of Y values.

Since the data sets are very large, it is not feasible to calculate the score over the data set in its entirety. To reduce the acute computational load the following procedure is used (a code sketch is given after the list):

1. Select 1000 elements at random.

2. Calculate the Hopkins statistic.

3. Repeat step 1-2 for a total of 10 times.

4. Take the average of the scores.
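A sketch of this subsampled Hopkins procedure, following the definitions in Section 3.2; drawing the uniform reference points within the per-dimension min/max of the data is an assumption, since the report only specifies identical variance.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def hopkins_once(D, n=1000, rng=None):
    rng = rng or np.random.default_rng()
    sample = D[rng.choice(len(D), n, replace=False)]
    uniform = rng.uniform(D.min(axis=0), D.max(axis=0), size=(n, D.shape[1]))
    # b_i: nearest neighbour within the random sample (excluding the point itself).
    b = NearestNeighbors(n_neighbors=2).fit(sample).kneighbors(sample)[0][:, 1]
    # a_i: nearest neighbour of each simulated uniform point in the full data set D.
    a = NearestNeighbors(n_neighbors=1).fit(D).kneighbors(uniform)[0][:, 0]
    return 1.0 - a.sum() / (a.sum() + b.sum())

def hopkins(D, repeats=10):
    return float(np.mean([hopkins_once(D, rng=np.random.default_rng(i)) for i in range(repeats)]))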

4.6 Model implementations

A method for deciding the appropriate number of clusters for a model was designed to reduce time complexity. 10 000 pixels were selected at random, forming a temporary data set on which models were fitted and the mentioned performance measurements (see Section 3.5) were calculated.

This was repeated 3 times and averages were calculated. Optimal model parameters were decided by weighing all measurements together - looking for consensus and agreements. These optimal parameters were then used for fitting the model for clustering the entire image.

4.7 K-means clustering implementation

The data was fed into a K-means clustering algorithm (sklearn - KMeans) with increasing values of n to generate a basis for the selection of the number of clusters. In deciding the right number of clusters, several methods were used to analyze the fit: sum of squared errors, silhouette analysis, Davies-Bouldin index and Calinski-Harabasz score. Looking at their joint indications, the choice of optimal n was made for each model.

When the model parameters were decided, the model was fitted to the entire analyzed data set and the pixels were assigned labels, which were then mapped back to the image for plotting.

This process was done to both individual images as well as the combined data.

4.8 Hierarchical clustering implementation

As for the K-means method, the data was fed into the hierarchical clustering algorithm. But with this algorithm computational limitations were encountered. Very large allocations of memory (m(m − 1)/2 entries) are needed for the tree build, specifically for the pairwise distance matrix. Attempting to calculate it for 100 000 elements requires as much as 37.3 GB. Reducing the data set by randomly selecting elements is a possibility, but it would mean that the fit is performed on a very small subset of the original data set. Fitting on less than 1% is not feasible, thus rendering this approach impossible with the computer power at hand.
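The memory figure can be reproduced with a back-of-the-envelope calculation, assuming 8-byte floats for the condensed pairwise distance matrix:

m = 100_000
entries = m * (m - 1) // 2          # condensed pairwise distance matrix
print(entries * 8 / 2**30, "GiB")   # ~37.25 GiB, matching the figure above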

Out of interest, a connectivity matrix was fed into the clustering of a small section of an image, as a test of the contribution of spatial information. For this, agglomerative clustering was chosen and n was decided through inspection of the dendrogram.
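A sketch of such a spatially constrained clustering with scikit-learn, where a grid connectivity matrix restricts merges to adjacent pixels; the patch size, number of clusters and linkage are placeholders.

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.image import grid_to_graph

patch = np.random.rand(80, 80, 3)                    # small image section, 3 features per pixel
h, w, n_features = patch.shape
X = patch.reshape(-1, n_features)

connectivity = grid_to_graph(h, w)                   # only spatially adjacent pixels may merge
model = AgglomerativeClustering(n_clusters=4, connectivity=connectivity,
                                linkage="ward").fit(X)
label_image = model.labels_.reshape(h, w)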

4.9 GMM clustering implementation

The GMM clustering algorithm used was sklearn's GaussianMixture. Even if the theory suggests letting all clusters have their own covariance structure, all covariance types were run for all n; the types are spherical, diagonal, tied and full. To select the appropriate number of clusters, BIC and AIC scores were calculated for each choice of n, in addition to CH and DB. In deciding n, BIC was emphasised. As for K-means, this was done over three repetitions of a randomly selected subset (10 000 data points) of the entire image.

When the best performing model parameters were found, these were used to fit a GMM to assign labels to pixels, for both individual images as well as the combined data. As the differences in the images are very subtle, there is rarely a clear cut answer. Therefore more than one value of n was run and plotted to choose the most suitable value.
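A sketch of the described model selection loop with scikit-learn; the subset X, the range of n and the emphasis on BIC follow the text above, while the exact ranges are assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(10000, 3)          # random subset of pixels, as described above

best = None
for cov in ("spherical", "diag", "tied", "full"):
    for n in range(2, 9):
        gmm = GaussianMixture(n_components=n, covariance_type=cov, random_state=0).fit(X)
        bic, aic = gmm.bic(X), gmm.aic(X)
        if best is None or bic < best[0]:
            best = (bic, aic, cov, n)
print("lowest BIC (bic, aic, covariance type, n):", best)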


4.10 Model selection and inference

Choosing the number of clusters in the different models is not easy, as the aim is not only to find oil but, more importantly, to map the relative thickness within the oil slick. It is most likely a continuum from thick to thinner oil, and from thinner oil to water, which could make it hard to see the correct answer through CVIs, as they tend to favor distinct separation between clusters. In the attempts to find the relative thickness of the oil slicks, we will shy away from choosing n = 2 too easily, as it would only act as oil detection. Local maxima are of interest, maybe relevant for the Calinski-Harabasz score, and also points of maximum curvature in the CVI plots.

Then, the cluster labels were plotted as a layer on top of the original RGB image for comparison. To decide which model provides the most informative results, an in-house expert on oil slick analysis was consulted. For each product, polygons for the total oil extent and the thicker region (if one existed) were manually made to act as references in the analysis.

When the best model was determined the analysis of the identified clusters took place in order to draw inference on oil slick characteristics tied to each cluster.

Additionally, the corresponding SAR detection report (from KSAT) was used for comparison. The shape file that the detection procedure produces was plotted onto the RGB image. As mentioned earlier, the SAR and Sentinel-2 image captures can be separated by several hours and should therefore be compared with weather conditions and sea currents in mind. This was done to investigate the similarities between the two methods and draw conclusions about the information contribution of Sentinel-2 oil slick analysis.

4.11 General application and robustness

To examine the possibility of fitting a model to the entire data set and then predicting on new images, such a general model was investigated. The amount of data is rather large in terms of pixels, but the number of individual images is very limited. This is therefore more of a proof of concept, such that further attempts can better estimate the data collection needs.

This is a valuable approach since, if sufficiently successful, it can instantly provide predictions for new, unseen data. Problems like this are, however, normally better solved with predictive machine learning techniques.

The same analysis procedure as described above was carried out for the general model and its predictions on the images. The general model's results were compared with those of the same model fitted to the individual image.
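A minimal sketch of this general-model idea, assuming the combined pixels from P1-P5 and the pixels of a new image are available as water-normalized arrays (names and placeholder data are illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X_combined = rng.normal(size=(200_000, 13))   # placeholder for P-comb pixels (pixels x bands)
X_new_image = rng.normal(size=(50_000, 13))   # placeholder for an unseen product

# Fit once on the combined data, then label any new product without refitting.
general_model = GaussianMixture(n_components=4, covariance_type="full",
                                random_state=0).fit(X_combined)
new_labels = general_model.predict(X_new_image)
print(np.bincount(new_labels))                # pixel count per cluster in the new image
```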



Finally, the most satisfying results from the models were compared with the SAR detection report to analyze what value this method could possibly add.

4.12 Software usage

In this work, Python 3.6 has been used exclusively in all steps of the process.

Packages worth mentioning are sklearn (for the clustering algorithms, silhouette calculations and scaling) and an in-house code library for data handling.


5 Results

The products on which the models are fitted are denoted P1-P5, as introduced in Section 4.1; P-comb denotes the combined data across all products. This section covers the major results from the data handling and the analysis of the model results, in order to describe the key findings.

5.1 Image manipulation with FFT

This section will showcase the results of performing the FFT manipulation method mentioned in Section 4.2.2 to address problems with waves.

Figure 14: The left image is raw data and the right is after FFT manipulation and a slight Gaussian filter blur. Band 2 is chosen for representation.

Figure 14 clearly shows that the image after FFT manipulation and Gaussian blur contains drastically fewer wave patterns. In the following figure, the results of running the K-means clustering algorithm on the above images are shown to further illustrate the problem.
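A minimal sketch of the kind of frequency-domain wave suppression described in Section 4.2.2, here with a simple circular low-pass mask followed by a slight Gaussian blur (the exact mask used in the thesis may differ, so this is illustrative only):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def suppress_waves(band, cutoff_fraction=0.05, blur_sigma=1.0):
    """Remove high-frequency (wave-like) content from a single band with a low-pass FFT mask."""
    spectrum = np.fft.fftshift(np.fft.fft2(band))
    rows, cols = band.shape
    y, x = np.ogrid[:rows, :cols]
    radius = np.hypot(y - rows / 2, x - cols / 2)
    lowpass = radius <= cutoff_fraction * min(rows, cols)   # keep only low frequencies
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * lowpass)).real
    return gaussian_filter(filtered, sigma=blur_sigma)

# Example on synthetic data: a smooth gradient with superimposed periodic "waves".
yy, xx = np.mgrid[0:512, 0:512]
band = xx / 512 + 0.1 * np.sin(xx / 3.0)
smooth = suppress_waves(band)
```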



Figure 15: Comparison of K-means clustering results on the raw image and on the FFT-manipulated image.

Performing these image manipulation techniques enables a more coherent and easily interpreted result as seen in Figure 15. Some detail is sacrificed for interpretability.

5.2 Feature reflectance characteristics

The extracted feature reflectance characteristics were fairly similar across products; below are the results for the two most dissimilar products.


Figure 16: Feature reflectance characteristics for P1. Boat strong represents a large vessel (ship/tanker) with strong general reflectance. Boat medium represents a smaller vessel with more subtle characteristics.



Figure 17: Feature reflectance characteristics for P3. Boat strong represents a large vessel (ship/tanker) with strong general reflectance. Boat medium represents a smaller vessel with more subtle characteristics.

First, many features have similar shapes in the plots and differ only in magnitude; water and oil, in particular, are very similar in P1 (Figure 16). Additionally, oil can have both lower and higher general reflectance than water, as seen in Figure 17. This obviously creates some challenges. The products thus have similarities but also differ quite drastically in some regards.

5.3 Data pre-processing

5.3.1 Cluster tendency

The results of running the Hopkins cluster tendency test on one representative product are seen in Table 2. As the test shows such clear results, only one product was tested.

Table 2: Hopkins statistic for one representative product, P1.

    Data    raw data    oil ratios    water mean ratios    PCA on raw data
    P1      0.020769    0.008827      0.009197             0.011853
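A minimal sketch of one common formulation of the Hopkins statistic is given below. Conventions differ; in the variant sketched here, values close to 0 indicate a strong cluster tendency (which appears consistent with the values in Table 2), while values around 0.5 indicate uniformly random data. The implementation is illustrative and not taken from the thesis code:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hopkins(X, sample_size=500, seed=0):
    """Hopkins statistic; in this variant, values near 0 suggest strong cluster tendency
    and values near 0.5 suggest uniformly random data."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    m = min(sample_size, len(X) - 1)

    # m uniform points drawn inside the bounding box of the data
    uniform = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, X.shape[1]))
    # m real points sampled from the data itself
    sample = X[rng.choice(len(X), size=m, replace=False)]

    nn = NearestNeighbors(n_neighbors=2).fit(X)
    u = nn.kneighbors(uniform, n_neighbors=1)[0].ravel()   # uniform point -> nearest data point
    w = nn.kneighbors(sample, n_neighbors=2)[0][:, 1]      # data point -> nearest other data point
    return w.sum() / (u.sum() + w.sum())

# Example: two tight Gaussian blobs give a value close to 0 (clear cluster tendency).
rng = np.random.default_rng(1)
blobs = np.vstack([rng.normal(0, 0.05, (1000, 2)), rng.normal(5, 0.05, (1000, 2))])
print(round(hopkins(blobs), 3))
```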


5.3.2 Preliminary feature extraction comparison

There are many ways to process the data before fitting a model to it. As mentioned, Taravat and Del Frate proposed band ratios to highlight oil slicks on the sea surface. Through pilot testing with a GMM model with 4 clusters, several approaches were compared in order to find the better pre-processing method for the data.

(a) Clustering results based on oil ratio calculations from Taravat.

(b) Clustering results based on water ratios.

Figure 18: Assigned labels from a GMM model ("full", n = 4), plotted over an arbitrarily chosen oil slick for comparison purposes.

As we see in Figure 18, the oil ratio calculations produce very grainy results, and also very limited information apart from the thick oil patch. The section just below the center of the image (≈ 800, 1050) contains a thin, fragmented layer of oil and should not sporadically be assigned the second highest label value, as it is in Figure 18a. The ratio calculations also make it hard to arrange the labels in the correct order, since water seems to position itself between the values of thick and thin oil in all ratios. The water normalization ratio clearly performs better (see Figure 18b) and was thus used for the rest of the analysis.
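A minimal sketch of the water-normalization ratio, assuming a clear-water region has been selected for the product (the selection procedure follows Section 4.2; names and the synthetic data are illustrative):

```python
import numpy as np

def water_normalize(image, water_mask):
    """Divide every band by the mean reflectance of a clear-water region,
    so that water sits near 1.0 and oil deviates above or below it."""
    # image: (rows, cols, bands); water_mask: boolean (rows, cols) marking clear water
    water_mean = image[water_mask].mean(axis=0)   # one mean per band
    return image / water_mean

# Example with synthetic data: 100x100 pixels, 13 bands, clear water in the top rows.
rng = np.random.default_rng(0)
image = rng.uniform(0.01, 0.05, size=(100, 100, 13))
water_mask = np.zeros((100, 100), dtype=bool)
water_mask[:20] = True
ratios = water_normalize(image, water_mask)
```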

5.4 Hierarchical clustering

The high complexity of the hierarchical clustering algorithm made it infeasible to cluster the selected oil slicks in full. A very small clip of the image (200×200 pixels) was tested with agglomerative clustering, Euclidean distance and different linkages. The results were similar to those of K-means.



The interest in examining how spatial information influences the clustering motivated a small test. The image clip was again 200×200 pixels, in order to at least superficially test the method. These were the results:

(a) Clustering results by hierarchical clustering with spatial information.

(b) Input data for the hierarchical model for comparison. Band 1.

Figure 19: Hierarchical clustering results with connectivity constraints compared with one band (Band 1) of the input data. Label colors are not ordered. Number of neighbors = 8 in the connectivity graph, linkage set to ward.

Figure 19a shows the clusters when the hierarchical tree is cut at n = 3, a choice where the heights of the nodes were relatively large. It is very hard to define the optimal number of clusters in this case, but the choice is made in accordance with Section 3.4.2 and gives an introduction to the possibilities of the method. The clusters represent areas with stronger reflectance values, which in this small example is oil. The brightest pixels, at the far right of Figure 19b, form only a small area that is disconnected from the rest, which may be why they do not appear among the three clusters shown.
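A minimal sketch of such a connectivity-constrained agglomerative clustering on a small clip; the thesis uses 8 neighbors and ward linkage, and one plausible way to build the connectivity graph is from the pixel coordinates (the exact construction used is not shown, so this is illustrative):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
clip = rng.uniform(0.01, 0.05, size=(200, 200, 13))   # placeholder 200x200 clip, 13 bands
pixels = clip.reshape(-1, clip.shape[-1])             # (40000, 13) feature matrix

# Connectivity graph: each pixel is linked to its 8 nearest pixels in image coordinates,
# so merges are only allowed between spatially adjacent regions.
rows, cols = np.mgrid[0:200, 0:200]
coords = np.column_stack([rows.ravel(), cols.ravel()])
connectivity = kneighbors_graph(coords, n_neighbors=8, include_self=False)

model = AgglomerativeClustering(n_clusters=3, linkage="ward",
                                connectivity=connectivity)
labels = model.fit_predict(pixels).reshape(200, 200)
```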

5.5 K-means and Gaussian Mixture Model

In this section, results for both the K-means models and the Gaussian mixture models are presented, product by product, to simplify comparison. The results include CVI scores and the clustering results of the models with optimal parameter values.

Performance statistics for models with n ranging from 2 to 10 are generated for each product. The clustering results are mapped onto the original image and the labels are ordered by reflectance in Band 11, as the knowledge gained from the feature reflectance characteristics of the products (Figures 16, 17) and from theory (Section 3.1) indicates that this band may hold the most information about thickness. The labels are assigned coloring according to the 'jet' color palette of matplotlib. Label 0 is the part of the product that is not analyzed, assumed to be water. Additionally, the indications from the expert are plotted as an overlay on the image: a black line for the thickest regions and a grey line for the thinner ones.

5.5.1 Product 1

Performance statistics for models with n ranging from 2 to 10.

Figure 20: CVI for K-means model for P1. Number of clusters, n, in the model is shown on the X-axis with the corresponding CVI score on the Y-axis.

Figure 20 shows that the strongest indication may be for two natural clusters, as both DB and the silhouette score heavily favor such a choice, but there is a local maximum at n = 4 close to the global maximum for CH. The sum of squared errors has strong curvature at n = 2 and n = 4. With the bias towards choosing n > 2, n = 4 is chosen.

Figure 21: Clustering results for the K-means model with n = 4 for P1. The top left image is the raw RGB image. Top right is the first band (Band 2) after FFT and Gaussian filter blur. Bottom left is the clustering result from the model; the label number represents the relative oil thickness, and dark blue pixels are sections of the image not analyzed for oil. Bottom right is the RGB image with the clustering results as overlay, alpha = 0.5, with the cluster most likely to be water excluded.

To analyze the clustering we can look at the bottom-left image in Figure 21, where we see that the thicker patches, indicated with a solid black line, are captured well by the model; they are captured either solely by the highest label or together with the second highest. The total extent of the slick, the solid grey line, coincides less well with the corresponding label in green. Label 1 is considered water.


Figure 22: CVI for the GMM model for P1. The number of clusters, n, in the model is shown on the X-axis with the corresponding CVI score on the Y-axis. The small star in the BIC/AIC plot indicates the minimum score.

BIC and AIC flatten out rapidly, which makes the differences between numbers of clusters quite small. BIC indicates n = 4 as best, CH and DB favor 4-5, and AIC, which penalises additional clusters less, scores best at n = 8 (see Figure 22). In aggregate, n = 4 is chosen.



Figure 23: Clustering results for the GMM model with n = 4 for P1. The top left image is the raw RGB image. Top right is the first band (Band 2) after FFT and Gaussian filter blur. Bottom left is the clustering result from the model; the label number represents the relative oil thickness, and dark blue pixels are sections of the image not analyzed for oil. Bottom right is the RGB image with the clustering results as overlay, alpha = 0.5, with the cluster most likely to be water excluded.

Again, looking at the bottom-left image in Figure 23, we can see that the thicker patches, indicated with a solid black line, are captured well by the model; they are captured either solely by the highest label or together with the second highest. The total extent of the slick, the solid grey line, coincides less well with the corresponding label in green. Label 1 is considered water.

5.5.2 Product 2

Performance statistics for models with n ranging from 2 to 10. What differs for this product is that it contains only very thin, or highly emulsified, oil. The label sorting is therefore reversed to simplify interpretation, resulting in the highest label for the lowest value in Band 11.


Figure 24: CVI for K-means model for P2. Number of clusters, n, in the model is shown on the X-axis with the corresponding CVI score on the Y-axis.

In Figure 24 the sum of squared errors and CH favor n = 3, while the others favor the lowest possible n. Since we do not want the model to act only as oil detection, n = 3 is slightly favored and chosen.
