

Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer and Information Science

Master’s thesis, 30 ECTS | Statistics and Machine Learning

2021 | LIU-IDA/STAT-A--21/004--SE

Investigation and forecasting drift component of a gas sensor

Farhana Chowdhury Tondra

Supervisor: Annika Tillander
Examiner: Jolanta Pielaszkiewicz

External supervisors: Anita Lloyd Spetz, Jan Ybrahim


Upphovsrätt

Detta dokument hålls tillgängligt på Internet - eller dess framtida ersättare - under 25 år från publiceringsdatum under förutsättning att inga extraordinära omständigheter uppstår.

Tillgång till dokumentet innebär tillstånd för var och en att läsa, ladda ner, skriva ut enstaka kopior för enskilt bruk och att använda det oförändrat för ickekommersiell forskning och för undervisning. Överföring av upphovsrätten vid en senare tidpunkt kan inte upphäva detta tillstånd. All annan användning av dokumentet kräver upphovsmannens medgivande. För att garantera äktheten, säkerheten och tillgängligheten finns lösningar av teknisk och administrativ art.

Upphovsmannens ideella rätt innefattar rätt att bli nämnd som upphovsman i den omfattning som god sed kräver vid användning av dokumentet på ovan beskrivna sätt samt skydd mot att dokumentet ändras eller presenteras i sådan form eller i sådant sammanhang som är kränkande för upphovsmannens litterära eller konstnärliga anseende eller egenart.

För ytterligare information om Linköping University Electronic Press se förlagets hemsida http://www.ep.liu.se/.

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

Chemical sensor based systems used for the detection, identification, or quantification of various gases are complex in nature. Sensor response data, collected as multivariate time series signals, undergo a gradual change of the sensor characteristics (known as sensor drift) for several reasons. In this thesis, the drift component of a Silicon Carbide Field-Effect Transistor (SiC-FET) sensor was analyzed using time series methods. The data was collected from an experiment measuring the output response of the sensor to gases emitted by an experimental object at different temperatures. The Augmented Dickey-Fuller (ADF) test was carried out to analyze the sensor drift; it revealed that a stochastic trend, together with a deterministic trend, characterized the drift components of the sensor. The drift appeared already in the daily measurements, which contributed to the total drift.

The traditional Autoregressive Integrated Moving Average (ARIMA) model and the deep learning-based Long Short-Term Memory (LSTM) algorithm were applied to forecast the sensor drift in a reduced data set. However, reducing the data size degraded the forecasting accuracy and caused loss of information. Therefore, a careful selection of the data was made, using only one temperature from the temperature cycle instead of all time points. With both the traditional and the deep learning methods, this selected sensor-array data outperformed the reduced data set in forecasting sensor drift.

Keywords: Time series analysis, Forecasting, ARIMA, LSTM, Sensor, SiC-FET, ADF test, Stochastic Trend, Deterministic Trend, Sensor Drift.


Acknowledgments

First and foremost, praises and thanks to the Almighty creator, for His showers of blessings to complete this thesis successfully.

I would like to express my deep sense of gratitude and thanks to my supervisor, Annika Tillander of the Division of Statistics and Machine Learning (STIMA), Department of Computer and Information Science (IDA), for her exceptional guidance, supervision and encouragement during the thesis work. I would also like to thank my external supervisor Anita Lloyd Spetz of the Department of Physics, Chemistry and Biology (IFM) for her deep insights on the sensor and the experiment. Without the continuous guidance of Jan Ybrahim, Development Engineer, DANSiC AB, I would not have been able to complete this thesis. I am grateful to all supervisors of STIMA for their valuable comments and constructive feedback during this journey.

I would also like to thank my family for their unconditional love, endless support and encouragement during the study period. Last but not least, I would like to thank Sakib for providing strength during this time.


Contents

Abstract

Acknowledgments

Contents

List of Figures

List of Tables

Glossaries
1 Introduction
1.1 Motivation
1.2 Aim
1.3 Research questions
1.4 Related Work
1.5 Delimitations
2 Sensor
2.1 Sensor
2.2 Experiment Details
2.3 Gas Details
3 Theory
3.1 Time Series
3.2 Data Aggregation
3.3 Forecasting
3.4 Test Methods For Stationarity Of Time Series
3.5 Performance Measures
4 Method
4.1 Data
4.2 Software Environment
4.3 Data Analysis
4.4 Data Preprocessing
4.5 Forecasting
5 Results
5.1 Data Analysis
5.2 ARIMA Forecasting
5.3 LSTM Forecasting
6 Discussion
6.1 Results
6.2 Future Work
7 Conclusion


List of Figures

1.1 Classification of Drift

2.1 SiC-FET sensor

2.2 Measurement setup of the experiment: flow of carrier gas (synthetic air) and object gas through the valves to the sensor

3.1 An RNN hidden cell

3.2 An LSTM cell containing the basic features

4.1 Data overview. In (a) the sensor signal provides data stored in rows, with the columns representing one temperature cycle, and in (b) one row or cycle plot of each group of gas types collected in 70 seconds

4.2 Time series data: (a) outliers observed on 12th August, (b) outliers removed and the removed time series data imputed

4.3 Non-stationary time series

4.4 Stationary time series

4.5 Red circles pointing to the plateaus of a cycle plot

5.1 ARIMA(2,1,0)(0,1,0)[168] forecasting (hourly data)

5.2 Forecasting residuals (hourly data): (a) histogram and ACF of train-set residuals, (b) histogram and ACF of test-set residuals

5.3 ARIMA(4,1,1) forecasting (minutes data)

5.4 ARIMA forecasting at two different temperatures: (a) ARIMA(5,1,1) model for the response data at 213°C, (b) ARIMA(5,1,3) model for the response data at 389°C

5.5 LSTM forecasting (hourly data): (a) the loss functions MSE & MAE decrease during training as the epochs increase, (b) fitted LSTM model using hourly aggregated data

5.6 LSTM forecasting (minutes data): (a) the loss functions MSE & MAE decrease during training as the epochs increase, (b) fitted LSTM model using minutes-aggregated data

5.7 LSTM forecasting (temperature selected): (a) LSTM model for the data at 213°C, (b) LSTM model for the data at 389°C

5.8 Forecasting measures using different methods: (a) bar plot of RMSEs for all implemented ARIMA models, (b) bar plot of RMSEs for all implemented LSTM models


List of Tables

2.1 Temperature Cycle

4.1 Sliding Window

5.1 Deterministic and Stochastic Trend in a Cycle

5.2 Summary of cycles

5.3 Deterministic and Stochastic Trend in One Day

5.4 Deterministic and Stochastic Trend in Entire Data

5.5 ARIMA Forecasting (Hourly Data)

5.6 ARIMA Forecasting (Minutes Data)

5.7 ARIMA Forecasting (Temperature Selected)

5.8 LSTM Forecasting (Hourly Data)

5.9 LSTM Forecasting (Minutes Data)


Glossaries

ACF Autocorrelation Function.
ADF Augmented Dickey-Fuller.
AIC Akaike Information Criterion.
AR Autoregressive.
ARIMA Autoregressive Integrated Moving Average.
ARMA Autoregressive Moving Average.
CNN Convolutional Neural Network.
CPU Central Processing Unit.
CSV Comma-Separated Values.
DF Dickey-Fuller.
FET Field-Effect Transistor.
GPU Graphics Processing Unit.
KPSS Kwiatkowski–Phillips–Schmidt–Shin.
LSTM Long Short-Term Memory.
MA Moving Average.
MAE Mean Absolute Error.
MAPE Mean Absolute Percentage Error.
MASE Mean Absolute Scaled Error.
MFC Mass Flow Controller.
MLE Maximum Likelihood Estimation.
MLP Multilayer Perceptron.
MOSFET Metal Oxide Semiconductor Field-Effect Transistor.
MOX Metal Oxide Semiconductor.
NNAR Neural Network Autoregression.
PACF Partial Autocorrelation Function.
PSR Phase Space Reconstruction.
RBF Radial Basis Function.
RMSE Root Mean Square Error.
RNN Recurrent Neural Network.
SARIMA Seasonal Autoregressive Integrated Moving Average.
tanh Hyperbolic Tangent Activation Function.
tidyverse Opinionated Collection of R Packages Designed for Data Science.
VAR Vector Autoregression.
VOC Volatile Organic Compound.


1

Introduction

1.1

Motivation

Forecasting is a common statistical task that plays a vital role in science and economics. Predicting the future as accurately as possible, conditioned on the available information, including historical data and knowledge of future events, is known as forecasting. Good forecasting reduces the cost of decision-making by predicting unknown situations. Common examples are weather forecasting, stock market forecasting, and supply chain management. Another example of statistical forecasting is analyzing and forecasting sensor data to predict a sensor's lifetime or the gradual change of its parameters during use. A sensor is a device that transforms a physical input from its environment into a human- or machine-readable form. Sensors can be classified in different ways; one relevant classification, based on the input-output relation of the sensor, distinguishes e.g. photoelectric, thermometric, and electrochemical sensors. Temperature, pressure, and acceleration are measured by so-called physical sensors, whereas photoelectric, thermoelectric, and electrochemical sensors are examples of chemical sensors. A chemical sensor responds to the Volatile Organic Compounds (VOC)s present in gases. Different types of chemical gas sensors include Metal Oxide Semiconductor (MOX), optical and amperometric gas sensors, surface acoustic sensors, piezoelectric gas sensors, and the Silicon Carbide Field-Effect Transistor (SiC-FET) gas sensor. Sensors that measure VOCs are very complicated and are also prone to degrading from their baseline. When the response signals are analyzed for pattern recognition, a deviated output might impact the final result (i.e., the classification). This change of sensor characteristics over time is known as 'drift'.

A sensor signal response can be divided into a 'true signal' and 'disturbances'. The disturbances consist of high- and low-frequency signals, and drift is located in the low-frequency part of the disturbance. It consists of small and non-deterministic temporal variations of the sensor response when the latter is exposed to the same analytes under identical conditions [1]. Drift can be divided into two types based on its possible causes. Figure 1.1 summarizes the relationship between drift and the output signals from the sensor.


• Real Drift: Real (first-order) drift belongs to the sensor device and is also known as sensor drift. Aging is one of the reasons for sensor drift.

• Second-order Drift: Second-order drift belongs to the surrounding operating system. It occurs due to operating conditions such as temperature or humidity [2]. In real applications, it is not easy to empirically differentiate between the two. Additionally, it is challenging to correct the different drifts, as their root causes may be unknown. However, maintaining a proper system can bypass the second-order drift effect [2].

Figure 1.1: Classification of Drift

A classification experiment was performed by DANSiC AB using the SiC-FET sensor. The sensor was exposed to gases emitted from an experimental object. Classification labels were determined by molecular means. The molecules produced by the chemical reactions on the sensor surface are sometimes non-reversible, that is, they stay on the sensor surface. As a result, the amplitude of the response changes, since fewer active sites remain for the gas response on the sensor surface. This gradual change, or drift, of the sensor characteristics was observed during the experiment. It can decrease the sensor's selectivity and sensitivity, which results in incorrect classification. A proper compensation or correction method is required to reduce the impact of drift and achieve better classification accuracy. Forecasting the drift of chemical sensors can lead to the compensation of the drift.

The object measurements were performed at Linköping University. The data from the measurements was made available and transferred by DANSiC AB.

1.2

Aim

This thesis aims to investigate the first-order drift components of the SiC-FET sensor, which occur due to the analyte's chemical and physical interaction with the sensor material, using time series methods. The traditional time series forecasting methods Autoregressive (AR), Moving Average (MA), Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving-Average (SARIMA), and Vector Autoregression (VAR), as well as the machine learning methods Neural Network Autoregression (NNAR), Multilayer Perceptrons (MLP), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM), will be studied to find a proper forecasting model for sensor drift. The objective is to establish the best forecasting model among the time series methods and to validate machine learning models that use data from all time points, based on the best performance measures.


1.3

Research questions

Several research questions arose during the analysis of the data. The discussion and conclusion chapters answer the findings elaborately.

1. What are suitable forecasting methods to predict the drift of a SiC-FET sensor?

2. Which time-series component characterizes the drift in the sensor?

3. How do outliers affect forecasting?

4. What are the effects of data reduction on drift forecasting, and which metric is relevant for evaluating drift forecasts of a SiC-FET sensor?

1.4

Related Work

Time series forecasting can be categorized into traditional methods and deep neural network methods. Several methods have been proposed in the literature, but this study is narrowed down to the forecasting of gas sensor data. Time-series methods such as the Autoregressive Moving Average (ARMA) and the Kalman filter were demonstrated in [3], where the qualitative and quantitative analysis showed that ARMA had better predictive capacity than the Kalman filter. In that work, the ARMA and Kalman models were built on short-term time series data (collected over a shorter period, i.e., a month), and the long-term time series (i.e., a year) was forecast using the trained models.

Another method, chaotic time series prediction of sensor drift in embedded phase space, used Phase Space Reconstruction (PSR) to memorize the properties of a chaotic attractor and a Radial Basis Function (RBF) neural network to track the series' movements [4]. Horelu et al. showed in [5] that Recurrent Neural Network (RNN)s with LSTM are the best method for one-step and multi-step ahead prediction, while the classical Hidden Markov Model did not provide the desired outcomes for multi-step ahead prediction in their scenarios.

Another time series prediction approach based on neural networks and deep learning was proposed in [6]. The reason for introducing deep learning was that conventional methods based on shallow feature extraction cannot extract non-linear features. A time series LSTM network was used to train the model and was able to reduce drift in a complex environment. The authors compared the simple RNN, Gated Recurrent Units (GRU), a back-propagation (BP) artificial neural network (ANN), and support vector regression with the LSTM method. They claimed that the LSTM model with the parameters given in their work tracked the real sensor data well in the presence of drift and achieved better results than the other methods.

1.5

Delimitations

The thesis work was carried out on specific experimental data for classification using the SiC-FET sensor. Therefore, the assumptions and applications of the analysis may not be applicable to other types of sensor data. The data used in this thesis is the property of DANSiC AB and can be shared with the consent of DANSiC AB.


2

Sensor

2.1

Sensor

The SiC-FET sensor was launched by the Applied Sensor Science research unit, part of the Sensor and Actuator Systems (SAS) division of Linköping University. SenSiC AB is commercializing the SiC-FET sensor for high-temperature and corrosive-environment applications. DANSiC AB is commercializing the SiC-FET sensor for normal room-temperature applications, e.g., quality control of indoor air. The SiC-FET sensor is a combination of a Field-Effect Transistor (FET) and the semiconductor silicon carbide. FET-based gas sensors can be fabricated by applying a material (commonly a catalytic metal) as the gate contact in a traditional Metal Oxide Semiconductor Field Effect Transistor (MOSFET). The material interacts with one or more gaseous substances, resulting in a modulation of the gate-to-substrate electric field.

Figure 2.1: SiC-FET sensor

The SiC-FET sensor can detect a wide range of gaseous substances with a single sensor under different operating temperatures. It has many applications for highly sensitive and selective detection of hazardous VOCs, such as H2, NH3, H2S, amines, alcohols, hydrocarbons, and other combustible gases.

2.2

Experiment Details

The chemical gas sensor used for this experiment is a SiC-FET sensor, shown in Figure 2.1. The sensor comprised a chip with two independent FET structures. The gate of the FET structures was grounded. Moreover, the chip had an internal temperature sensor and an internal heater situated on the chip's front- and backside, respectively. The chip, along with an external temperature sensor, was glued to a ceramic heater. The ceramic heater and the external temperature sensor were used to modulate and measure the temperature in this project, as the internal heater was inaccessible. Since the external temperature sensor and the sensor chip were glued manually to the ceramic heater, this gave only a rough indication of the sensor's actual temperature, which varied between individual sensors. The SiC-FETs are constructed so that the transistors can be used with different catalytic gate materials for different applications, and the operating temperature can be up to 800°C.

The experimental setup comprised a gas supply, a Mass Flow Controller (MFC), solenoid valves (1 x 2-way and 1 x 3-way), an Arduino Uno, a chemical gas sensor, an electronics board, and a measurement chamber for the object. Figure 2.2 represents the measurement setup. The solenoid valves (3-way/2-way) that control the carrier gas were installed for the object measurements. The valves were controlled by a script on the Arduino to direct the flow of the air supply either directly to the sensor or first through the object chamber and then on to the sensor surface.

Figure 2.2: Measurement Setup of the experiment, flow of carrier gas (synthetic air) and object gas through the valves to the sensor.

For this experiment, the sensor was exposed to gases emitted from an experimental object placed inside the measurement chamber made of aluminum, shown as 'object' in Figure 2.2. When a measurement started, a synthetic air carrier gas (80% N2 and 20% O2, total flow rate = 100 ml/min) was passed through the valves to the sensor to reach a stable baseline. The baseline acted as a reference point for the object response and the measurement of sensor drift. After establishment of the baseline, the synthetic air mixture was directed through the object chamber via the valves towards the sensor. The process was repeated for every measurement.

The measurements were taken in a temperature cycle ranging from 210°C to 400°C. The sensor responded to different VOCs at different temperatures, and its output (current) varied accordingly. The temperature ran in a cyclic order of 392°C, 362°C, 326°C, 287°C, 251°C, 213°C (actual temperatures) as listed in Table 2.1. It took 70 seconds to complete one cycle with a sampling frequency of 10 Hz. Each cycle was stored in a row of a matrix. The drain current (µA) of the sensor was collected as raw data. The entire period for one measurement was 1 hr 22 minutes, consisting of 71 cycles: the first 36 cycles corresponded to background air for establishing the baseline, and the subsequent 35 cycles corresponded to gases emitted from the object. Approximately ten objects were measured each day from Monday to Friday. In total, 303 objects were measured throughout the experiment. To reduce short-term drift, synthetic air ran during nights and weekends after the measurements. The sensor was also exposed to humid air for the last 9 days of the experiment.

Table 2.1: Temperature Cycle

Temperature (°C)   Time (s)
392                18
362                11
326                11
287                10
251                10
213                10
Total              70
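As a quick arithmetic check (an illustrative Python snippet, not part of the thesis), the plateau durations in Table 2.1 add up to the stated 70-second cycle, and at the stated 10 Hz sampling frequency each cycle row therefore holds 700 samples:

```python
# Plateau durations from Table 2.1: temperature (°C) -> dwell time (s).
plateaus = {392: 18, 362: 11, 326: 11, 287: 10, 251: 10, 213: 10}

cycle_seconds = sum(plateaus.values())
samples_per_cycle = cycle_seconds * 10  # 10 Hz sampling frequency

print(cycle_seconds, samples_per_cycle)  # 70 700
```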

2.3

Gas Details

Synthetic Air

A mixture of 80% N2 and 20% O2, known as 0-air, supplied from a separate gas tube with a total flow rate of 100 ml/min, was passed as a carrier gas to the sensor with the purpose of establishing a stable baseline. The baseline acted as a reference point for the response to the emissions from the object and the measurement of sensor drift. This gas was passed over the sensor before, during, and after measurements, as well as on nights and weekends.

Object Gas

The SiC-FET sensor data used in this thesis was the drain current of the sensor operated in temperature-cycled mode, see Table 2.1. The data was generated from the exposure of the sensor to synthetic air or to emissions from the object carried by the synthetic air to the sensor surface. The emissions from the object are referred to as object gas.

Humid Air

Humid air (~80–100%, a mixture of dry air and water vapour) was also introduced as reference measurements to keep track of the drift. The humid air is supposed to give a reference response level of the sensor; the reference should be stable if there were no drift. This is because the humidity level is constant and only one molecule, water, is exposed to the sensor (as compared to the response from the objects, where a large number of different hydrocarbons were involved).


3

Theory

3.1

Time Series

A set of data observed sequentially in time is known as a time series, where each data point follows a sequential order. Time series analysis is a powerful tool for analyzing data characteristics and forecasting the future based on previous observations. Examples of its applications include weekly interest rates, daily closing stock prices, yearly sales figures, annual precipitation, drought indices, hourly wind speeds, forecasting sensor data over experimental time, and many more diverse areas. The purpose of time series analysis is to understand the nature and characteristics of the data, how they change over time, and to predict unknown values of a series based on its historical data. Most of the equations discussed in this chapter are referenced from [7] and [8].

Some important terms:

• White Noise: White noise is a time series generated from uncorrelated variables and is considered pure noise. It is denoted w_t ~ wn(0, σ_w²).

• Random Walk: A random walk is a time series in which the observation at time t follows the observation at time t − 1 along a random path driven by white noise. It can be represented as u_t = u_{t-1} + w_t, where w_t ~ i.i.d.(0, σ_w²) is a pure random disturbance at time t.
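The two definitions above can be sketched in a few lines of NumPy (an illustrative simulation, not the thesis data): the random walk u_t is simply the cumulative sum of the white noise w_t.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=1000)  # white noise, mean 0, variance sigma_w^2 = 1
u = np.cumsum(w)                     # random walk: u_t = u_{t-1} + w_t

# The white-noise variance stays near sigma_w^2 for any window, while the
# random walk's theoretical variance grows as t * sigma_w^2.
print(round(w.var(), 2), u.shape)
```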

Based on statistical measures, time series data can be categorized into two types:

• Stationary Time Series: A time series with constant mean, variance, and autocorrelation over time is called stationary. A stationary time series y_t has a mean function µ_t = µ that is constant and does not depend on time t, and an autocovariance function γ(s, t) that depends on the time points s and t only through their difference |s − t|. An example of a stationary series is white noise.

• Non-Stationary Time Series: A time series whose mean, variance, or autocorrelation changes over time is called non-stationary. A random walk is an example of a non-stationary series. A non-stationary series may exhibit a trend and seasonality and can be written as y_t = T_t + S_t + N_t, where T_t is the trend component, S_t is the seasonal component, and N_t is noise.

3.1.1

Trend

The linear or nonlinear component of a signal that exhibits long-term variation in the data over time and does not repeat is called the trend. The variation can show an increasing or a decreasing tendency.

3.1.1.1 Deterministic Trend

A deterministic trend produces a straight line in time, with some fluctuations around it. It can be expressed through the linear regression model

y_t = βt + w_t    (3.1)

where β is an unknown fixed regression coefficient and w_t is white noise. The expected value of the deterministic trend is therefore E(y_t) = βt and its variance is σ².

3.1.1.2 Stochastic Trend

A stochastic trend is one that can change in each run due to the random component of the process. It is noisier than a deterministic trend. In time series, the stochastic trend is sometimes referred to as a random walk with drift:

y_t = β + y_{t-1} + u_t    (3.2)

so that

Δy_t = y_t − y_{t-1} = β + u_t    (3.3)

y_t = y_0 + Σ_{i=1}^{t} Δy_i = y_0 + βt + Σ_{i=1}^{t} u_i    (3.4)

Here β is an unknown fixed regression coefficient, y_{t-1} is the previous value, and u_t is the random disturbance. The expected value of the trend in Equation 3.4 is E(y_t) = βt and its variance is tσ².
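The practical difference between Equations 3.1 and 3.2 can be illustrated with a short simulation (illustrative Python, with arbitrarily chosen β and seed): detrending removes a deterministic trend entirely, while a stochastic trend keeps wandering around any fitted line, its variance growing with t.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(1, 501)
beta = 0.05  # arbitrary illustrative drift coefficient

det = beta * t + rng.normal(size=t.size)           # deterministic trend, Eq. 3.1
stoch = np.cumsum(beta + rng.normal(size=t.size))  # random walk with drift, Eq. 3.2

# Subtracting the fitted straight line leaves near-stationary noise for the
# deterministic series, but large wandering residuals for the stochastic one;
# the latter needs differencing, not detrending, to become stationary.
det_resid = det - np.polyval(np.polyfit(t, det, 1), t)
stoch_resid = stoch - np.polyval(np.polyfit(t, stoch, 1), t)
print(det_resid.std(), stoch_resid.std())
```

This asymmetry is exactly what trend-stationarity tests such as the ADF test (Section 3.4) exploit.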

3.1.2

Seasonality

Seasonality of a time series follows a fixed and known pattern over time caused by various factors. It can be identified by regularly spaced peaks and troughs, consisting of periodic, repetitive, and generally regular and predictable patterns in the time series.

Seasonality can be tested by observing the Autocorrelation Function (ACF) plot and by decomposing the time series. Depending on the data, a seasonal pattern can be found at a specific regular interval. For example, if data is collected every hour for one month, it is possible to see a seasonal pattern every day (frequency = 24) or every week (frequency = 24 × 7 = 168). Furthermore, data observed every minute for more than a year might have an hourly seasonality (frequency = 60), a daily seasonality (frequency = 24 × 60 = 1440), a weekly seasonality (frequency = 24 × 60 × 7 = 10080), and an annual seasonality (frequency = 24 × 60 × 365.25 = 525960). [9]


3.1.3

Univariate Time Series

A time series with one variable varying with time is known as a univariate time series. For example, the SiC-FET sensor produces a current in response to the temperature variation, sampled at 100-millisecond intervals; the one-dimensional variable that varies with time is therefore the drain current. This thesis uses univariate time series for forecasting sensor drift, where only the drain current from previous data is used.

3.1.4

Multivariate Time Series

A time series consisting of more than one variable is known as a multivariate time series. For example, the SiC-FET sensor has a temperature input, which can be regarded as a time-dependent variable, making every data point depend not only on the past drain current data but also on another variable, the temperature.

3.2

Data Aggregation

When working with an extensive data set with small intervals, it can be necessary to reshape the time series to coarser intervals. Data aggregation refers to the accumulation of all data points over a specified period; the resulting data point reflects a statistical summary of the aggregated data, for example the mean, variance, minimum, maximum, or count. In this thesis, the input data was collected at 100-millisecond intervals for six weeks. The data set was large, so it was aggregated into minute and hour intervals to reduce the computational complexity and cost.
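The aggregation step described above can be sketched with pandas (an illustrative snippet with synthetic values, not the thesis code): a signal logged every 100 ms is reduced to minute-level means.

```python
import numpy as np
import pandas as pd

# Ten minutes of a synthetic 10 Hz signal: 6000 samples, 100 ms apart.
idx = pd.date_range("2021-01-01", periods=6000, freq="100ms")
signal = pd.Series(np.arange(6000, dtype=float), index=idx)

# Aggregate to coarser intervals by taking the mean within each minute.
per_minute = signal.resample("1min").mean()
print(len(per_minute))  # 10
```

Any other summary (`.var()`, `.min()`, `.max()`, `.count()`) can be substituted for `.mean()` in the same way.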

3.3

Forecasting

Statistical forecasting models for time series include decomposition models, exponential smoothing models, and ARIMA models. Furthermore, machine learning and deep learning methods support automatic learning of temporal dependence and automatic handling of temporal structures like trend and seasonality in time series forecasting, running on Central Processing Unit (CPU) and Graphics Processing Unit (GPU) devices. The models ARIMA and LSTM were chosen for forecasting the sensor drift based on previous research.

3.3.1

ARMA

Forecasting with an ARMA model assumes the time series is stationary; the series y_t is predicted from two polynomial components, an AR model and an MA model. The AR model regresses on past values of the variable, and the MA model models the error term. The model is usually referred to as ARMA(p, q), where p is the order of the AR part and q is the order of the MA part.

3.3.1.1 AR

The AR model expresses the current value of the series, y_t, as a function of the past values y_{t-1}, y_{t-2}, ..., y_{t-p}, where p determines the number of steps into the past needed to forecast the current value. The AR(p) model can be written as

y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + ... + φ_p y_{t-p} + w_t    (3.5)

where y_t is a stationary series, w_t ~ wn(0, σ_w²), and φ_1, φ_2, ..., φ_p are constants (φ_p ≠ 0).


If the mean µ of y_t is not zero, replacing y_t by y_t − µ yields

y_t = α + φ_1 y_{t-1} + φ_2 y_{t-2} + ... + φ_p y_{t-p} + w_t    (3.6)

where α = µ(1 − φ_1 − ... − φ_p).

3.3.1.2 MA

The MA model assumes that the white noise w_t on the right-hand side of Equation 3.5 is combined linearly to form the observed data. The MA model of order q is

y_t = w_t + θ_1 w_{t-1} + θ_2 w_{t-2} + ... + θ_q w_{t-q}    (3.7)

where w_t ~ wn(0, σ_w²) and θ_1, θ_2, ..., θ_q (θ_q ≠ 0) are parameters [7].

Therefore, if a time series is stationary, the ARMA(p, q) model can be written as

y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + ... + φ_p y_{t-p} + w_t + θ_1 w_{t-1} + θ_2 w_{t-2} + ... + θ_q w_{t-q}    (3.8)

If q is zero the model reduces to ARMA(p, 0), i.e. a pure AR(p) model, and if p is zero it becomes ARMA(0, q), i.e. MA(q).
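As a minimal sketch of Equation 3.5 in the AR(1) case (illustrative NumPy, with an arbitrary φ and seed, not the thesis code), the snippet below simulates y_t = φ y_{t-1} + w_t and recovers φ from the lag-1 least-squares slope:

```python
import numpy as np

rng = np.random.default_rng(42)
phi = 0.7          # illustrative AR(1) coefficient
n = 5000
w = rng.normal(size=n)

# Simulate y_t = phi * y_{t-1} + w_t (Equation 3.5 with p = 1).
y = np.zeros(n)
for i in range(1, n):
    y[i] = phi * y[i - 1] + w[i]

# Least-squares estimate of phi: regress y_t on y_{t-1}.
phi_hat = np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])
print(round(phi_hat, 2))
```

With a sample this long, the estimate lands close to 0.7; fitting p, q, and the coefficients jointly is what dedicated ARIMA routines automate.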

3.3.1.3 ARIMA

In many situations, a time series can be non-stationary, and first-order differencing may not lead to stationarity due to the presence of stochastic trends. Higher-order differencing may then be required to reach stationarity, which leads to the ARIMA model.

The integrated ARMA model of a time series y_t is said to be ARIMA(p, d, q) if the differenced series

∇^d y_t = (1 − B)^d y_t    (3.9)

is ARMA(p, q); equivalently, the model can be written as

φ(B)(1 − B)^d y_t = θ(B) w_t    (3.10)

where B is the backshift operator, the AR operator φ(B) is given by [7]

φ(B) = 1 − φ_1 B − φ_2 B² − ... − φ_p B^p    (3.11)

and the MA operator is

θ(B) = 1 + θ_1 B + θ_2 B² + ... + θ_q B^q    (3.12)

3.3.1.4 Seasonal ARIMA

There are cases when trend differencing may expose more seasonal behavior at some seasonal lag s. A seasonal differencing is then also required for the series to be stationary. In addition to the ordinary autoregressive and moving average components, an additional seasonal autoregressive component Φ_P(B^s) of order P and a seasonal moving average component Θ_Q(B^s) of order Q give an ARIMA(p, d, q) × (P, D, Q) model with seasonal differencing of order D. The operators are

Φ_P(B^s) = 1 - Φ_1 B^s - Φ_2 B^{2s} - ... - Φ_P B^{Ps}    (3.13)

and

Θ_Q(B^s) = 1 + Θ_1 B^s + Θ_2 B^{2s} + ... + Θ_Q B^{Qs}    (3.14)

and by differencing with seasonal order D,

∇_s^D y_t = (1 - B^s)^D y_t    (3.15)

Therefore, the seasonal ARIMA model becomes

Φ_P(B^s) φ(B) ∇_s^D ∇^d y_t = δ + Θ_Q(B^s) θ(B) w_t    (3.16)
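The effect of the differencing operators (1 - B)^d and (1 - B^s)^D can be sketched with plain list operations. The toy series below, with a linear trend and an exact period-7 seasonal pattern, is hypothetical and only for illustration:

```python
def difference(y, lag=1):
    # Apply (1 - B^lag) once: z_t = y_t - y_{t-lag}
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# Toy series: linear trend (slope 0.5) plus an exact period-7 seasonal pattern.
season = [0, 1, 3, 2, 0, -1, -2]
y = [0.5 * t + season[t % 7] for t in range(28)]

deseasoned = difference(y, lag=7)    # seasonal differencing, D = 1, s = 7
stationary = difference(deseasoned)  # ordinary differencing, d = 1
```

The seasonal difference cancels the periodic part (leaving the constant weekly increment 0.5 × 7 = 3.5), and the ordinary first difference then removes what is left of the trend.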


3.3.2

LSTM

LSTM is a special type of sequence model (RNN) capable of learning long-term dependencies. The improvement over a plain RNN is a modified hidden layer capable of remembering information over long time steps. In Figure 3.1 the RNN model consists of the input state y_t, the output state ŷ_t, and the hidden-layer state S. The weights U, W, and V connect inputs to hidden layers, hidden layers to hidden layers, and hidden layers to output layers, respectively [6].

Figure 3.1: A RNN hidden cell

The updated version of the hidden layer in LSTM is shown in Figure 3.2. An LSTM network consists of many memory blocks called cells. Each cell has the input y_t, the hidden-layer states c_t and a_t, and the output ŷ_t. From one cell to the next, the cell state and hidden state information is transferred. Three different gates, the forget gate f_t, the update gate i_t, and the output gate o_t, are used to protect and control the cell states.

The activation function is a mathematical function that transforms input data from the previous layer into a meaningful representation closer to the expected output [10]. The two activation functions of LSTM are the sigmoid activation function and the hyperbolic tangent activation function (tanh).

The sigmoid function is represented as

S(ŷ_t) = 1 / (1 + exp(-ŷ_t))    (3.17)

The tanh function is represented as

tanh(ŷ_t) = (exp(ŷ_t) - exp(-ŷ_t)) / (exp(ŷ_t) + exp(-ŷ_t))    (3.18)

The sigmoid function takes input values in the range (-∞, +∞) and produces output in the range (0, 1), whereas the tanh function produces output in the range (-1, 1).
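Both activation functions can be written directly from Equations 3.17 and 3.18; a minimal Python sketch (function names are ours):

```python
import math

def sigmoid(x):
    # Eq. 3.17: maps (-inf, +inf) to (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Eq. 3.18: maps (-inf, +inf) to (-1, 1)
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))
```

Note that sigmoid(0) = 0.5 while tanh(0) = 0, and that tanh is just a rescaled, re-centered sigmoid.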


Figure 3.2: An LSTM cell containing the basic features

In the beginning, the LSTM decides how much information will be forgotten and how much will be remembered. The forgetting is done by the forget gate layer using a sigmoid activation function, which outputs a value between 0 and 1 for each cell state c_{t-1}; the value 1 means keep all information and 0 means forget all. Then the update gate decides what new information should be written into the cell state, using another sigmoid activation function. Multiplying the i_t value with the new candidate values of c_t from the tanh layer decides how much information will be updated. Finally, the output gate controls how much internal memory information is transmitted as output based on the cell state: another sigmoid layer decides which parts to output and multiplies them with the transformed cell state passed through the tanh activation layer.
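The gate arithmetic described above can be sketched for a single cell with scalar states; the weights in `W` below are illustrative placeholders, not trained values:

```python
import math

def lstm_step(y_t, a_prev, c_prev, W):
    # One cell update following the gate description above.
    sigm = lambda z: 1.0 / (1.0 + math.exp(-z))
    def gate(name, act):
        wy, wa, b = W[name]
        return act(wy * y_t + wa * a_prev + b)
    f = gate("forget", sigm)            # how much of c_prev to keep
    i = gate("update", sigm)            # how much new candidate to add
    o = gate("output", sigm)            # how much state to emit
    c_tilde = gate("candidate", math.tanh)
    c_t = f * c_prev + i * c_tilde      # new cell state
    a_t = o * math.tanh(c_t)            # new hidden state
    return a_t, c_t

# Placeholder weights (input weight, recurrent weight, bias) -- not trained.
W = {k: (0.5, 0.1, 0.0) for k in ("forget", "update", "output", "candidate")}
a, c = 0.0, 0.0
for y in [0.2, 0.4, 0.6]:               # a short input sequence
    a, c = lstm_step(y, a, c, W)
```

The key line is `c_t = f * c_prev + i * c_tilde`: the forget gate scales the old state, the update gate scales the new candidate, and their sum is the remembered cell state.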

A loss function compares the predicted data from the output layer with the target values and computes a distance score between them. The distance signifies how well the network captures a batch of input data.

3.4

Test Methods For Stationarity Of Time Series

Stationarity is a property of a time series. If the data are non-stationary, the mean and variance are not well defined over time and forecasting results will be biased. Correlation analysis and several statistical tests are used in the literature to check the stationarity of a time series.

3.4.1

Correlation Analysis

Autocorrelation is the degree of correlation between different observations in a data set. It observes the correlation between values (y_1, y_2, ..., y_n) of the same variable at times t_1, t_2, ..., t_n, and measures the linear relationship between lagged values. Autocorrelation is useful for finding periodic patterns, seasonality, and trend. The ACF describes the correlation between two values in a time series, and the Partial Autocorrelation Function (PACF) describes the remaining correlation of the time series at later lags. Based on correlation plots it is possible to claim whether the data has a trend or a seasonal pattern. In a trending series the autocorrelations for small lags tend to be large and positive, with a pattern of slowly decreasing values as the lag increases. A seasonal time series contains large autocorrelations at the seasonal lags. When a time series exhibits both trend and seasonality, it is non-stationary.
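The sample ACF can be computed directly; the short Python sketch below (function name and series are illustrative) shows the slowly decaying, positive autocorrelations of a purely trending series:

```python
def acf(y, max_lag):
    # Sample autocorrelation function r_k = c_k / c_0.
    n = len(y)
    m = sum(y) / n
    c0 = sum((v - m) ** 2 for v in y) / n
    return [sum((y[t] - m) * (y[t - k] - m) for t in range(k, n)) / n / c0
            for k in range(1, max_lag + 1)]

trended = [0.1 * t for t in range(200)]   # a purely trending series
rho = acf(trended, max_lag=20)
# rho[0] (lag 1) is close to 1 and the values decay slowly with the lag,
# the signature of non-stationarity described above.
```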

3.4.2

Statistical Test Methods

3.4.2.1 Kwiatkowski–Phillips–Schmidt–Shin

The Kwiatkowski–Phillips–Schmidt–Shin (KPSS) method tests the null hypothesis that a time series is stationary around a deterministic trend (i.e., trend-stationary) against the alternative of a unit root. The test considers a series y_t decomposed into a deterministic trend αt, a random walk u_t, and a stationary error e_t,

y_t = αt + u_t + e_t    (3.19)

where

u_t = u_{t-1} + w_t    (3.20)

and the w_t are i.i.d.(0, σ_w^2).

• H_0: the stationarity hypothesis σ_w^2 = 0. If it is true, the time series is stationary or trend-stationary; it has at most a deterministic trend.

• H_1: if the null hypothesis is rejected, the time series has a unit root and is thus not stationary.

The KPSS test statistic around a deterministic trend is η̂_τ = T^{-2} Σ_t S_t^2 / s^2(k), where T is the number of observations, S_t = Σ_{i=1}^{t} e_i is the partial sum of errors for t = 1, 2, ..., T, and e_t is the least squares residual obtained from the detrended time series y_t, using either an intercept or an intercept and a time trend. The consistent estimate of the long-run variance s^2(k) with k lags is found from the formula

s^2(k) = T^{-1} Σ_{t=1}^{T} e_t^2 + 2T^{-1} Σ_{j=1}^{k} w(j, k) Σ_{t=j+1}^{T} e_t e_{t-j}    (3.21)

where w(j, k) denotes weights depending on the choice of spectral window. According to [11] the Bartlett window was chosen, for which w(j, k) = 1 - j/(k + 1) ensures that the estimate is non-negative. If the test statistic is greater than the critical values supplied in Table 1 of Kwiatkowski et al. (1992), the null hypothesis of stationarity is rejected at the corresponding level of significance [11].
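A simplified, level-stationary version of the statistic (residuals taken from the sample mean rather than a fitted trend) can be sketched in Python to show the mechanics of Equation 3.21. The function name and the choice k = 8 are illustrative:

```python
import random

def kpss_stat(y, k):
    # Level-stationary KPSS: residuals e_t from the sample mean,
    # partial sums S_t, and the Bartlett long-run variance of Eq. 3.21.
    T = len(y)
    mean = sum(y) / T
    e = [v - mean for v in y]
    partial, s = [], 0.0
    for v in e:                      # S_t = e_1 + ... + e_t
        s += v
        partial.append(s)
    s2 = sum(v * v for v in e) / T   # j = 0 term
    for j in range(1, k + 1):
        w = 1.0 - j / (k + 1.0)      # Bartlett weight w(j, k)
        s2 += 2.0 / T * w * sum(e[t] * e[t - j] for t in range(j, T))
    return sum(S * S for S in partial) / (T ** 2 * s2)

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]  # stationary series
walk, level = [], 0.0
for w in noise:                                    # unit-root series
    level += w
    walk.append(level)

stat_noise = kpss_stat(noise, k=8)   # expected small: stationarity not rejected
stat_walk = kpss_stat(walk, k=8)     # expected large: stationarity rejected
```

The statistic for the random walk comes out much larger than for the white noise series, which is exactly how the test separates the two hypotheses.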

3.4.2.2 Augmented Dickey-Fuller Test

The Dickey-Fuller (DF) test is the opposite of the KPSS test: the latter checks for stationarity around a deterministic trend while DF checks for a unit root. The null hypothesis of the DF test is a unit root in the AR(1) model, and the alternative hypothesis is stationarity. When a higher-order AR(p) model is used, the extension is called the Augmented Dickey-Fuller (ADF) test. The unit root, drift, and trend of a time series can be examined by applying three different regression equations.

The AR(1) model can be written as

y_t = a_1 y_{t-1} + w_t    (3.22)

where a_1 is a constant. Subtracting y_{t-1} from both sides gives

∆y_t = (a_1 - 1) y_{t-1} + w_t = γ y_{t-1} + w_t    (3.23)

where γ = a_1 - 1.

Therefore, the DF test covers:

• Test 1: test for a unit root.
A unit root is present in the series, considering neither a stochastic trend nor a deterministic trend:

∆y_t = γ y_{t-1} + w_t    (3.24)

– H_0: γ = 0, test statistic: τ

• Test 2: test for a unit root with drift.
A unit root is present and there is no drift in the series, while a stochastic trend is considered; this yields two tests from

∆y_t = a_0 + γ y_{t-1} + w_t    (3.25)

– H_0: γ = 0, test statistic: τ_µ
– H_0: γ = 0 = a_0, test statistic: φ_1

• Test 3: test for a unit root with drift and trend.
A unit root is present and there is no drift and no trend in the series, while a stochastic trend and a deterministic trend are considered; this yields three tests from

∆y_t = a_0 + a_2 t + γ y_{t-1} + w_t    (3.26)

– H_0: γ = 0, test statistic: τ_τ
– H_0: γ = 0 = a_0, test statistic: φ_3
– H_0: γ = 0 = a_0 = a_2, test statistic: φ_2

The difference between the three regression equations lies in the deterministic terms a_0 and a_2 t. The first test is a pure random walk model where γ represents the presence of a unit root. The test statistics τ, τ_µ, τ_τ are the appropriate statistics for a t-test of the hypothesis γ = 0. Since the hypothesis γ = 0 is equivalent to a_1 = 1, the test statistics are

τ = (â_1 - 1)(S_{e1}^2 c_1)^{-1/2}    (3.27)

τ_µ = (â_0 - 1)(S_{e2}^2 c_2)^{-1/2}    (3.28)

τ_τ = (â_2 - 1)(S_{e3}^2 c_3)^{-1/2}    (3.29)

where â_0, â_1, and â_2 are the least squares estimators of a_0, a_1, and a_2 respectively, and S_{ek}^2 is the appropriate regression residual mean square

S_{ek}^2 = (n - k - 1)^{-1} [y′_t (I - U_k (U′_k U_k)^{-1} U′_k) y_t]    (3.30)

where I is an identity matrix and y′_t = (y_2, y_3, ..., y_n) with n observations. With k = 1, 2, 3, c_k is the lower-right element of (U′_k U_k)^{-1}, assuming U_1 = y_{t-1}, U_2 = (1, y_{t-1}), and U_3 = (1, t, y_{t-1}) [12].

If γ is zero the series is non-stationary, and if it is far from zero the series is stationary. Table A (Empirical Cumulative Distribution of τ) in the Supplementary Manual of [8] contains the critical values for a given number of observations at the 1%, 5%, and 10% significance levels. Equation 3.25 adds the drift term a_0, and Equation 3.26 adds both the drift a_0 and the trend a_2 t. The statistics φ_1, φ_2, φ_3 are used to test the corresponding joint hypotheses and are constructed like ordinary F-tests [13]:

φ_i = ([SSR(restricted) - SSR(unrestricted)] / r) / (SSR(unrestricted) / (T - k))    (3.31)

where SSR(restricted) and SSR(unrestricted) are the sums of the squared residuals from the restricted and unrestricted models, respectively, r is the number of restrictions, T is the number of usable observations, and k is the number of parameters estimated in the unrestricted model. Thus (T - k) is the degrees of freedom of the unrestricted model [8].
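Equation 3.31 is a plain F-type ratio; a sketch with made-up SSR numbers (all values hypothetical, for illustration only):

```python
def phi_statistic(ssr_restricted, ssr_unrestricted, r, T, k):
    # Eq. 3.31: improvement in fit per restriction, relative to the
    # residual variance of the unrestricted model.
    numerator = (ssr_restricted - ssr_unrestricted) / r
    denominator = ssr_unrestricted / (T - k)
    return numerator / denominator

# Hypothetical values: 100 usable observations, 3 estimated parameters,
# 2 restrictions under H0.
phi1 = phi_statistic(ssr_restricted=52.0, ssr_unrestricted=48.0, r=2, T=100, k=3)
```

The resulting value is then compared with the critical values reported by Dickey and Fuller, as described next.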

The calculated value of a particular φ_i from the F-test is compared with the appropriate value reported in Table B (Empirical Distribution of φ) in the Supplementary Manual of [8] to determine the rejection region at a given significance level. The critical value at a certain significance level (1%, 5%, 10%) is a cut-off point: if the calculated test statistic φ_i is smaller than the critical value reported by Dickey and Fuller, H_0 is not rejected; if it is larger, the null hypothesis is rejected.

3.5

Performance Measures

The forecasting residuals can be used to check the performance of a forecasting model. The residuals of a well-fitted model are uncorrelated; correlated residuals imply that not all available information was used in the forecast. Furthermore, if the residuals have a mean other than zero, the forecast is biased. However, residuals alone are not a complete measure of forecasting accuracy. Some common accuracy measures based on the residuals (actual values y_t and fitted values ŷ_t) are:

• Root Mean Square Error (RMSE):

RMSE = sqrt( Σ_{i=1}^{n} (y_i - ŷ_i)^2 / n )    (3.32)

• Mean Absolute Error (MAE):

MAE = Σ_{i=1}^{n} |y_i - ŷ_i| / n    (3.33)

• Mean Absolute Percentage Error (MAPE):

MAPE = (100% / n) Σ_{i=1}^{n} |(y_i - ŷ_i) / y_i|    (3.34)

• Mean Absolute Scaled Error (MASE):

scaled error q_t = (y_t - ŷ_t) / ( (1/(n-1)) Σ_{i=2}^{n} |y_i - y_{i-1}| )    (3.35)

MASE = Σ_{i=1}^{n} |q_i| / n


According to Hyndman [14], the Root Mean Square Error (RMSE) gives a high penalty to large errors and is minimized by the conditional mean, whereas the Mean Absolute Error (MAE) is minimized by the conditional median. Both RMSE and MAE are scale dependent and not comparable between series on different scales. MAPE and MASE have the advantage of allowing forecast performance to be compared between different time series. However, MAPE puts a heavier penalty on positive errors than on negative errors, and when all historical observations are equal MASE can be infinite. If there is no seasonality or trend, RMSE or MAE can be useful, but MASE is suitable when they are present.
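The four accuracy measures can be written directly from Equations 3.32–3.35; a plain Python sketch (helper names and the toy values are illustrative):

```python
def rmse(y, yhat):
    return (sum((a - f) ** 2 for a, f in zip(y, yhat)) / len(y)) ** 0.5

def mae(y, yhat):
    return sum(abs(a - f) for a, f in zip(y, yhat)) / len(y)

def mape(y, yhat):
    return 100.0 / len(y) * sum(abs((a - f) / a) for a, f in zip(y, yhat))

def mase(y, yhat):
    n = len(y)
    # scale: mean absolute one-step naive error of the actual series
    scale = sum(abs(y[i] - y[i - 1]) for i in range(1, n)) / (n - 1)
    return sum(abs(a - f) for a, f in zip(y, yhat)) / n / scale

y = [10.0, 12.0, 11.0, 13.0]      # actual values (illustrative)
yhat = [11.0, 11.0, 11.0, 12.0]   # fitted values (illustrative)
```

For these values MAE = 0.75 while the naive one-step error averages 5/3, so MASE = 0.45 < 1, i.e. the fit beats the naive forecast; this scale-free comparison is what makes MASE usable across series.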


4

Method

This chapter encompasses the data description and preprocessing. The later sections describe how the thesis work has been carried out.

4.1

Data

The experiment was performed from 10th August 2020 to 18th September 2020 by measuring 303 objects. The drain current (µA) response of the sensor was stored to create the raw data set. The data was stored sequentially by the measurement setup and can be divided into air, emitted gas from the object, nights, and weekends. For each group, the drain current (µA) from the temperature-cycled sensors was stored in the rows of a data frame (47036 rows in total), as in Figure 4.1a, and each column of a row represents one data point of drain current in the temperature cycle (700 data points per cycle), as in Figure 4.1b.

Figure 4.1: Data overview. (a) Data structure: the sensor signal is stored in rows, with the columns representing one temperature cycle. (b) Cycle plot (row plot): one row, i.e. one cycle, of each group of gas types, collected over 70 seconds.


4.2

Software Environment

4.2.1

Matlab

The data was collected in MAT format across several files. Matlab was used to extract and merge the different data files into Comma-Separated Values (CSV) format.

4.2.2

R

Data analysis and forecasting were done using R. R is an open-source language and software environment dedicated to statistical analysis and visualization. It supports a wide range of easily implemented statistical methods and can handle extensive calculations. For better visualization and computation, a few library packages were imported.

4.2.2.1 lubridate

The lubridate package, from the Opinionated Collection of R Packages Designed for Data Science (tidyverse), is designed to reduce the frustration of date-time manipulation in R and is robust to time zones, leap days, and daylight saving time. The thesis work is based on time series that required precise date-time conversion with the lubridate package [15].

4.2.2.2 tidyr

The tidyr package, also from the tidyverse, reshapes data with functions such as gather(), separate(), and spread(). It was used to change the data from wide to long format in the thesis work [16].

4.2.2.3 dplyr

The dplyr package enables flexible data manipulation in R: filtering rows, selecting specific columns, re-ordering, adding new columns, and summarizing data using its various functions. The ‘split-apply-combine’ approach, used in the thesis work, allowed dividing and recombining the data based on specific conditions [17].

4.2.2.4 imputeTs

Analyses can be misleading due to imbalanced data containing outliers and missing values. The imputeTS package supports various imputation algorithms to fill in missing values in a univariate time series, including ‘Mean’, ‘Interpolation’, ‘Moving Average’, ‘Seasonal Decomposition’, ‘Kalman Smoothing on Structural Time Series models’, and ‘Kalman Smoothing on ARIMA models’ [18]. In the thesis work, imputation was performed with the ‘Kalman Smoothing on ARIMA models’ function.

4.2.2.5 xts

The xts package, or Extensible Time Series, is a powerful R package that provides an extensible time series class. It supports irregular time series and allows the user to manipulate the time series based on different criteria [26].

4.2.2.6 ggplot2

The visualization package ggplot2 from the tidyverse is a great tool for graphical statistical analysis and for visualizing different plots [19].


4.2.2.7 forecast

The forecast package supports the methods and tools for displaying and analyzing univariate time series forecasts, including exponential smoothing via state-space models and automatic ARIMA modeling [20].

4.2.2.8 tseries

Unit root tests and time series modeling are supported by the tseries package, which is designed for time series analysis and computation [21].

4.2.2.9 urca

The urca package consists of different kinds of unit root and cointegration tests. Unit root tests were used in the thesis work for stationarity, drift, and trend analysis [22].

4.2.2.10 keras

The high-level neural network methods developed in keras for R support both convolution-based networks and recurrent networks. The keras package was used for the implementation of the LSTM model in this thesis. It runs seamlessly on both CPU and GPU devices [23].

4.2.2.11 tensorflow

TensorFlow is an end-to-end open-source platform for machine learning and numerical computation. It supports keras for model training using different machine learning methods. In this thesis work, the LSTM model was implemented with the keras and tensorflow packages [24].

4.2.2.12 auto.arima

The function auto.arima from the forecast package [4.2.2.7] returns the best ARIMA model according to unit root tests, the minimum of the Akaike Information Criterion (AIC), and Maximum Likelihood Estimation (MLE), generating the optimal values of p, d, q, P, D, Q of the time series for better forecasting.

4.3

Data Analysis

Data analysis for finding stationarity, drift, and trend was performed on the raw data without any outlier extraction. The sensor data was subjected to the ADF test to find sensor drift and trend in:

• Cycles: the ADF test was carried out inside one temperature cycle (a single row). A temperature cycle consists of 700 data points and has an upward staircase cyclic trend, as shown in Figure 4.1b. The decomposition of a cycle gives an upward trend and a seasonal cyclic pattern due to its temperature cycle. Therefore, 47036 cycles were subjected to the hypothesis tests of stationarity (unit root), drift, and trend. For this test each cycle was converted to a time series object.

• Days: the data was separated into air, object, night, and weekend measurements of one day (18th August 2020 with the corresponding weekend). These were then converted into time series objects to test for unit root, drift, and trend. The measurement was affected by several changes (addition of humid air, changing of gas tubes) and by outliers, so for this test a date was selected by visual inspection where the data points were in steady state and possibly had no outliers.

• Entire data: similarly, the total six weeks of data was converted into air, object, night, and weekend time series objects to test for unit root, drift, and trend.

4.4

Data Preprocessing

The analysis of the raw data provided insights into the dataset; it was then preprocessed to prepare for forecasting the sensor drift. The total data was converted to long format and then into a time series object. Referring to Section 3.1.2, a weekly seasonality was observed in the time-series data, since the total period of the experiment was six weeks. All forecasting models used in this thesis followed an outlier analysis step with removal of possible outliers. The rest of the preprocessing steps (aggregation, normalization, reshaping, and train-test splitting) were carried out based on each model's requirements.

4.4.1

Outlier Extraction

The time series in Figure 4.2a shows a dip and a baseline shift in the measurements belonging to 12th August. These dips can be attributed to the fact that during the measurement of objects 27 and 28, the 1000 µA button of the measurement box was not pressed. As there was much redundant information in the cycles of those measurements, forecasting was used to find the outliers: forecasting with these data resulted in a 50% lower RMSE on the test set than on the train set, while removing 142 cycles from 12th August resulted in balanced forecasting. Thus, those data points were marked as outliers, removed, and imputed using an ARIMA model with Kalman smoothing. The time series with imputed data is shown in Figure 4.2b.

Furthermore, during the first weekend, the valve switching led to an “object signal” (instead of an air signal) in the measurements. Additionally, the measurements from 14th to 17th August were considered outliers by experts on the measurement setup. However, removing those outliers resulted in over-fitted forecasting (higher train RMSE than test RMSE), so those data points were kept for further analysis.

Figure 4.2: Time series data. (a) Time series with outliers, observed on 12th August. (b) Time series with the outliers removed and the removed data imputed.


4.4.2

Aggregation In Hours

Aggregation into hours was performed by applying a mean to the millisecond data over a given interval of hours. An averaging window of one hour was applied over the total data, so each processed data point is the arithmetic mean of 36,000 raw data points. After imputation, the sensor data was aggregated into hourly data to reduce computational cost and complexity.
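The aggregation is a non-overlapping moving average; a toy Python sketch with a window of 4 instead of 36,000 (function name and values are illustrative):

```python
from statistics import mean

def aggregate(samples, window):
    # Average consecutive raw samples in non-overlapping windows.
    return [mean(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]

raw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
hourly = aggregate(raw, window=4)   # -> [2.5, 6.5]
```

The same function with a window of 600 would give the minute aggregation described in the next section.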

4.4.3

Aggregation In Minutes

Aggregation into minutes was performed by applying a mean to the millisecond data over a given interval of minutes. An averaging window of one minute was applied over the data; hence each processed data point is the arithmetic mean of 600 raw data points.

4.4.4

Train Test Splitting

Splitting time series data into train and test sets follows sequential order. Both the ARIMA and LSTM models were therefore trained on the first 70% of the data (four weeks), and the remaining 30% (the last two weeks) was used as the test set.
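The sequential 70/30 split can be sketched as follows (illustrative function, not the thesis code):

```python
def sequential_split(y, train_frac=0.7):
    # Chronological split: no shuffling, the test set is strictly later in time.
    cut = round(len(y) * train_frac)
    return y[:cut], y[cut:]

train, test = sequential_split(list(range(10)))  # 7 train points, 3 test points
```

Unlike a random split, this preserves the temporal ordering, which is essential for evaluating forecasts of future values.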

4.5

Forecasting

The traditional forecasting model ARIMA and the deep neural network model LSTM were used for forecasting the sensor drift. For each method, four different models were established to find the best drift forecasting model.

4.5.1

Model Development of ARIMA

4.5.1.1 ARIMA Forecasting (Hourly Data)

The time series data is extensive due to the 100 millisecond data interval. To reduce computational cost and complexity, the data was aggregated into hours according to Section 4.4.2. This resulted in 943 data points with weekly seasonality (frequency = 24 × 7).

ARIMA forecasting requires the series to be stationary, which can be tested using the methods described in Section 3.4. The ACF plot of Figure 4.3 showed that smaller lags had larger, positive values, slowly decreasing towards low values as the lag increased. Furthermore, a strong weekly seasonal pattern was observed. Additionally, the data was found to be non-stationary by the KPSS hypothesis test (H_0 rejected, p-value < 0.05) and by the ADF hypothesis test (H_0 not rejected, p-value > 0.05). Therefore, to remove the linear trend and make the series stationary, the first difference (differences = 1) was taken. Differencing the trend was achieved by taking the differences between consecutive observations of the time series.


Figure 4.3: Non-stationary Time Series

Figure 4.4: Stationary Time Series

After the transformation, the ACF plot of Figure 4.4 showed remaining correlations between the observations. The ACF exhibited strong seasonality and trend over the lags, whereas the PACF had a cutoff at lag 168 (frequency = 168). A seasonal differencing of a time series at a lag equal to the frequency removes the seasonality. The MA(q) process order was determined from the ACF plot and the AR(p) process order from the PACF plot. The term ‘d’ was obtained from differencing the trend, resulting in the ARIMA(p, d, q) model, and the differencing order ‘D’ came from the seasonal differencing.

According to Section 4.4.4, the time series data was divided sequentially into train and test sets. The forecasting was executed for the next 168 hours, i.e. one week. The forecasting accuracy was evaluated using RMSE, MAE, MASE, and MAPE.


4.5.1.2 ARIMA Forecasting (Minutes Data)

When data is aggregated into hours, both valuable and redundant information disappears; aggregating into minutes reduces the amount of data loss. Therefore, the forecasting of sensor drift was also modelled with minute-aggregated data, following Section 4.4.3. The forecasting followed a similar approach to the previous model and was performed to predict the sensor data for the next week.

4.5.1.3 ARIMA Forecasting (Temperature Selected)

Another type of ARIMA forecasting was implemented using the sensor's data at a fixed temperature point. The data was collected from 700 different temperature points, and the same gas mixture reacts differently at different temperatures. In the beginning, when the sensor reacts at high temperature (i.e., 392°C), the gas molecules react with the sensor residuals and the response of the sensor becomes unstable. However, there are a few temperature points where the response data becomes steady, and those temperature responses can represent the sensor response well. Instead of using all temperature data, it is possible to use the data of one fixed temperature point to capture the drift. This reduces the total number of data points, reduces the use of redundant information, and avoids the loss of valuable data incurred by aggregation. Therefore, instead of a 100 millisecond interval, the fixed-temperature time series has an interval of 70 seconds.

As there were 700 temperature points, it was difficult to find the right temperature point for predicting the data correctly. However, the plateaus of the temperature cycle contain steady states, referring to Figure 4.5. The plateaus in the cycle plot correspond to stable points; for example, the temperature response at 213°C (time = 60 s) is a plateau. Thus, the drain current responses at high temperature (386°C, time = 2.2 s) and low temperature (213°C, time = 60 s) were considered for analyzing the sensor drift. Consequently, the data did not require any aggregation and the time series interval increased to 70 seconds. The forecasting model was obtained by the procedure described in the earlier sections.

Figure 4.5: Red circles pointing to the plateaus of a cycle plot

4.5.2

Model Development of LSTM

4.5.2.1 Normalization

Normalization is important in deep neural networks, as the output data is retrieved in the range between 0 and 1 by the sigmoid function [10]. This causes discrepancies between the prediction range and the input range of the data. Therefore, min-max normalization was used to transform the variables into the range 0 to 1 for LSTM forecasting; after forecasting, the resulting data was reverted to the previous scale using inverse min-max normalization. The i-th value y_t^i of a time series y_t is normalized as

y_t^i(norm) = (y_t^i - min(y_t)) / (max(y_t) - min(y_t))    (4.1)

After forecasting, the i-th value of the predicted data, y_t^i(predicted), is reverted using

y_t^i(scaled) = y_t^i(predicted) · (max(y_t) - min(y_t)) + min(y_t)    (4.2)
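Equations 4.1 and 4.2 in Python form (a sketch; the helper names and toy values are ours). Applying the inverse after the forward transform recovers the original series exactly:

```python
def minmax_normalize(y):
    # Eq. 4.1: map the series into [0, 1].
    lo, hi = min(y), max(y)
    return [(v - lo) / (hi - lo) for v in y], lo, hi

def minmax_invert(y_norm, lo, hi):
    # Eq. 4.2: map predictions back to the original scale.
    return [v * (hi - lo) + lo for v in y_norm]

drain = [200.0, 210.0, 240.0, 220.0]      # illustrative drain-current values
norm, lo, hi = minmax_normalize(drain)    # -> [0.0, 0.25, 1.0, 0.5]
restored = minmax_invert(norm, lo, hi)    # -> [200.0, 210.0, 240.0, 220.0]
```

Note that `lo` and `hi` must be stored from the training data so that forecasts can be mapped back to the drain-current scale.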

4.5.2.2 Data Reshaping

Time series forecasting with ARIMA is a sequential supervised learning problem, but LSTM forecasting requires an additional pre-processing step to transform the sequential problem into a classical supervised machine learning problem. In this thesis, look-back steps of the normalized data were used as input features using the sliding window method instead of labeled data.

Sliding window method: a sliding window is a temporary approximation over the actual values of the time series data [25]. This method creates segments starting from the first value of the time series and merges each value to the next one based on specific criteria. Each segment after the first one starts from the end of the previous segment, and the segmentation is repeated until all data are segmented, preserving the time series order. The size of the window depends on the chosen criterion, but the first and last rows have to be discarded, as the previous value is used to predict the first value in the segment.

Table 4.1: Sliding Window

Time  y0         y1
1     0.6675820  0.6662079
2     0.6662079  0.6662079
3     0.6662079  0.6648338
4     0.6648338  0.6648338
5     0.6648338  0.6662079

Referring to Table 4.1, a one-step look-back (window size = 1) was chosen for the segmentation of the time series. The LSTM network expects data in the input shape [samples, time steps, features], where the samples are the observations of the time series and both time steps and features are set to one.
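The sliding-window reshaping can be sketched as follows; with look_back = 1 every value becomes the single input feature for predicting the next one (function name and series are illustrative):

```python
def sliding_window(y, look_back=1):
    # Build (input, target) pairs while preserving time order:
    # each target y[i + look_back] is predicted from the preceding
    # look_back values.
    X, targets = [], []
    for i in range(len(y) - look_back):
        X.append(y[i:i + look_back])
        targets.append(y[i + look_back])
    return X, targets

y = [0.10, 0.20, 0.30, 0.40, 0.50]   # an already normalized toy series
X, targets = sliding_window(y, look_back=1)
# X = [[0.1], [0.2], [0.3], [0.4]], targets = [0.2, 0.3, 0.4, 0.5];
# for the LSTM this is reshaped to [samples, time steps, features] = [4, 1, 1].
```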

4.5.2.3 LSTM Forecasting (Hourly Data)

The LSTM network consisted of memory blocks connected via one input layer, one dropout layer, one hidden layer, and one output layer, and was implemented using the tensorflow and keras packages. The input layer was the first layer, where the input parameters were defined using the batch_input_shape parameter, and shuffle was set to FALSE to keep the time series order. The hidden layer was inferred from the first layer by keras. The sigmoid activation function was used in the hidden layer and the output layer to retrieve the predicted data in the range 0 to 1. The input layer, hidden layer, and output layer consisted of 128, 64, and one neuron, respectively, and a dropout of 50% was used for prediction. According to Sections 4.5.2.1 and 4.5.2.2, data reshaping and data normalization were performed, and, referring to Section 4.4.4, the normalized data was divided into train and test sets. As the inputs for the model training were normalized, the predicted results were reverted using inverse min-max normalization. The forecasting accuracy was evaluated with the RMSE and MAE functions from keras. The forecasting was performed to predict the sensor drift of the next hour.

4.5.2.4 LSTM Forecasting (Minutes Data)

Similarly, another LSTM model using the minute-aggregated data described in Section 4.4.3 was built to forecast the sensor's minute-aggregated data. In this case, the forecasting predicted the sensor data for the next minute.

4.5.2.5 LSTM Forecasting (Temperature Selected)

Two further LSTM models were built, as explained in Section 4.5.1.3, using the 389°C and 213°C temperature data. No aggregation was applied to these data; they went through outlier extraction and normalization, followed by data reshaping and train-test splitting before training the LSTM forecasting models. The parameters and structure of the layers were the same as described above. This time the forecasting predicted the next sensor data point, 70 seconds ahead.


5

Results

5.1

Data Analysis

5.1.1

Cycles

The time series analysis (ADF) test was applied to find drift (stochastic trend) and deterministic trend in the cycles of the SiC-FET sensor data. If drifting starts between the observations of a single cycle, that amount of drift is added to the next cycle, which could contribute to the overall drift. Therefore, 47036 cycles were tested for stochastic and deterministic trend; only a few cycles from each series are presented here for demonstration.

Table 5.1: Deterministic and stochastic trend in a cycle

Gas Type  Cycle  Type of Trend  Test statistic
Air       35574  Stochastic     0.378 < ct
                 Deterministic  46.769 > ct
Object    19827  Stochastic     0.376 < ct
                 Deterministic  47.896 > ct
Night     43164  Stochastic     1.129 < ct
                 Deterministic  99.514 > ct
Weekend   33087  Stochastic     1.135 < ct
                 Deterministic  90.593 > ct

The hypothesis test of drift in all cycles was performed using Equation 3.25. Table 5.1 presents the results obtained from the hypothesis test of a unit root with no drift in the cycles. For most of the 47036 cycles, the alternative hypothesis of drift was rejected. For example, air cycle 35574 showed no support for drift, as the test statistic (0.378) did not exceed the critical value at the 5% significance level. Likewise, object cycle 19827, night cycle 43164, and weekend cycle 33087 had less extreme test statistics (φ_1 = 0.376, φ_1 = 1.129, and φ_1 = 1.135 respectively) than the critical value at the 5% significance level, so the null hypothesis (H_0 = unit root with no drift) cannot be rejected. This implies that these cycles did not exhibit drift.


On the other hand, for the aforementioned cycles, the trend test using Equation 3.26 resulted in rejection of the null hypothesis (H0 = unit root with no drift and no trend). The test statistics for the air, object, night and weekend cycles (φ2 = 46.769, φ2 = 47.896, φ2 = 99.514 and φ2 = 90.593 respectively) exceeded the critical values at the 5% significance level. Therefore, these cycles exhibited a deterministic trend.
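The decision rule applied throughout Tables 5.1-5.3 can be written as a small helper: an F-statistic larger than its critical value rejects the corresponding null hypothesis. This is an illustrative sketch; the critical values below are placeholders, and the ones actually used depend on sample size and significance level.

```python
def classify_trend(phi1, phi2, crit_phi1=4.71, crit_phi2=6.49):
    """Decision rule for the joint ADF F-tests.

    phi1 tests H0: unit root with no drift (Equation 3.25);
    phi2 tests H0: unit root with no drift and no trend (Equation 3.26).
    The default critical values are illustrative placeholders only.
    Returns (has_drift, has_deterministic_trend).
    """
    has_drift = phi1 > crit_phi1
    has_trend = phi2 > crit_phi2
    return has_drift, has_trend

# Air cycle '35574' from Table 5.1: phi1 = 0.378, phi2 = 46.769
print(classify_trend(0.378, 46.769))  # (False, True)
```

Applied to the air cycle above, the rule reproduces the reported conclusion: no drift, but a deterministic trend.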

Table 5.2: Summary of cycles

Drift/Trend    Yes     No
Is Drift        16  47020
Is Trend     47036      0

However, 16 air cycles did exhibit drift; the remaining cycles did not.

5.1.2 Days

One day was chosen to test for drift and trend in that day's overall measurements. Referring to Section 4.3, the hypothesis tests were performed using Equations 3.25 and 3.26 on the air, object, night and weekend data.

Table 5.3: Deterministic and Stochastic Trend in One Day

Gas Type  Type of Trend  Test statistic
Air       Stochastic      21.25 > ct
          Deterministic   14.18 > ct
Object    Stochastic      17.89 > ct
          Deterministic   12.02 > ct
Night     Stochastic      19.13 > ct
          Deterministic   12.85 > ct
Weekend   Stochastic     137.84 > ct
          Deterministic   92.31 > ct

According to Table 5.3, the ADF test for stochastic trend (drift) in the air series resulted in rejection of the null hypothesis (H0 = unit root with no drift), as the test statistic (φ1 = 21.25) exceeded the critical values; thus the air series had drift. Similarly, the object, night and weekend series also exhibited drift, as their test statistics (φ1 = 17.89, φ1 = 19.13 and φ1 = 137.84 respectively) were more extreme in the direction of the alternative hypothesis than the critical value. Furthermore, the test for deterministic trend resulted in rejection of the null hypothesis (H0 = unit root with no drift and no trend), based on test statistics of 14.18, 12.02, 12.85 and 92.31 for air, object, night and weekend respectively, all exceeding the critical values at the 5% significance level. Therefore, the series exhibited both stochastic and deterministic trends: the sensor started to experience drift in the daily measurements.
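The two kinds of trend separated by these tests can be illustrated with a minimal simulation (synthetic data, not the thesis measurements): a random walk with drift carries a stochastic trend, while a line-plus-noise series carries a deterministic trend. First-differencing removes the former exactly, whereas subtracting the fitted line is what removes the latter.

```python
import numpy as np

rng = np.random.default_rng(0)
n, drift, slope = 500, 0.05, 0.05
eps = rng.normal(size=n)

stochastic = np.cumsum(drift + eps)          # random walk with drift
deterministic = slope * np.arange(n) + eps   # trend-stationary series

# Differencing the random walk recovers white noise around `drift`,
# because diff(cumsum(x)) gives back x (from the second element on).
print(np.allclose(np.diff(stochastic), (drift + eps)[1:]))  # True
```

This is why the φ1 and φ2 tests are run separately: a series failing one null can still fail, or pass, the other, as the daily results above show.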

5.1.3 Entire Data

The ADF tests for drift (Equation 3.25) and trend (Equation 3.26) were carried out on the six weeks of data (air, object, night and weekend separately).
