Machine Learning to identify aberrant energy use to detect property failures

(1)

IN

DEGREE PROJECT ELECTRICAL ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM SWEDEN 2020,

Machine Learning to identify aberrant energy use to detect property failures

SHAHROZ HABIB

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF INDUSTRIAL ENGINEERING AND MANAGEMENT

(2)

Master of Science Thesis Department of Energy Technology

KTH 2020

Machine Learning to identify aberrant energy use to detect property failures

TRITA: TRITA-ITM-EX 2020:595

Shahroz Habib

Approved

23-Nov-2020

Examiner

Hatef Madani

Supervisor

Nelson Sommerfeldt

Industrial Supervisor

Amanda Fors

Contact person

(3)

2

Abstract:

The digitalization of energy sector has provided immense amount of data about buildings which created an untapped opportunity for energy savings using energy data analytics. In recent years, there has been significant research on energy optimization using machine learning. With the advancement in deep neural networks, researchers have investigated the potential of using time series machine learning algorithms to develop sophisticated energy prediction and proactive alert systems for energy management. In this thesis, we aim to explore utility of time series machine learning algorithm for anomaly detection to alert customer about abnormal energy consumption.

In our quest to find effective anomaly detection technique, we researched on several time-series anomaly detection techniques and selected long short-term memory (LSTM) network due to popular implementation and current scientific research interest. Our results indicate linear regression has achieved better prediction with MSE around 0.066 kWh compared to LSTM with 0.073 kWh. In terms of anomaly detection, baseline persistence has detected all five types of anomalies with average precision of 45.4% and average recall of 36.4%. Meanwhile, LSTM only detected two out of five anomalies with average precision of 100% and average recall of 17%. Therefore, investigation has shown great promise in use of persistence and linear regression models for anomaly detection due to simplicity and accuracy.

(4)

3

Sammanfattning:

Digitaliseringen av energisektorn har genererat enorma mängder data om byggnader vilket skapat en outnyttjad möjlighet för energibesparingar med hjälp av energidataanalys. De senaste åren har det gjorts betydande forskning om energioptimering med maskininlärning. Med utvecklingen av djupa neurala nätverk har forskare undersökt potentialen med att använda tidsserier och maskininlärningsalgoritmer för att utveckla sofistikerade energiprognoser och proaktiva varningssystem för energihantering. Denna uppsats avser att undersöka nyttan av maskininlärningsalgoritmer med tidsserier för detektering av avvikelser, för att varna kunder om onormal energiförbrukning.

För att hitta en effektiv algoritm för detektering av onormal energiförbrukning utfördes en litteraturstudie över flera existerande tidsserie-baserade tekniker för upptäckt av anomalier. LSTM-nätverk (Long Short- Term Memory) valdes då den använts i flertalet studier och väckt mycket intresse inom forskningsfältet.

Resultatet indikerar att linjär regression åstadkommer en bättre förutsägelse med MSE (Mean Squared Error) runt 0,066 kWh jämfört med LSTM med 0,073 kWh. Gällande anomalidetektering visar test av uthållighet vid baslinjen en upptäckt av alla fem typer av anomalier med en genomsnittlig precision på 45,4% och en genomsnittlig återkallelse på 36,4%. Samtidigt upptäckte LSTM bara två av fem anomalier med en genomsnittlig precision på 100% och en genomsnittlig återkallelse på 17%. Denna studie har därmed demonstrerat lovande resultat för användningen av uthållighet och linjära regressionsmodeller för identifiering av avvikelser.

(5)

4

Acknowledgements:

I am grateful to Magnus Astner for his time, feedback, and genuine kindness. His guidance during times of struggle was essential in completing this thesis. I would like to thank my supervisor, Nelson

Sommerfeldt, for his inputs and ideas for the thesis as well as the future. My warmest regards to Amanda Fors and Gustav Stenbeck for giving me the opportunity to do this thesis at Mestro and their support. I would also like to extend my gratitude to KTH PhD candidate Sahar Imtiaz for guidance on machine learning and overall engagement during the project. I am very grateful to my examiner, Hatef Madani, for his valuable feedback. I would like to appreciate the contribution of my opponents,

Muhammad Ammar Khan and Naveen Fatima, for their critique of my work. Special thanks to my family and friends for their continuous encouragement and motivation.

Thank You!

(6)

5

Introduction:

The increasing global warming and need for energy conservation has led to extensive focus on building energy management. Data Analytics has introduced phenomenal progress in building energy services with use of big data and machine learning. By using these tools, we can work on various aspects such as energy efficiency, fault detection and prognosis, energy optimization etc.

Anomaly detection has emerged as an important topic in building management as it allows the building services to identify the abnormal energy consumption pattern, alert customers about their energy behaviors and recommend preliminary measures to reduce energy consumption.

Anomalies can occur from excess energy usage due to faulty equipment, bad energy habits or meter malfunctions. These anomalies, if identified, can enable customers to save money and time with timely actions.

Anomalies are defined as rare data point which is different from normal data. Common definition used in statistics and data mining literature is as follows: “An anomaly is observation which deviates so much from other observations as to arouse suspicion that it was generated by different mechanism.”

There are four different categories of anomalies:

1) Point anomalies: A single instance of data that deviate from the rest of data. They are simplest type of anomalies found in most data domains.

2) Contextual anomalies: A data instance is said to be contextual anomalies if they are considered anomalous in specific context but normal otherwise. For example, low temperature during winter is fine but low temperature during summer will be considered anomaly

3) Collective anomalies: A set of data instance that are normal when viewed individually but

anomalous when considered together. These anomalies are prominent in time series sequences.

4) Change points anomalies: This type or anomalies are specifically found in time series data when sudden changes in pattern are observed. However, not all change points are considered

anomalies.

Developing anomaly detection methods have proven to be difficult as describing anomaly is a challenge.

The lack of labeled dataset makes it difficult to test and evaluate anomaly detection systems as it is time-consuming to perform manual labeling during the pre-processing phase of anomaly detection.

In time series domain, there is a close relationship between prediction and anomaly detection. The method for anomaly detection shows how to transform an instance of outlier detection into a prediction problem. Outliers are a deviation of normal model of input features and predictive models can help in modeling these features. Any violation of predicted features will represent violation of normal data model and correspond to an outlier.

Deep learning methods are gaining attention due to their ability to handle complex, non-linear models.

Artificial neural network can learn high-level representation from raw data with minimal need for manual feature engineering and domain expertise. In the domain of time series and sequential data, recurrent neural network (RNN) is gaining popularity due to its ability to recognize patterns. A variation of RNN is called long-short term memory (LSTM) where long-range pattern in time series are recognized.

(8)

7

Research Question:

How can anomalies be detected in electricity consumption time-series using machine learning?

Scope of the problem:

1) The study will only focus on short-term anomaly detection with one-step ahead time horizon 2) Thesis scope will be limited to linear regression and sequential neural network i.e. LSTM 3) Features in thesis has been restricted to electricity demand

4) The study is limited to investigating anomaly detection on electricity time series data from 50 datasets which provides significant room for statistical validation

Thesis outline:

This thesis is structured in chapters following an empirical thesis outline. Chapter 2 gives background about anomaly detection and is divided into two sections: relevant theory for understanding the project motivation and related literature review. Chapter 3 represents the methodology for proposed anomaly detection method and detailed algorithm steps. Chapter 4 represents results obtained from

implementation of our project. This is followed by analysis and discussion on results, limitations, and possible future work in Chapter 5. Finally, we develop conclusions from our work in chapter 6.

(9)

8

Background:

Anomaly detection is important feature in building energy management system, which can optimize building energy consumption through detailed insight about consumer energy signature and expected faults within the energy system. There has been significant research in anomaly detection, particularly in electricity consumption using statistical and computational approaches. The discussion about related work in anomaly detection has been detailed below. Afterwards, relevant theory has been described to set stage for anomaly detection algorithm using LSTM.

Related Work:

Anomaly detection has been subject of interest and various statistical, computational, and physical techniques have been used. Initially, the anomaly detection has been performed using statistical technique called autoregressive integrated moving average (ARIMA) model. In one study, ARIMA model was compared with different multi-variate regression models on the building energy consumption model. The author has identified two anomalies which are values beyond 95% confidence interval in energy data using ARIMA model. Accordingly, ARIMA performed better than other regression models in the study conducted by IBM. [1]

Later with further progress in computational techniques, machine learning methods such as clustering and regression have been used for anomaly detection. One of the famous clustering methods is called k- nearest neighbor (k-NN) which was used for energy anomaly detection. In one study, anomaly detection was performed using k-NN on electricity consumption data and the author was able to achieve anomaly detection rate around 88%. With introduction of neural networks, researchers tried to use artificial neural network (ANN) for anomaly detection. In one such paper, ANN was compared against other regression methods and ANN achieved lowest root mean squared error (RMSE) around 0.088 kWh. [2]

In another work, classification based neural network was used for anomaly detection and it achieved accuracy around 90% for two-class and 70% for seven-class anomaly detection algorithm. [3] Similarly, in another work, ANN has performed well with anomaly detection and achieved RMSE around 0.65 kWh under domain of electricity theft detection. [4] Moving further, researchers have tried to combine different ANN models to improve accuracy and precision for better anomaly detection and this approach is termed as ensemble anomaly detection (EAD). In one study based on electricity data anomaly

detection, EAD model was able to achieve false positive rate of 1.98% and true positive rate of 98.1% as compared to classical autoencoder based ANN model. [5]

With improvement in neural network design in ANN, recurrent neural network (RNN) has emerged in field of anomaly detection and it serves as foundation for LSTM network which is popular for sequential time series-based anomaly detection. In 2019, a study was conducted using LSTM and it was compared against ARIMA and the author concluded that combined LSTM-ARIMA model i.e. ensemble model provided higher accuracy around 92.1% and true positive rate around 69.1%. [6] Moving further, researchers have compared bi-directional LSTM (bi-LSTM) was with support vector machine (SVM) prediction model to identify abnormal electricity consumption. SVM is common method in machine learning for anomaly detection so the study compared the proposed LSTM method with SVM and normal LSTM. The author found that average accuracy of bi-LSTM was around 91.8% while it was around 91.1% and 85.5% for SVM and LSTM, respectively. [7]

(10)

9

In 2019, thesis study on anomaly detection using LSTM was performed on Swedish electricity data acquired for single building dataset from 2015 to 2019. The LSTM performance was compared against feed forward neural network (FFNN) and the author concluded that LSTM model is more useful for detecting contextual anomalies while FFNN model is more relevant for detecting global anomalies.

Overall, LSTM achieved precision around 60.9% and recall around 67.9% for electricity anomaly

detection. However, the author stated about lack of statistical validation due to few labelled datasets.[8]

Accordingly, our study is based on multiple building datasets in Swedish electricity market for different sectors i.e. residential, commercial, and industrial etc. which provides sufficient room for statistical validation of LSTM anomaly detection performance. Furthermore, LSTM has been compared with persistence model in weather and solar PV prediction. [9] However, such study is not available for electricity consumption applications in Swedish electricity market which makes our study unique with reference to current literature.

Related Theory:

According to literature review, various machine learning methods have been used for anomaly detection such as support vector machine, neural network, linear regression, symbolic aggregate approximation, and LSTM method. Under these methods, LSTM has proven to be very popular neural network

technique for sequential dataset with temperature and energy consumption.

Neural network are layers of computational units which are called neurons with connections in different layers. The network transforms data until they can classify it as an output. Each neuron multiplies initial value by some weight, sums results with other values coming into same neuron, adjust resulting number by neuron’s bias and normalize output with activation function.

Key feature of neural network is iterative learning process in which rows presented to network one at a time, and weights associated with input values are adjusted each time. During the learning phase, network trains by adjusting weights to predict class label of input samples.

Basically, the network processes the records in training dataset one at a time, using weights and functions in hidden layers then compares the resulting outputs against the desired outputs. Errors are propagated back causing the system to adjust the weights for next record. A chunk of neural network looks at some input xt and outputs a value ht as shown in fig. 1.

Figure 1 Single Neuron Layer

Typical neural network works from scratch with updated weights for next record and output result.

However, the network does not have memory of last output within hidden layer as each record is new

(11)

10

and the process becomes iterative. However, recurrent neural network addresses this issue by having networks with loops in them allowing information to persist.

Recurrent neural network has loop which allows information to be passed from one step of the network to the next. These loops make recurrent neural network seem kind of mysterious but if we unenroll the loop as shown in fig. 2, recurrent neural network can be thought as multiple copies of same network, each passing message to successor. This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists.

Figure 2 Expanded Recurrent Neural Network

As shown in fig.2, the loop is just series of neurons where weights information from hidden layers in previous run is input in next hidden layer and the input data is inserted until the final input xt results into output ht. This chain-like nature reveals that recurrent neural networks are intimately related to

sequences and lists. They are the natural architecture of neural network to use for such data. Recurrent neural network has good success in sequential data training.

However, recurrent neural network suffers from short-term memory. If a sequence is long enough, they will have hard time carrying information from earlier time steps to later ones. In normal situation where we only need to look at recent information, RNN works well but for long-term dependencies, RNN does not seem able to learn them.

LSTM are special kind of RNN which are capable of learning long-term dependencies. All recurrent neural network has form of chain of repeating modules of neural network. In standard RNNs as shown in fig. 3, repeating module will have very simple structure such as single tanh layer. However, in LSTM, instead of having single neural network layer, there are four layers interacting in special way as shown in fig. 4.

(12)

11

Figure 3 Standard RNN hidden neural layer Figure 4 LSTM hidden neural layer

The key to LSTM is the cell state, horizontal line running through top of the diagram. The cell state is like conveyor belt which runs straight down entire chain with minor linear interactions. The LSTM does have the ability to add or remove information to the cell state carefully regulated by structures called gates.

Gates are way to optionally let information through. They are composed of sigmoid neural net layer (σ) and pointwise multiplication operation. The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through where zero means nothing to pass and one means to let everything through.

LSTM has three of these gates to protect and control the cell state.

Step by Step walk through LSTM:

Firstly, we are going to decide what information we are going to throw away from the cell state. This decision is made by forget sigmoid layer(ft) called “forget gate layer”. It looks at “ht-1” and “xt” and outputs a number between 0 and 1 for each number in the cell state Ct-1. A 1 represents “completely keep this” and 0 represents “completely get rid of this”. As shown in eq. 1, the forget gate layer ft is sigmoid layer (σ) of input with weight (wf) applied to input[ht-1,xt] and y-intercept bf.

In example of language model trying to predict next word based on all previous ones. In such a problem, the cell state might include gender of present subject so that correct pronouns can be used. When we see a new subject, we would like to forget the gender of old subject.

f

t

= σ (W

f

. [h

t-1

,x

t

] + b

f

)

(1)

Next step is to decide what new information we are going to store in the cell state. This has two parts.

First, sigmoid layer called input gate layer(it) decides which values we will update. Next, a tanh layer creates a vector of new candidate values(Ĉt) that could be added to the state. In the next step, we will combine these two to create an update to the state. As shown in eq. 2, the input gate layer (it) is output of sigmoid layer (σ) where input is weight (wi) applied to input [ht-1,xt] and y-intercept (bi). Furthermore, in eq. 3, new candidate values (Ĉt) are output of tanh layer (tanh) where input is weight (wc) applied to input [ht-1,xt] and y-intercept (bc).

In the example of language model, we would want to add the gender of new subject to cell state, to replace the old one. It is now time to update old cell state Ct-1 to new cell state Ct. The previous steps already decided what to do, we just need to actually do it.

i

t

= σ (W

i

.[h

t-1

,x

t

] + b

i

) (2)

(13)

12

Ĉ

t

= tanh(W

C

.[h

t-1

,x

t

] + b

C

) (3)

We multiply old state by ft, forgetting the things we decided to forget earlier. Then we add it*Ct. In case of language model, we would drop the information about old subject’s gender and add the new

information. As shown in eq. 4, new cell state (ct) is based upon sum of forget gate output from last cell state (ct-1) and input gate output from new candidate vector (Ĉt).

C

t

= f

t

* C

t-1

+ i

t

* Ĉ

t

(4)

Finally, we need to decide what we are going to output. The output will be based on our cell state but will be a filtered version. First, we run a sigmoid layer which decides what parts of the cell state we are going to output. Then, we put the cell state through tanh and multiply it by output of sigmoid gate to output only decided parts. As shown in eq. 5, output (ot) is output of sigmoid layer where input is weight (wo) applied to input [ht-1,xt] and y-intercept (bo). As shown in eq. 6, ht is final output which is based upon product of sigmoid layer output (ot) and tanh layer output.

O

t

= σ(W

o

.[h

t-1

,x

t

] + b

o

) (5)

h

t

= O

t

*tanh(C

t

) (6)

(14)

13

Methodology:

This chapter explains the procedure that will be followed to develop the anomaly detection algorithm for electricity demand data. Initially, the datasets used in the thesis are presented to show the pattern and anomalies found in the datasets. Then, we explain overall approach used for anomaly detection from labelling until evaluation. Then, we discuss about mathematical labelling of true anomalies and manual labelling. Then, we present our proposed anomaly detection method in steps. Then, we discuss about baseline model and linear regression model. Finally, we discuss briefly how LSTM is implemented in Keras and our process for optimization.

Data pre-processing:

In our study, the electricity consumption data for various buildings was acquired through energy startup called Mestro which is working on visualization of energy data for real-estate owners in Sweden. In Mestro database, we have 100+ customers with electricity consumption data of more than 5 years. Within these customers, there are multiple buildings with several meters. With customer’s total portfolio, around 9 million measured values are collected per day. Accordingly, there is large amount of energy consumption data available for several buildings.

However, the database has high resolution data such as values of energy, temperature on hourly basis as well as metadata section which consists of meter ID information etc. The nature of the data is sequential, or time series based which will help us to identify relevant machine learning algorithm for anomaly detection.

Lack of labelled dataset is common issue with evaluation of anomaly detection algorithm. Accordingly, synthetic datasets are used. The problem with synthetic dataset is that performance on synthetic dataset cannot ensure good accuracy on real dataset. However, in our case, application specific

algorithm is being developed which will detect anomalies in electricity consumption data. Hence, we will use actual electricity timeseries data for manual labelling of anomalies and develop customized anomaly detection algorithm for electricity consumption dataset.

Mestro Building Energy Consumption dataset:

Within building dataset available, we have selected a building with five years of hourly data for energy and temperature time series. The dataset has anomalies which will be identified after developing threshold value based on abnormal energy consumption. The issue of lack of labelled abnormal dataset creates difficulty in defining threshold value for anomaly detection.

Each building consumption dataset contains 43,800 rows of hourly energy consumption and temperature from Thursday 1^st Jan 2015 to Tuesday 31^st Dec 2019. A week in data represent 168 observations with peaks indicating high power consumption on weekdays and no peaks showing low consumption on weekends. There are three types of seasonality in the data- a daily, weekly, and annual pattern that resembles with most real-world energy consumption.

(15)

14

Figure 5 Daily Energy Profile

Figure 6 Weekly Energy Profile

Figure 7 Annual Energy Profile

(16)

15

Anomaly Detection Process:

After explaining the nature of dataset and anomalies, it is crucial to explain the framework used for thesis study starting from labelling of true anomalies leading to unsupervised model training to identify anomalies and evaluation of model by comparing identified anomalies against true anomalies. This simple approach is used for thesis study and it has been detailed in figure 8 below.

Figure 8 Anomaly Detection Flowchart

True Anomaly Definition

•Mathematical definition

•Automated identification

Model Training

•Prediction curve training

•Supervised or Unsupervised

Evaluation

•Anomaly Detection

•Metric Development

(17)

16

The first step for anomaly detection is to identify anomalies and label them. This issue of lack of labelled dataset makes it difficult to compare results and has been identified as major issue in anomaly detection.

[8] However, it is very important to label anomalies correctly because it sets the standard for evaluation of the model. Due to the importance of labelling anomalies, literature review has been conducted and five types of anomalies have been identified which are discussed in next section. [8][11]

After labelling true anomalies, the relevant anomaly detection model in machine learning or statistical technique is selected. There are two categories for machine learning model based on use of true anomalies. Supervised models learn from true anomalies and the model is trained with labelled dataset which allows the model to predict day-ahead energy consumption and identify anomalies in testing dataset with better accuracy due to availability of labelled training dataset. However, unsupervised models learn the pattern without knowing true anomalies, so these models are trained using un-labelled datasets. In this study, unsupervised machine learning models are used which predict day-ahead energy consumption using historical dataset. Then, anomaly detection is performed by comparing predicted energy values against true energy values to identify error values.

The final step for anomaly detection is related to comparison of identified anomalies with true

anomalies using standard metrics and comparison of anomaly detection performance with literature for evaluation of model performance and final selection.

Anomaly Window Brief:

Before diving into the process, anomalous window which has been used for visualization needs to be understood.

Figure 9 Anomaly Window Visualization

The plot above contains energy data values over a month containing 720 datapoints as shown on x-axis.

The y-axis label shows normalized value of energy data which is centered around zero and scaled to remove the variance. Power is normalized using z-score based on mean and standard deviation. The red center line is the mean of energy values of test data. Alternately, the blue line is one standard deviation away from mean of test data. The gray line is two standard deviation away from mean of test data. The orange line shows the anomaly label. However, the width of orange line is based on window size and window size is based on two timesteps at least which can cause normal energy points to be considered as anomalies as well.

Identification and labelling of anomalies in dataset:

In the current dataset, the weekends are shown to have low consumption like Saturday and no consumption on Sunday. These types of pattern are known as periodic which occur with weekly frequency. Furthermore, there is monthly or seasonal variation which occurs from spring to fall with

(18)

17

annual frequency with higher consumption during fall or lower consumption during summer. These periodic patterns constitute as important part of time-series pattern recognition which is fundamental for anomaly detection.

Due to absence of labelled anomalous data, we have used some data analysis to identify anomalous events for energy consumption. The typical energy consumption has two types of anomalies for the dataset – point and contextual anomalies.

The point anomalies in the time series are referred to as points where the magnitude of power demand is substantially higher than average energy consumption. These peak anomalies can occur due to special occasion or any device malfunctioning which can lead to excess energy usage. These anomalies can alert user about expected increase in energy billing due to casual energy use or device malfunction.

Figure 10 Peak Energy Anomaly

Another category for point anomalies is zero energy values where the energy consumption is not measured due to flaw in meter or discontinuation of meter subscription. These anomalies can alert energy provider about expected loss of revenue or malfunctioning meter which can be repaired timely through detection of zero energy anomalies.

Figure 11 Zero Energy Anomalies

The contextual anomalies in the time series are normal values with abnormal pattern when seen over the specific period. The energy values under this anomaly follow irregular pattern which is unique in sequential dataset with deviation from mean energy values. Such values which deviate from baseline

(19)

18

energy values are known as baseload shift anomalies. In such condition, the values shift by specific amount for the given period.

Figure 12 Baseload Shift Anomaly

Another type of contextual anomaly is related to irregular days which deviate from periodic pattern of weekend and weekdays. There are two such anomalies which can be considered on weekly basis. In normal week, there are five days for high energy consumption typically in office buildings with low energy consumption on Saturday or Sunday. However, there are some public holidays or special events which occur during weekdays and lead to low energy consumption. It is good idea to identify such anomalies especially when such low energy weekdays are not public holidays.

Figure 13 Weekday Energy Anomaly

Similarly, there are some weekends when there is higher energy consumption than usual. These weekends can be considered as anomalies. It might occur due to special event or sudden breakdown during night which might cause energy wastage. These anomalies can allow the user to identify sudden faults during weekends and alert them about possible repair requirement before the start of next week.

Figure 14 Weekend Energy Anomaly

After identifying these five categories of anomalies, the manual labelling of anomalous dataset was performed.

(20)

19 Dataset preparation:

Dataset preparation is very important step before selection of machine learning model. In our study, the data for a building was taken from January 2015 until December 2019. The data is normally split

between training (Tr), validation (V) and testing (T) dataset on monthly basis. As an example, if T is taken from October 2019, then V is taken from October 2018 and Tr is taken from October 2017, respectively.

The sequence of data is not shuffled to maintain temporal dependencies which is required for training time-series models such as LSTM. Furthermore, dataset has been limited to monthly sample of 720 values after several iterations where the training data selected varied from 3 years (25,920 datapoints) to 1 month (720 datapoints). The model training score and validation loss is used for model accuracy evaluation where training score is prediction error achieved after training and validation loss is loss obtained when validation dataset is implemented on model. The model with low value of training score and validation loss is better in terms of prediction. In order to evaluate, a single building was taken where model training score was around 0.053 kWh and validation loss was around 0.065 kWh whereas for three years, model training score was around 0.031 kWh and validation loss was around 0.026 kWh, respectively. Although the training score for 3 years was lower than one month, but difference is negligible as low error score is also achieved using one month. Accordingly, Tr and V has been limited to 1 month (720 datapoints).

(21)

20 Model Selection:

In order to evaluate the model as complicated as LSTM, it is essential to first develop baseline to

compare it against neural network-based model. Accordingly, simple persistence method has been used as our baseline to compare subsequent models.

Baseline Persistence

The persistence model is developed using the hourly values from last week at same time, so the prediction of the model is based on last week values.

E(t)

pred

= E(t-168)

actual

(8)

Figure 15 Comparison of models - Predict vs. Actual

As shown in the superimposed curve plot above, the original weekend over-shoot which occurred on 6^th Oct 2018 has been predicted one week later at 13^th Oct 2019 which shows that persistence model is working well.

Linear Regression

After defining baseline, simple regression model has been developed before moving towards artificial neural network to compare performance of simple regression model against advanced regression based neural network model i.e. LSTM.

Regression is based on estimating the relationship between energy (kWh) as dependent variable as a function of various independent variables. The regression coefficients have been calculated for various dependent variables to identify best features which can define energy consumption. Since our thesis scope is limited to single parameter i.e. energy values, different time-based variables have been tested such as hour of the day, day of the week. However, the most relevant independent variable has been the day of the week.

Due to low regression coefficient, additional independent variables have been investigated using energy parameter such as last week energy values, mean of energy values, last fortnight energy values, last hour energy values. After evaluation, last week energy values showed highest regression coefficient.

Accordingly, day of the week, hour of the day and last week energy value has been selected after multiple iterations. The regression coefficients for each parameter has been shown below. Due to their low values compared to last week energy, it can be assumed that time-based features alone are not sufficient to predict energy consumption using linear regression.

Y= b

1

x

1

+ b

2

x

2

+ b

3

x

3

(9)

(22)

21

where x1= hour of the day, x2= day of the week, x3= last week energy value and b1, b2, b3 are regression coefficients described in table below for each variable x respectively.

Figure 16 Correlation plot (Variables vs. Energy)

Feature Regression Coefficient (b)

Hour -0.002

Weekday -0.019

Energy (t-168) 0.514

Table 1 Regression Coefficients for energy (kWh)

After development of baseline persistence and linear regression models, LSTM based neural network model has been developed to predict energy values for anomaly detection.

Long Short Term Memory (LSTM)

Due to time-series nature of LSTM, the model training data requires sequential dataset based on windows of input energy values to predict series of output energy values. The length of time frame for energy values which are looked back in time is known as ‘lookback’ period and it defines the window size of input data for the neural network. During our study, lookback has been changed from 3 (last 3 hours) to 168 (last week).

When the window of input data is fed to the neural network, the energy values are predicted as a window of output values so model can predict from one time-step ahead or one week ahead values in output window. Hence, the model can look ahead several values, so the output window size is termed as

‘lookahead’ period. During our study, lookahead is kept to one time-step ahead.

Batch size is defined as chunk of training windows fed to the neural network in one run. Training dataset for one month (720 windows) with batch size of 72 can lead to single batch containing 72 windows and one complete train dataset run is completed in 10 batches (720=72*10). Epoch is defined as one complete run over entire training dataset and one epoch is completed after 10 batch runs in this case.

After windowed dataset preparation and defining the input shape for LSTM, the model training is configured using hyper-parameters such as no. of epochs, patience, optimizer type, dropout rate, no. of hidden neural layers, loss function. Once LSTM neural layer is configured, then model fitting is

performed on training dataset and various callback functions are used to navigate the training process of the model. Hyper-parameters are very important for successful LSTM model training.

(23)

22

Due to limitation of time, the optimal configuration selected is based on LSTM anomaly detection study which was conducted on Swedish energy data. [8] According to that paper, the following configuration has been used for stateless LSTM. However, due to Keras limitation, lookback period has been limited to 3 to achieve batch size around 67 for successful implementation of stateful LSTM. Accordingly, different configuration has been used for stateful LSTM.

The summary of model configuration for all four models is shown below.

Network Architecture

Parameters Lookback Batch Size Error Window Baseline

Persistence

Last week value E(t-168) 1 168 24

Linear Regression

Simple Regression

E(t-168), d, h 1 168 24

Stateless LSTM Hidden 1: 50 Patience: 5 Optimizer: Adam Epochs: 100

168 553 168

Stateful LSTM Hidden 1: 100 Hidden 2: 80

Patience: 5 Optimizer: Adam Epochs: 60

3 67 3

Table 2 Parametric Summary of different models

(24)

23 Evaluation:

In our anomaly detection algorithm, we are trying to predict future value based on the past training data and anomaly detected is based on error threshold between predicted value and actual value.

The anomaly detection method adopted consists of three stages. In first stage, models are trained on dataset to learn the behavior of normal data. In second stage, trained models are used to predict values in datasets and obtain prediction errors. In final stage, these prediction errors are used to derive anomaly scores for observed data. The following flowchart describes the process for anomaly detection which will be explained further.

Figure 17 Anomaly Detection Process Flowchart

Error values generated from difference of predicted and actual

energy data

Development of reconstruction

error curve

No. 1:

Probability density function

No. 2:

Sum of Absolute Error Values

No. 3:

Threshold developed

Predicted Anomaly Visualization

Evaluation using metrics

(25)

24 Evaluation Metrics:

It is very important to define evaluation metrics which can be compared in literature. In this study, there are metrics defined for prediction and for anomaly detection. The standard used for prediction evaluation is MAE and MSE where error is defined as difference between true and predicted value.

However, for anomaly detection, several different metrics have been used. Precision & recall has been used in some cases. In other cases, specificity and sensitivity has been. In some cases, no metrics have been used due to low frequency of anomalies in dataset.

In this project, the performance of models to detect anomalies were evaluated by measure of accuracy, precision, recall, false positive rate (FPR) given by equations below:

Accuracy= (True Positive+ True Negative)/ (True Positive+ False Positive+ True Negative+ False Negative) Precision = True Positive / (True Positive + False Positive)

Recall = True Positive / (True Positive + False Negative) FPR = False Positive / (True Negative + False Positive)

Accuracy is evaluation metric which measures the ratio of detected anomalies over total number of values by the model. It is used to measure model performance in terms of flagging anomalies regardless whether it is true anomaly or false anomaly. It shows the ratio of correctly detected values (including both anomalies and non-anomalies) over all values which shows the correctness of the model.

Precision is evaluation metric which measures the ratio of correct anomalies detected from total detected anomalies by the model. It is used to measure model correctness in terms of correctly identified anomalies over total number of identified anomalies either correct or incorrect.

Recall is evaluation metric which defines the number of correct anomalies detected from total number of correctly detected values. It shows the detection ratio of true anomalies compared to true and non- anomalous values correctly detected by the model.

False positive rate (FPR) is error rate which defines the number of missed true anomalies from total number of incorrectly identified values including missed anomalies and incorrectly identified anomalies.

It shows the ratio of missed anomalies over missed values overall.

(26)

25 Anomaly Detection Models

After successful model training, the predicted energy values are compared with actual energy values after windowing the predicted and actual energy values using defined window size. This window size is used for error visualization and it is different from input windowed dataset used in LSTM as shown in model configuration table above. Post-processing involves error calculation between actual and predicted values in windowed dataset as shown below.

Figure 18 Reconstruction Error Window Visualization

The reconstruction error distribution is calculated for validation and testing dataset which is then used for development of anomaly detection algorithm. There are three different techniques which have been used for anomaly detection. Firstly, the probability of reconstruction error distribution function is used.

The probability of getting anomalous value is typically very low compared to normal values in a typical dataset. Using this principle, anomalous values of energy are identified from given energy dataset. The method involves calculation of probability density function for reconstruction errors which is then used to calculate sum of errors per window. The windows are ranked according to the sum of errors and the first 24 windows with lowest error values are identified as anomalies. These windowed datasets are then highlighted on the testing dataset as shown below which allows identification of range of values.

Figure 19 Probability Distribution Error window

Figure 20 Anomaly Detection Visualization-Probability Distribution

(27)

26

Secondly, the sum of absolute values for reconstruction error is used. The absolute value of anomalous value is typically very high compared to normal values in a typical dataset. Using this principle,

anomalous values of energy are identified from given energy dataset. The method involves calculation of absolute errors for reconstruction errors which is then used to calculate sum of errors per window.

The windows are ranked according to the sum of errors and the first 24 windows with highest absolute errors are identified as anomalies. The sum of absolute errors in each window is plotted on the figure shown below.

Figure 21 Absolute Error Window

Accordingly, the anomalous energy dataset windows are identified on the testing dataset using the absolute error summation.

Figure 22 Anomaly Detection Visualization – Absolute Error

Finally, anomaly detection method based on threshold calculation is developed. The anomalies are typically higher than the threshold value. Using this principle, anomalous values of energy are identified from given energy dataset. The method involves calculation of threshold using sum of mean and standard deviation of reconstruction error. It also involves calculation of maximum value from mean of reconstruction errors for each window. The max error values are compared against the threshold and the error values higher than threshold are identified as anomalies. The maximum of mean absolute errors in each window is plotted on the figure shown below.

(28)

27

Figure 23 Threshold Window

The error values higher than calculated threshold value are then identified as anomalous dataset in testing dataset as shown below.

Figure 24 Anomaly Detection Visualization - Threshold

These three methods have given different results which will be compared for performance and accuracy with real dataset containing labelled anomalies. In order to evaluate anomaly detection performance of these three models, the building dataset is considered for October 2018 containing weekend peak anomalies as shown in Fig. 19, 21 & 23. The table below compares anomaly detection performance for each method in terms of accuracy, precision, recall and FPR.

Anomaly Detection Algorithm Accuracy Precision Recall FPR

Probability Distribution 88% 6.3% 7.5% 6.8%

Mean Absolute Error 89% 16% 20% 6.1%

Threshold 84% 13% 32% 12%

Table 3 Anomaly Detection Performance

Amongst three anomaly detection models, the mean absolute error algorithm has demonstrated better performance. It has higher accuracy which means that it has identified both anomalies and non-

anomalies. Furthermore, with higher precision, the algorithm has identified more true anomalies than other anomaly detection models. Average recall shows that model was able to detect few anomalies as compared to non-anomalies. With low value of false positive, the anomaly detection model has shown better performance as it has achieved few false positive. Accordingly, the anomaly detection model based on mean absolute error has been selected in our thesis study and following result section is based on anomaly detection using mean absolute error values.

(29)

28

Algorithm Summary:

The method followed to achieve anomaly detection has been summarized below:

1. True anomalies are labelled in the dataset and visualization is performed to validate true anomalies through human eye inspected.

2. The prediction model is trained on the training (Tr) set to learn the normal pattern of the data.

The network parameters were tuned using Bayesian optimization followed by manual tuning with early stopping done on validation (V) data.

3. The trained model is used to find the prediction error e(t), where ‘e’ is the difference between prediction made at time t-1 and the actual value received at time t. Subsequently, prediction error vectors were obtained for training, validation, and testing data.

4. A set of errors are modeled as a normal distribution with mean and standard deviation. The reconstruction error distribution is used for three different approaches in anomaly detection.

5. Probability density function curve of reconstruction error is developed and values with minimum probability are considered as anomalies.

6. Absolute error from reconstruction error distribution curve is calculated and values with highest absolute error are considered as anomalies.

7. Threshold is calculated from reconstruction error distribution curve using mean and standard deviation of reconstruction error curve. The anomalies identified are one with values greater than threshold.

8. The anomalies detected through 5,6,7 is plotted on testing dataset for error visualization.

9. Evaluation of metrics is performed on testing dataset by comparing true and identified anomalies.

(30)

29

LSTM implementation in Keras:

The LSTM neural network model is implemented in this thesis using Keras which is python based deep learning library. Keras is high level API of TensorFlow which is used for development of machine learning algorithms particularly deep neural networks. Keras is good library for implementation of LSTM but it has low flexibility in terms of neural network configuration and data input structure. This created an impact on experimentation with Keras LSTMs.

There are two types of LSTM configuration in Keras – stateless and stateful. In stateless mode, the LSTM is unable to learn long-term dependencies due to resetting of the cell state between each batch where batch is portion of input training dataset. The cell state is only maintained for values equal to number of lookback values which is number of input energy values in each batch. To avoid resetting cell state at the start of new batch, the stateful mode is used. This allows the model to maintain cell state throughout entire run of training dataset which is split into several batches. However, in Keras, it is required to have fixed batch size which equally divides the training dataset in equal number of batches. This limits the batch size values and requires correct dataset structuring beforehand. This Keras implementation is required due to the way stateful LSTM works in Keras. After each batch sample, the cell propagates information from the previous batch samples to next batch in the form of an array which is equal to number of LSTM cells. Since cell propagates info at timestep t from batch 1 to exact timestep t in batch 2, each batch must be equally dimensioned for correct relaying of information at each timestep

between batches. This causes Keras to require equally sized batches.

However, this limitation is issue in online configurations where training batch size are way larger, and we must predict one value (batch size=1). In our case, data was already available offline, so we were able to maintain same batch size across training and testing dataset.

(31)

30

Results:

In this section, we are going to discuss the results gathered from the baseline, linear regression, and LSTM method. Accordingly, we will first compare prediction performance for four models using metrics such as MAE, MSE, R-squared against standard values in literature. Then, we will compare anomaly detection performance for four models using metrics such as accuracy, precision, recall, FPR and compared against standard values in literature. Finally, we will compare performance of LSTM and linear regression against baseline persistence under different building categories to identify most effective anomaly detection technique for our application.

Model Prediction Performance:

In this section, the prediction performance of four models have been discussed on sample test dataset which is October 2018 for a building. The figures below show the overlapping of normalized energy values on y-axis against time for actual and prediction curve for each respective model in October 2018 detecting weekend anomalies.

Baseline Persistence:

Figure 25 Baseline Persistence - Predict vs. Actual

Linear Regression:

Figure 26 Linear Regression - Predict vs. Actual

(32)

31 Stateless LSTM:

Figure 27 Stateless LSTM - Predict vs. Actual

Stateful LSTM:

Figure 28 Stateful LSTM - Predict vs. Actual

The prediction results are overlapped over actual energy values and prediction error metrics have been developed as shown in table below.

MAE/kWh MSE/kWh2 RMSE/kWh R-Squared

Baseline Persistence

0.219 0.088 0.296 0.924

Linear Regression

0.194 0.066 0.258 0.941

Stateless LSTM 0.188 0.073 0.271 0.934

Stateful LSTM 0.283 0.145 0.382 0.871

Literature (CNN LSTM)

0.349 0.374 0.611 -

Table 4 Prediction Error Comparison - Different Models

Mean absolute error (MAE) is based on the average of absolute error values and it measures the differences without considering the direction of error. It is only focused on the magnitude of the error which allows detection of contextual anomalies. Accordingly, MAE is good predictive performance indicator for continuous variables such as energy demand.

Mean squared error (MSE) is based on the squared error values which removes the direction of error and the larger values are weighted heavily. Root mean squared error is squared root of the error which can be used to remove bias in error detection with lower weightage as compared to MSE.

(33)

32

R-squared (R2) is based on variance of error values and higher R2 values can relate to higher correlation between input and output values of the model. So higher R2 means better prediction ability of the model.

In terms of error metrics for prediction, the linear regression model has shown lowest mean squared error and better R2 value. These values mean that linear regression model has better predictive

performance. However, in terms of MAE, stateless LSTM has performed better than other models which means stateless LSTM is able to predict next step with higher accuracy but this performance is

comparable to linear regression which makes linear regression most effective in terms of prediction.

Anomaly Detection Performance:

The anomaly detection of the model is compared in terms of accuracy, precision, recall, false positive rate (FPR) for the various types of anomalies and compared to literature.

Zero energy values:

This is case of September 2015 for a building with extensive zero energy values. True anomaly algorithm has been able to identify zero energy values well as shown above. Accordingly, true anomalies contain only desired zero energy values which improves reliability of evaluation. The results for baseline persistence, linear regression, stateless and stateful LSTM are shown below.

Accuracy Precision Recall FPR

Baseline Persistence 54% 47% 7.2% 6.5%

Linear Regression 57% 73% 11% 3.4%

Stateless LSTM 61% 100% 15% 0%

Stateful LSTM 87% 10% 10% 6.6%

Table 5 Anomaly Detection Performance - Zero Energy Anomalies

Stateful LSTM has demonstrated higher accuracy which shows that it has correctly detected values but with lower value of precision, most of correctly detected values were non-anomalies. Overall, Stateless LSTM was able to detect all anomalies although it was not able to detect non-anomalies as much as stateful LSTM. Furthermore, it has better recall which shows that model was able to detect more anomalies than non-anomalies. Finally, it has minimum FPR which shows that stateless model did not detect false anomalies. (false positive). It must be noted that anomaly detection model was unable to detect all zero energy values because the design of reconstruction error curve limits the detection of anomalies to 24 datapoints which leads to limit on number of anomalies identified.

(34)

33 True Anomalies:

Figure 29 Zero Energy Anomalies Visualization-True Anomalies

Baseline Model:

Figure 30 Zero Energy Anomalies Visualization-Baseline Model

Linear Regression Model:

Figure 31 Zero Energy Anomalies Visualization-Linear Regression

Stateless LSTM Model:

Figure 32 Zero Energy Anomalies Visualization-Stateless LSTM

(35)

34 Stateful LSTM Model:

Figure 33 Zero Energy Anomalies Visualization-Stateful LSTM

High Peak Values:

This is case of September 2019 for another building with very high peak values. The results for baseline persistence, linear regression, stateful and stateless LSTM are shown below.

Baseline Persistence 94% 14% 70% 5.9%

Stateless LSTM 92% 0% 0% 6.8%

Stateful LSTM 92% 6.3% 30% 6.5%

Table 6 Anomaly Detection Performance - High Peak Anomalies

Baseline Persistence has demonstrated higher accuracy than other methods which means that it has correctly detected the values in model. Stateful LSTM has detected the values but low precision shows that most of the values detected correctly were not anomalies. Baseline model has higher recall which means it has detected more anomalies than non-anomalies. Linear Regression has lower false positive rate which means it has detected few incorrect anomalies. However, false positive rate can mislead here because of low number of error values leading to low number of false positives. As there are lot of true negative values as shown below, FPR value for linear regression are lower than usual. It should be noted that high peak anomalies are based on single window which defines the breadth of the orange label.

Accordingly, the labels of true anomalies are wider than single timestep.

True Anomalies:

Figure 34 High Peak Anomalies Visualization-True Windows

(36)

35 Baseline Model:

Figure 35 High Peak Anomalies Visualization-Baseline Persistence

Linear Regression Model:

Figure 36 High Peak Anomalies Visualization-Linear Regression

Figure 37 High Peak Anomalies Visualization-Stateless LSTM

Stateful LSTM Model:

Figure 38 High Peak Anomalies Visualization-Stateful LSTM

(37)

36 Baseload Shift:

This is the case of January 2016 for another building which has sudden base load shift. The results for baseline persistence, linear regression, stateful and stateless LSTM are shown below.

Stateless LSTM 54% 0% 0% 11%

Stateful LSTM 71% 100% 19% 0%

Table 7 Anomaly Detection Performance-Baseload Shift Anomalies

The stateful LSTM has achieved higher accuracy which means that it has correctly detected anomalies and non-anomalies. Furthermore, higher precision shows that all of anomalies detected were true anomalies with zero false positive. With higher value of recall, stateful LSTM has identified more anomalies than normal values. As precision is high for stateful LSTM, false positives were very rare and hence FPR is very low.

True Anomalies:

Figure 39 Baseload Shift Anomalies-True Anomalies

Baseline Model:

Figure 40 Baseload Shift Anomalies-Baseline Persistence

(38)

37 Linear Regression Model:

Figure 41 Baseload Shift Anomalies-Linear Regression

Stateless Model:

Figure 42 Baseload Shift Anomalies-Stateless LSTM

Stateful Model:

Figure 43 Baseload Shift Anomalies-Stateful LSTM

(39)

38 Weekday gaps:

This is case of December 2019 for a building with Christmas break during weekdays which caused low energy values during highlighted period. The results for baseline persistence, linear regression, stateful and stateless LSTM are shown below.

Stateless LSTM 85% 0% 0% 7.4%

Stateful LSTM 85% 0% 0% 7.3%

Literature LSTM [8] - 60.9% 67.9% -

Table 8 Anomaly Detection Performance-Weekday Anomalies

The baseline persistence has achieved higher accuracy which shows that model has performed well in overall detection. With high value in precision, the model has been able to detect correct anomalies from the identified anomalies. Furthermore, higher value of recall shows that most of detected values were anomalies compared to non-anomalies. Finally, false positive rate is lower which shows that model has correctly identified anomalies with low error. In the literature, the precision achieved was around 60% using stateful LSTM which demonstrates that weekday gaps can be detected. However, the precision achieved using baseline persistence is higher than stateful LSTM method found by[8].

True Anomalies:

Figure 44 Weekday Gap Anomalies-True Anomalies

Baseline Model:

Figure 45 Weekday Gap Anomalies-Baseline Persistence

(40)

Figure 46 Weekday Gap Anomalies-Linear Regression

Stateless Model:

Figure 47 Weekday Gap Anomalies-Stateless LSTM

Stateful Model:

Figure 48 Weekday Gap Anomalies-Stateful LSTM

Literature:

Figure 49 Literature Anomaly Detection-Stateful LSTM

(41)

40 Weekend Peaks:

This is the case of October 2018 in another building which has weekend peaks. The results for baseline persistence, linear regression, stateful and stateless LSTM are shown below.

Stateless LSTM 87% 0% 0% 0.1%

Stateful LSTM 87% 4.2% 4.3% 7%

Table 9 Anomaly Detection Performance-Weekend Anomalies

Linear regression model has demonstrated much better performance with higher accuracy. This shows that model has overall better detection rate. With the current value of precision, there is 50% chance that true anomalies are correctly detected from the identified anomalies. Linear regression has better recall value which suggests that model has detected anomalies more than non-anomalous points.

Finally, stateless LSTM has minimum FPR value which shows that model has identified very few incorrect anomalies as compared to other models.

True Anomalies:

Figure 50 Weekend Peak Anomalies-True Anomalies

Baseline Model:

Figure 51 Weekend Peak Anomalies-Baseline Persistence

(42)

Figure 52 Weekend Peak Anomalies-Linear Regression

Figure 53 Weekend Peak Anomalies-Stateless LSTM

Stateful LSTM Model:

Figure 54 Weekend Peak Anomalies-Stateful LSTM

The table below summarize the effectiveness of all models discussed above in anomaly prediction.

Model Anomaly Categories

Baseline Persistence Peak values, Weekday gaps, Weekend peaks, Zero values, Baseload shift Linear Regression Peak values, Weekday gaps, Weekend peaks, Zero values, Baseload shift Stateless LSTM Zero values, Baseload shift

Stateful LSTM Zero values, Baseload shift

Table 10 Model Anomaly Detection Performance

According to the results above, different models have performed better on different types of anomalies.

Furthermore, baseline persistence and linear regression have shown better performance on all types of anomalies while LSTM was able to identify only two types of anomalies. Stateful LSTM was unable to detect weekend peaks and peak values due to precision less than 10% which can be neglected.

(43)

42

Anomaly Detection in different sectors:

Figure 55 Building Energy Profiles-Different Sectors

As shown in the figures above, different buildings have different energy profile. The first two images typically represent office and residential buildings which operate on weekdays. The last image represents industrial buildings or commercial complex. Since different sectors have different energy profiles, it is crucial to evaluate anomaly detection performance using different metrics for different sectors. In this domain, stateless LSTM configuration has been changed to keep lookback period same as stateful LSTM configuration to keep window size constant which is required for anomaly detection on entire dataset in this case.

Network Architecture

Parameters Lookback Batch Size

Baseline Persistence

Last week value E(t-168) 1 168

Linear Regression Simple Regression E(t-168), d, h 1 168 Stateless LSTM Hidden 1: 50 Patience: 5

Optimizer: Adam Epochs: 100

3 670

Stateful LSTM Hidden 1: 100 Hidden 2: 80

Patience: 5 Optimizer: Adam Epochs: 60

3 67

Table 11 Parametric Configuration-Different Models

(44)

43

The data is considered for December 2018 for various office buildings using different metrics.

Building ID Category Metric Baseline % Regression % Stateless LSTM %

Stateful LSTM %

94495 Office Accuracy 80 80.6 74.6 71.1

Precision 81.3 85.4 20.8 16.6

Recall 22.9 24.1 6.6 4.70

FPR 1.69 1.32 6.9 7.54

7158 Office Accuracy 96 45.8 82.3 82.6

Precision 100 47.9 4.16 2.08

Recall 63.2 8.33 2.5 1.31

FPR 0 8.33 7.4 7.53

96153 Office Accuracy 94.6 94.6 82 82.9

Precision 89.6 89.6 2.08 4.16

Recall 56.6 56.6 1.25 2.63

FPR 0.8 0.8 7.58 7.37

Table 12 Anomaly Detection Performance-Office

In office buildings, the baseline persistence model achieved better score as compared to the other models. The average accuracy around 90.2% as it has correctly detected anomalies and non-anomalies.

Furthermore, the average precision around 90.3% which means that most of the identified anomalies were actual anomalies. With lower value of recall around 47.6%, baseline model has kept balance between detecting anomalies and non-anomalies. The low value of false positive rate around 0.83%

means that model has correctly detected anomalies with very low error which leads to low amount of false positive.

Machine Learning to identify aberrant energy use to detect property failures

Machine Learning to identify aberrant energy use to detect property failures

SHAHROZ HABIB

Abstract:

Sammanfattning:

Acknowledgements:

Contents

Introduction:

Research Question:

Scope of the problem:

Thesis outline:

Background:

Related Work:

Related Theory:

Step by Step walk through LSTM:

f

= σ (W

. [h

,x

] + b

)

(1)

i

= σ (W

.[h

,x

] + b

) (2)

Ĉ

= tanh(W

.[h

,x

] + b

) (3)

C

= f

* C

+ i

* Ĉ

(4)

O

= σ(W

.[h

,x

] + b

) (5)

h

= O

*tanh(C

) (6)

Methodology:

Data pre-processing:

Mestro Building Energy Consumption dataset:

Anomaly Detection Process:

True Anomaly Definition

Model Training

Evaluation

Anomaly Window Brief:

E(t)

= E(t-168)

(8)

Y= b

x

+ b

x

+ b

x

(9)

Algorithm Summary:

LSTM implementation in Keras:

Results:

Model Prediction Performance:

Anomaly Detection Performance:

Anomaly Detection in different sectors: