
A Comparative Study of Machine Learning Algorithms for Short-Term Building Cooling Load Predictions

David Thinsz

thinsz@kth.se

June 10, 2019

Bachelor of Science Thesis

KTH School of Industrial Engineering and Management, Energy Technology, EGI-2019


Bachelor of Science Thesis EGI-2019
TRITA-ITM-EX 2019:332
A Comparative Study of Machine Learning Algorithms for Short-Term Building Cooling Load Predictions
David Thinsz
Approved: 2019-06-10
Examiner: Andrew Martin
Supervisors: Yang Zhao, Andrew Martin
Commissioner:
Contact person:


Abstract

Buildings account for a large part of the total energy demand in the world. The building energy demand increases every year, and space cooling is the main contributor to this increase. In order to create a sustainable global energy system it is therefore of great importance to improve the energy efficiency of buildings. Cooling load predictions are an essential part of improving building energy efficiency. The widespread use of Building Automation Systems (BAS) in modern buildings makes it possible to use data-driven methods for such predictions.

The purpose of this study is twofold: to compare the performance of five different machine learning algorithms by analyzing their accuracy and robustness; and to examine what effect different versions of a data set have on these algorithms. The data that is used in this study is one-year operational data from a building in the city of Shenzhen in China. This data set is engineered in multiple different ways and is used to test the algorithms.


Sammanfattning

Buildings account for a large share of the world's total energy demand. The energy demand of buildings increases every year, and the largest contributing factor is the growing need for energy to cool buildings. In order to create a sustainable global energy system it is important to improve energy use in buildings. Estimating the cooling load of buildings is a necessary step towards improving their energy efficiency. The widespread use of building automation systems (BAS) in modern buildings makes it possible to use data-driven methods for such estimates.

The purpose of this study is partly to compare how five different machine learning algorithms perform by analyzing how accurate and robust they are, and partly to examine what effect different versions of a data set have on these algorithms. The data used in this study is one year of operational data from a building in Shenzhen, China. This data set is modified in a number of different ways and is used to test the algorithms.


Preface

This bachelor thesis is in the field of Sustainable Energy Engineering and is written as part of a five-year program in Mechanical Engineering at the Royal Institute of Technology (KTH) in Sweden. The project is part of a collaboration between KTH and Zhejiang University (ZJU) in China. The research was conducted during a two-month period from March to May 2019 in the city of Hangzhou in China.

I would like to thank my supervisors Yang Zhao at ZJU and Andrew Martin at KTH for arranging this project and for guiding me through it. I would also like to thank Chaobo Zhang and Omar Elfarouk Bourahla for great discussions about machine learning that have taught me a lot and given me valuable insights while performing this research.

I hope you will find this thesis interesting.

David Thinsz


Contents

1 Introduction
  1.1 Problem Statement
  1.2 Report Overview
2 Methodology
  2.1 The outline of the research
  2.2 Feature engineering
    2.2.1 Creation of time-lag variables
    2.2.2 Change of representation of variables
    2.2.3 Removal of features
  2.3 Data transformation
  2.4 Prediction algorithms
    2.4.1 Support vector regression
    2.4.2 K-nearest neighbors
    2.4.3 Random forest
    2.4.4 Multilayer perceptron
    2.4.5 Theil-Sen estimator
  2.5 Hyperparameter optimization
  2.6 Performance evaluation metrics
3 A case study of predictive modeling
  3.1 The raw data
  3.2 Creation of feature sets for model input
    3.2.1 Time-lag features
    3.2.2 Representation of calendar features
    3.2.3 Removal of calendar features
    3.2.4 Resulting feature sets
  3.3 Development of predictive models
    3.3.1 Test 1
    3.3.2 Test 2
4 Results and discussions
  4.1 Feature set evaluation and prediction performance
  4.2 Computation time
5 Conclusions


List of Figures

1 The research outline
2 Visualization of a 5-fold cross-validation
3 An overview plot of the raw data
4 The correlation between time-lagged cooling loads and the actual cooling load
5 The correlation between time-lagged temperatures and the actual cooling load
6 Visualization of walk-forward prediction
7 Prediction results using the top performing models from a two week period
8 Computation time for building the forward-walk models
9 Computation time for predicting using the forward-walk models

List of Tables

1 A simple example explaining time-lag variables
2 Summary of the raw data
3 Descriptions of the datasets
4 Optimized hyperparameters for each dataset
5 Performances of the prediction models
6 Average prediction errors from the forward-walk prediction


1 Introduction

The global energy demand increases every year. Many different factors play a role in causing this increase, but the two main driving forces are economic growth and a rising standard of living. In 2018 the global energy demand increased by 2.3 percent, which according to the International Energy Agency (IEA) is faster than in any other year in the last decade. About 20 percent of the increase was caused by more extreme weather than in previous years, which increased heating and cooling loads. Worldwide energy efficiency improves yearly, but the rate at which it improves has been slowing down over the last couple of years, resulting in a slower decline in energy intensity (units of energy per unit of GDP) [1]. Improving energy efficiency is essential in order to create a sustainable global energy system.

The building sector and the building construction sector together account for about 36 percent of the world's energy demand. The building energy demand is currently increasing by about 3 percent per year [2]. Heating, ventilation and air-conditioning (HVAC) systems are responsible for the largest proportion of the building energy demand, and space cooling is currently the main reason for the increase in building energy demand [2, 3]. The efficiency of HVAC systems can be improved by optimizing how the system operates. This usually requires accurate load predictions, such as predictions of the building's cooling load. Cooling load is defined as the amount of heat that needs to be removed from a space in order to keep temperature and humidity constant. Predictions of the cooling load have also been found to be very useful in other areas of the building energy field, such as fault detection and demand-side management for smart grids [4].

Prediction models are usually divided into three categories: white-box, grey-box and black-box. White-box models are theoretical and combine physical principles with building properties. The main advantage of these models is their interpretability; they clearly show how much each variable contributes to the prediction [5]. Black-box models are completely data-driven. They are often very efficient but, in contrast to white-box models, they completely lack interpretability. Grey-box models are a middle ground between white-box and black-box models and combine theoretical principles with data.


The inputs to data-driven prediction models typically consist of operational building data, calendar variables and meteorological variables. One important variable for load predictions is the building occupancy. However, that data is seldom available in practice. The use of calendar features such as day of week and time of day is often sufficient, since the building occupancy depends on these variables.

The field of machine learning offers effective methods for data-driven building cooling load predictions. Machine learning is a subfield of computer science that studies how computers can learn without being explicitly programmed with a certain set of rules. Two main fields within machine learning are supervised learning and unsupervised learning. In supervised learning, predictive models are built using labeled data. It includes classification tasks, for example detecting whether an email is spam, and regression tasks such as cooling load predictions. Unsupervised learning, on the other hand, uses unlabeled data. It is commonly used for outlier detection using clustering.

The goal of developing predictive models is to create models that are able to predict accurately on new data that was not available when the model was built. A model that is trained on a certain data set cannot be evaluated on the same data set, since it will have "memory" of it. The whole data set is therefore often divided into two parts. One part, the training set, is used for building or training the model, and the other part, the test set, is used for evaluating the model. In that way it is possible to estimate how well the model generalizes, that is, how well it will perform on previously unseen data. When developing predictive models it is important to make sure that no information about the test set is leaked into the training set. Predictive models built with a training set that contains information about the test set would give overly optimistic results.

Every machine learning algorithm has its own qualities, and an algorithm's performance depends greatly on the type of data that is used for the predictive model. Some algorithms are good at capturing non-linear relationships in data while others perform better and more efficiently on data with linear relationships. Existing studies on data-driven cooling load predictions have shown that nonlinear predictive models perform better than linear models [6]. The dimensionality of the dataset and the risk of overfitting are also important factors to take into consideration when selecting an algorithm for a specific problem.

1.1 Problem Statement


This study compares five machine learning algorithms for short-term cooling load prediction using one-year operational data from a building in Shenzhen, China. The performance of the algorithms is evaluated based on how accurate and robust their predictive models are. The effect that different versions of the data set have on these algorithms is analyzed systematically, as is the computational load of the algorithms.

1.2 Report Overview

This paper is structured in five sections. Following the introduction, the methods that are used in this study are explained and their qualities and shortcomings are discussed in section 2. These methods are then applied in section 3 to operational data from a building in Shenzhen, China. Case-specific methods and parameter choices from the model development process are also presented in this section. Results from the study are then presented and discussed in section 4. The conclusions of this study are presented in section 5.

2 Methodology

2.1 The outline of the research


Figure 1: The research outline (raw data → creation of feature sets to be used as model inputs → development of predictive models using different machine learning algorithms → evaluation and comparison)

Predictive models are then built and tested on a part of the data set and their accuracies are compared. The best performing model for each of the algorithms is then selected for further analysis using the method of forward-walk prediction, with the purpose of determining the robustness of the algorithms. The computational loads of the algorithms are also compared in this part of the study.

The steps in Figure 1 are described in a theoretical manner in the following parts of section 2. Sections 2.2 and 2.3 explain the methods used to create the feature sets to be used as model inputs. Sections 2.4 and 2.5 explain the methods used for developing the predictive models. Section 2.6 describes the metrics that are used for evaluating the models. These methods are then applied in a case study in section 3.

2.2 Feature engineering


2.2.1 Creation of time-lag variables

If it is shown that there is a correlation between the output variable at time t and an input variable at a previous time-step t − k, then time-lag variables can be introduced to make predictions more accurate. A time-lag feature is created by making a copy of the variable that is to be lagged and shifting it k time-steps.

This process is illustrated with a fictive example in Table 1. The table shows a small feature set with three variables: Cooling load, Outdoor temperature and Lagged outdoor temperature. The feature Lagged outdoor temperature is a copy of the feature Outdoor temperature shifted two time-steps. Predictive models built using this feature set could then use the features Outdoor temperature and Lagged outdoor temperature to predict Cooling load at time t.

Table 1: A simple example explaining time-lag variables

Observation | Cooling load (kW) | Outdoor temperature (°C) | Lagged outdoor temperature (°C)
1 | 1209 | 17.08 | -
2 | 1179 | 16.63 | -
3 | 1141 | 16.67 | 17.08
4 | 1144 | 16.46 | 16.63
5 | 1133 | 16.68 | 16.67

One consequence of this method is that if a feature is lagged k time-steps, the first k samples in the feature set become unusable, see observation 1 and observation 2 in the example. These values are unknown since no measurements have been collected for the lagged feature at those observations. One method to determine which lag features to introduce into a feature set is the filtering method. Many time-lag features are initially created systematically. The filtering method is then used to determine which features to keep and which to remove from the feature set, using the Pearson correlation coefficient. The Pearson correlation between two features with a sample size of n is defined as

$$ r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \qquad (1) $$

(14)

If two features have a very high correlation with each other, one of them is considered redundant and can therefore be removed. It is recommended to remove the one of the two features that has the lower correlation with the target feature. The correlation thresholds for removal depend on the specific problem and dataset. Removing features that have a correlation lower than 0.6 with the target feature, and using a threshold of 0.9 for redundancy, is suggested in the literature [7].
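To make the procedure concrete, a minimal pandas sketch of how the time-lag creation and Pearson-based filtering described above could be implemented is given below. The function names, the data frame raw_df and its column names are illustrative assumptions, while the 0.6 and 0.9 thresholds follow the values suggested above.

```python
import pandas as pd

def add_time_lag_features(df, column, max_lag):
    """Create copies of `column` shifted 1..max_lag time-steps."""
    lagged = df.copy()
    for k in range(1, max_lag + 1):
        lagged[f"{column}-{k}"] = df[column].shift(k)
    # The first max_lag rows have no lagged measurements and are dropped.
    return lagged.dropna()

def filter_lag_features(df, target, corr_min=0.6, redundancy_max=0.9):
    """Keep features that correlate with the target but are not redundant with each other."""
    corr = df.corr(method="pearson")
    candidates = [c for c in df.columns
                  if c != target and abs(corr.loc[c, target]) >= corr_min]
    selected = []
    # Visit candidates in order of decreasing correlation with the target,
    # and skip any feature that is too strongly correlated with one already kept.
    for c in sorted(candidates, key=lambda c: -abs(corr.loc[c, target])):
        if all(abs(corr.loc[c, s]) < redundancy_max for s in selected):
            selected.append(c)
    return selected

# Example usage on a data frame with a "Cooling load" column (hypothetical names):
# df = add_time_lag_features(raw_df, "Cooling load", max_lag=24)
# kept = filter_lag_features(df, target="Cooling load")
```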

2.2.2 Change of representation of variables

Some variables, such as calendar variables, are often cyclically repeated throughout a data set. If a data set contains a feature month with values ranging from 1 to 12, the Euclidean distance between two consecutive values is always 1, except for the distance between 12 and 1 which is 11. Some algorithms that are based on distance metrics, such as k-nearest neighbors, can get confused by this. The optimal method of solving this problem, which is often used in machine learning classification problems, is to represent the feature using one-hot encoding, that is, by representing each value of the feature with its own feature. The feature month would therefore be represented with 12 features, all containing zeros except for one. The month January could be represented as [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], February as [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0] and so on. The Euclidean distance between each pair of values would then be √2. Using the one-hot representation instead of a decimal representation for many variables would, however, increase the dimensionality of the data set, which in turn would greatly increase the computational load of the algorithms. Another method is to represent the decimal values as a binary number, with each bit being represented as its own feature. The largest value in the feature month is 12, which is 1100 in binary, and would be represented using four features as [1, 1, 0, 0]. The distance between two consecutive values is then either 1, √2, √3 or 2. This representation makes the Euclidean distance between every two consecutive values quite similar and does not increase the dimensionality as much as the one-hot encoding would.
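As a small illustration of the two representations discussed above, the following NumPy sketch encodes the feature month both ways; the helper names are illustrative.

```python
import numpy as np

def one_hot_month(month):
    """Represent month 1..12 as 12 binary features (one-hot encoding)."""
    vec = np.zeros(12, dtype=int)
    vec[month - 1] = 1
    return vec

def binary_month(month, n_bits=4):
    """Represent month 1..12 as a 4-bit binary number, one feature per bit."""
    return np.array([(month >> b) & 1 for b in reversed(range(n_bits))])

print(binary_month(12))                                      # [1 1 0 0], i.e. 12 in binary
print(np.linalg.norm(one_hot_month(2) - one_hot_month(1)))   # Euclidean distance sqrt(2)
print(np.linalg.norm(binary_month(2) - binary_month(1)))     # sqrt(2) for 01 -> 10 in the last two bits
```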

2.2.3 Removal of features


2.3 Data transformation

Some machine learning algorithms, such as support vector machines, are sensitive to scale differences in the data. These scale differences cause large-scale features to have more impact on the model than small-scale features. This problem is overcome by scaling each feature in the dataset independently prior to training the algorithm. Two common methods for scaling the data are normalization and standardization. Normalization scales the data to the interval [0, 1] by

$$ x_{i,\mathrm{scaled}} = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}, \qquad (2) $$

where $x_{\max}$ is the largest value of a feature, $x_{\min}$ the smallest value of a feature, and $x_i$ the value of a specific feature in a sample. Standardization is the process of rescaling the data to have a mean $\mu$ of zero and a standard deviation $\sigma$ of one,

$$ x_{i,\mathrm{scaled}} = \frac{x_i - \mu}{\sigma}. \qquad (3) $$

A major drawback of normalization is that if the dataset contains outliers, the remaining data points are scaled into a very small range. Standardization will therefore be used throughout this study.

It was mentioned in the introduction that it is important for the evaluation of predictive models that there is no leakage of information from the test set to the training set. The importance of this cannot be stressed enough, and it must be kept in mind when scaling data. Consider a feature set that will be used for developing a predictive model. If this data set were first standardized and then divided into a training set and a test set, information about the test set would leak into the training set. Instead, the feature set is first divided into a training set and a test set. The scaling is then fitted on the training set only, making it standardized. The same transformation (the same $\mu$ and $\sigma$ as for the training set) is then applied to the test set, making it approximately standardized.
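A minimal scikit-learn sketch of this procedure is shown below, assuming a chronologically ordered feature matrix X and target vector y; the scaler is fitted on the training part only and then reused for the test part.

```python
from sklearn.preprocessing import StandardScaler

def chronological_split_and_scale(X, y, train_fraction=0.8):
    """Split chronologically first, then fit the scaler on the training set only."""
    n_train = int(len(X) * train_fraction)
    X_train, X_test = X[:n_train], X[n_train:]
    y_train, y_test = y[:n_train], y[n_train:]

    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)   # learns mu and sigma from training data only
    X_test_scaled = scaler.transform(X_test)         # reuses the same mu and sigma, no leakage
    return X_train_scaled, X_test_scaled, y_train, y_test, scaler
```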

2.4 Prediction algorithms


2.4.1 Support vector regression

Support vector regression (SVR) is a nonlinear regression algorithm that is based on support vector machines (SVM). The goal of the algorithm is to construct a hyperplane that minimizes the error using the maximum-margin principle, i.e. keeping as large a distance as possible to the nearest data points. An epsilon-tube is used in the algorithm to define a boundary for tolerated errors. The algorithm uses a kernel to transform the input data into a higher-dimensional feature space where the hyperplane is created and the error is minimized. The radial basis function (RBF) kernel is used in this study and the size of the epsilon-tube is set to 0.1 for all models. Two hyperparameters, C and γ, will be optimized for the models developed using SVR in this study. Too large values of C cause the model to overfit the training data and too small values result in a poor fit on the training data. The parameter γ defines the width of the RBF kernel [8, 10].

2.4.2 K-nearest neighbors

K-nearest neighbors (KNN) is a very simple machine learning algorithm. It first computes the Euclidean distance from the data point that is to be predicted to all the points in the training set. It then chooses the k nearest data points, and the average of these data points is the predicted value. The value of k will be optimized [11].

2.4.3 Random forest


As many trees as possible are usually preferred for maximum accuracy, but more trees cause increased computational load. The parameter max_depth of the trees will be optimized in this study [12].

2.4.4 Multilayer perceptron

Multilayer perceptron (MLP) is a type of artificial neural network that can be used to create nonlinear predictive models. The network consists of an input layer, at least one hidden layer and an output layer. For each neuron in the hidden layers a weighted linear sum is calculated, and a nonlinear activation function is then used to calculate the output of the neuron. The MLP propagates the data in one direction only, making it a so-called feed-forward neural network.

2.4.5 Theil-Sen estimator

The Theil-Sen estimator (TSE) is a linear regression model that is known for being robust and for performing well on multivariate datasets [13]. The algorithm has no parameters that need to be tuned before building predictive models.
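For reference, all five algorithms are available as regressors in scikit-learn; the sketch below shows how they could be instantiated. The hyperparameter values shown are placeholders, not the optimized values reported later.

```python
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import TheilSenRegressor

# One instance per algorithm compared in this study (placeholder parameter values).
models = {
    "SVR": SVR(kernel="rbf", epsilon=0.1, C=2**10, gamma=2**-4),
    "KNN": KNeighborsRegressor(n_neighbors=17),
    "RF":  RandomForestRegressor(n_estimators=500, max_features="log2", max_depth=11),
    "MLP": MLPRegressor(hidden_layer_sizes=(15, 15), activation="relu", solver="lbfgs"),
    "TSE": TheilSenRegressor(),   # no hyperparameters to tune
}
```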

2.5 Hyperparameter optimization

Poorly tuned hyperparameters will result in poor predictive accuracy and can cause overfitting. The selection of these parameters can be made using many different methods. One naïve method is parameter gridsearch. When performing a parameter gridsearch, intervals of the parameters are first specified, and the algorithm is then trained on the training set with each combination of these parameters. All models are then evaluated and compared. The metric that will be used for evaluating the models is R-squared ($R^2$), defined as

$$ R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}. \qquad (4) $$


To make the evaluation less dependent on one particular split of the data, k-fold cross-validation is used. The training set is divided into k folds of equal size. In the first test, the model is trained on all folds except the first one and its accuracy is then tested on the first fold. In the second test, all data except the second fold is used for training the model, which is then evaluated on the second fold, and so on. A 5-fold cross-validation is visualized in Figure 2.


Figure 2: Visualization of a 5-fold cross-validation

When performing the k-fold cross-validation parameter gridsearch, these k tests are performed for each combination of the hyperparameters. The average $R^2$ score over the k folds is calculated for each combination of the hyperparameters. The combination that gives the highest average score is then chosen as the parameters to be used for building the model.
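A sketch of this procedure using scikit-learn's GridSearchCV is given below; the small SVR grid is purely illustrative (the actual candidate ranges are given in section 3.3.1), and X_train, y_train are assumed to be the scaled training data.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Illustrative grid; every combination is evaluated with 5-fold cross-validation.
param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}

search = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.1),
    param_grid,
    scoring="r2",   # average R^2 over the folds ranks each parameter combination
    cv=5,           # the 5-fold cross-validation visualized in Figure 2
)
search.fit(X_train, y_train)                 # X_train, y_train assumed to exist
print(search.best_params_, search.best_score_)
```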

2.6 Performance evaluation metrics

In order to evaluate the performance of the models, three different evaluation metrics will be considered. Two of the metrics, the Mean Absolute Error (MAE)

$$ \mathrm{MAE} = \frac{\sum_{i=1}^{n}|y_i-\hat{y}_i|}{n} \qquad (5) $$

and the Root Mean Square Error (RMSE)

$$ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n}} \qquad (6) $$


are scale-dependent. These metrics are good for comparing models that have been built using the same dataset. MAE and RMSE are both useful since their values are easy to interpret. RMSE is more sensitive to outliers than MAE [14]. For comparing the performance of the models developed in this study with models from other studies, a scale-independent metric is introduced, the Coefficient of Variation of the Root Mean Square Error (CV-RMSE):

$$ \text{CV-RMSE} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}}{\frac{1}{n}\sum_{i=1}^{n}y_i}. \qquad (7) $$

The American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) recommends this metric for performance evaluation of prediction models for energy demand. Kadir Amasyali and Nora M. El-Gohary carried out a review of existing data-driven building energy demand prediction studies and found that CV-RMSE was the most common metric, used in 41 percent of the studies in their review [9].
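Equations (5)-(7) translate directly into code; a minimal NumPy sketch with illustrative function names is given below (CV-RMSE is expressed in percent, as in the result tables).

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error, equation (5)."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root Mean Square Error, equation (6)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def cv_rmse(y_true, y_pred):
    """Coefficient of Variation of the RMSE, equation (7), reported in percent."""
    return 100.0 * rmse(y_true, y_pred) / np.mean(y_true)
```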

3 A case study of predictive modeling

3.1 The raw data

Operational data from a building in Shenzhen, China is used in this study. The building mainly consists of conference rooms, government offices, museums and dining rooms. The data was collected during one year with a sampling interval of ten minutes. The data set consists of six calendar features (year, month, day, hour, minute and day of week), two weather features (outdoor temperature and outdoor relative humidity) and the cooling load. The total number of samples in the data set is 52 010. A summary of the raw data is shown in Table 2.

Table 2: Summary of the raw data

Feature | Minimum | Mean | Median | Maximum
Cooling load (kW) | 0.0 | 3414.8 | 3327.0 | 11648.0
Outdoor temperature (°C) | 3.6 | 23.16 | 24.51 | 34.49
Relative humidity (%) | 14.15 | 53.22 | 54.25 | 100.0



Figure 3: An overview plot of the raw data

3.2 Creation of feature sets for model input


3.2.1 Time-lag features

Figure 3 shows a very strong correlation between the cooling load and the outdoor temperature. These two features are considered for lagging, and the maximum lag is chosen to be 24 hours in this study. The lagged cooling load and outdoor temperature features will be referred to as COL-X and TEM-X respectively, where X is the number of hours that the feature is shifted. After creation of the time-lag features, the correlation between each of the time-lag features and the cooling load is calculated. Figure 4 shows the correlation between the time-lagged cooling load features and the actual cooling load. Figure 5 shows the correlation between each of the time-lagged temperatures and the cooling load.

Figure 4: The correlation between time-lagged cooling loads and the actual cooling load



Figure 5: The correlation between time-lagged temperatures and the actual cooling load

The filtering method described in section 2.2.1 is then used to remove time-lag features with a weak correlation with the cooling load and to remove features that are redundant. The final set of time-lag features that will be used is: COL-1, COL-4, COL-7, COL-18, COL-21, COL-24 and TEM-23.

3.2.2 Representation of calendar features

The feature sets that will be used for developing the models in this study will either contain the decimal or the binary representation of the calendar features using the method presented in section 2.2.2.

3.2.3 Removal of calendar features

The last variation in the feature sets will be the removal of calendar features. The feature sets will either contain all the original calendar features or have one of the following combinations of features removed:

• month
• month, day
• month, day, day of week
• month, day, day of week, hour


3.2.4 Resulting feature sets

By using all combinations of the methods explained above 18 different feature sets are created. These feature sets are described in Table 3.

Table 3: Descriptions of the datasets

Feature set | Time-lag | Time format | Feature removal
FS1 | no | decimal | none
FS2 | no | binary | none
FS3 | no | decimal | month
FS4 | no | binary | month
FS5 | no | decimal | month, day
FS6 | no | binary | month, day
FS7 | no | decimal | month, day, day of week
FS8 | no | binary | month, day, day of week
FS9 | no | - | month, day, day of week, hour
FS10 | yes | decimal | none
FS11 | yes | binary | none
FS12 | yes | decimal | month
FS13 | yes | binary | month
FS14 | yes | decimal | month, day
FS15 | yes | binary | month, day
FS16 | yes | decimal | month, day, day of week
FS17 | yes | binary | month, day, day of week
FS18 | yes | - | month, day, day of week, hour

3.3 Development of predictive models

Two tests will be used to evaluate and compare the machine learning algorithms. The first test is used to get an indication of the performance of the algorithms, to determine performance differences between the algorithms using different data sets, and to optimize the hyperparameters. The second test is then used to investigate the robustness of the algorithms and to analyze the computational load.

3.3.1 Test 1

The first test is performed by training each algorithm on data from 60 consecutive days and then testing the models on data from the following 30 days. The specific part of the data is chosen arbitrarily; in this study the training data starts from the first of May.
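With the ten-minute sampling interval of the raw data, 60 training days correspond to 8 640 samples and 30 test days to 4 320 samples; a sketch of such a chronological split is shown below, where the data frame df and the start index for 1 May are assumed.

```python
SAMPLES_PER_DAY = 24 * 6           # ten-minute sampling interval
train_days, test_days = 60, 30

start = 0                          # index of the first sample on 1 May (assumed known)
n_train = train_days * SAMPLES_PER_DAY
n_test = test_days * SAMPLES_PER_DAY

train = df.iloc[start : start + n_train]                       # 60 consecutive days
test = df.iloc[start + n_train : start + n_train + n_test]     # the following 30 days
```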


The candidate values for the hyperparameters are chosen partly based on recommendations in the literature and partly in a coarse-to-fine manner. The two parameters C and γ are optimized for the SVR models. The values that are tested for C and γ are powers of two, $2^x$, where $x \in \{-5, -4, -3, \ldots, 15, 16\}$ for C and $x \in \{-15, -14, -13, \ldots, 3, 4\}$ for γ. Values ranging from 1 to 50 are used as candidate values for the number of neighbors in the KNN models, and the same values are used for determining the optimized maximum depth of the random forest models. The parameters that are optimized for the multilayer perceptron are the activation function, the number of hidden layers and the number of perceptrons in each of the layers. The activation function can be either rectified linear units (ReLU), the logistic function or the hyperbolic tangent function (tanh). The number of hidden layers that will be tested is either one or two, and the number of perceptrons in each layer can be any integer value ranging from 1 to 20. The optimized hyperparameters for each of the models are presented in Table 4. The sizes of the hidden layers of the MLP models are represented using a vector in which each element represents a layer and the value of the element is the number of perceptrons in that layer.

Table 4: Optimized hyperparameters for each dataset

For all feature sets the SVR kernel is rbf, the RF models use n_estimators = 500 and max_features = log2, the MLP models use the relu activation and the lbfgs solver, and TSE has no hyperparameters.

Feature set | SVR C | SVR gamma | KNN k | RF max_depth | MLP hidden_layer_sizes
FS1 | 2^10 | 2^-4 | 17 | 11 | [15, 15]
FS2 | 2^14 | 2^-8 | 21 | 14 | [2]
FS3 | 2^10 | 2^-3 | 15 | 15 | [11, 9]
FS4 | 2^14 | 2^-8 | 50 | 11 | [2, 2]
FS5 | 2^11 | 2^-2 | 16 | 10 | [15, 15]
FS6 | 2^14 | 2^-6 | 10 | 11 | [8, 8]
FS7 | 2^12 | 2^-2 | 25 | 8 | [11, 9]
FS8 | 2^15 | 2^-7 | 7 | 9 | [11, 9]
FS9 | 2^13 | 2^-6 | 41 | 6 | [3]
FS10 | 2^15 | 2^-7 | 21 | 17 | [3]
FS11 | 2^14 | 2^-8 | 18 | 18 | [4]
FS12 | 2^15 | 2^-7 | 14 | 20 | [1]
FS13 | 2^15 | 2^-8 | 13 | 16 | [3, 3]
FS14 | 2^14 | 2^-6 | 15 | 19 | [2]
FS15 | 2^15 | 2^-6 | 11 | 24 | [2, 18]
FS16 | 2^15 | 2^-6 | 8 | 14 | [19, 19]
FS17 | 2^15 | 2^-6 | 7 | 20 | [13, 7]
FS18 | 2^14 | 2^-6 | 12 | 16 | [1, 1]
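For reference, the candidate ranges described in this section could be written as parameter grids in the following way; this is a sketch using scikit-learn parameter names, and the MLP layer-size candidates are shortened to a few illustrative tuples rather than the full 1 to 20 range.

```python
# Candidate hyperparameter grids for the gridsearch (sketch).
param_grids = {
    "SVR": {"C": [2**x for x in range(-5, 17)],           # 2^-5 ... 2^16
            "gamma": [2**x for x in range(-15, 5)]},       # 2^-15 ... 2^4
    "KNN": {"n_neighbors": list(range(1, 51))},            # 1 ... 50
    "RF":  {"max_depth": list(range(1, 51))},              # 1 ... 50
    "MLP": {"activation": ["relu", "logistic", "tanh"],
            # one or two hidden layers with 1-20 perceptrons each; a few examples only:
            "hidden_layer_sizes": [(5,), (10,), (20,), (5, 5), (10, 10), (20, 20)]},
}
```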

The performance of the models is then evaluated based on how accurate they are, using MAE, RMSE and CV-RMSE. The preferred feature set for each of the algorithms is also analyzed. The highest performing models are then used in a forward-walk prediction test to further validate the models' robustness and accuracy on other parts of the dataset. This is done in test 2.

3.3.2 Test 2



Figure 6: Visualization of walk-forward prediction

The MAE, RMSE and CV-RMSE values are calculated for every model throughout the walk. The average values of the errors are then compared between the algorithms. The computation time for building and predicting with each of these models is also recorded.
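One possible sketch of such a walk-forward loop is given below, assuming a chronologically ordered feature matrix and a scikit-learn model refitted for each window; the window sizes and the stepping by one test period are illustrative choices, not necessarily the exact scheme in Figure 6.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def walk_forward_errors(model, X, y, train_size, test_size):
    """Slide a train/test window through the data and average the errors over all windows."""
    maes, rmses, cvs = [], [], []
    start = 0
    while start + train_size + test_size <= len(X):
        X_tr, y_tr = X[start:start + train_size], y[start:start + train_size]
        X_te = X[start + train_size:start + train_size + test_size]
        y_te = y[start + train_size:start + train_size + test_size]

        model.fit(X_tr, y_tr)
        y_hat = model.predict(X_te)

        rmse = np.sqrt(mean_squared_error(y_te, y_hat))
        maes.append(mean_absolute_error(y_te, y_hat))
        rmses.append(rmse)
        cvs.append(100.0 * rmse / np.mean(y_te))   # CV-RMSE in percent
        start += test_size                          # advance the window by one test period
    return np.mean(maes), np.mean(rmses), np.mean(cvs)
```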

4 Results and discussions

4.1 Feature set evaluation and prediction performance

The results from the first test are presented in Table 5, where the MAE, RMSE and CV-RMSE are calculated using equations (5), (6) and (7). The specific models will be referred to with the abbreviation of the algorithm and the feature set that is used to build the model. The model SVR-FS1 is, for example, the model built with SVR using the feature set named FS1.

Table 5: Performances of the prediction models

CV-RMSE (%) | FS1 | FS2 | FS3 | FS4 | FS5 | FS6 | FS7 | FS8 | FS9 | FS10 | FS11 | FS12 | FS13 | FS14 | FS15 | FS16 | FS17 | FS18
SVR | 21.83 | 16.45 | 16.08 | 15.31 | 15.44 | 13.48 | 19.18 | 17.48 | 23.95 | 9.44 | 8.99 | 9.5 | 8.98 | 9.52 | 9.1 | 9.7 | 9.54 | 10.24
KNN | 15.19 | 18.89 | 16.41 | 21.3 | 15.62 | 15.83 | 18.96 | 17.51 | 24.7 | 11.31 | 11.51 | 10.8 | 11.51 | 10.75 | 10.66 | 11.43 | 12.02 | 11.65
RF | 12.94 | 17.32 | 14.24 | 16.51 | 13.52 | 14.61 | 17.99 | 18.55 | 24.24 | 9.18 | 10.13 | 9.21 | 9.86 | 9.14 | 9.76 | 9.69 | 10.32 | 10.45
MLP | 57.51 | 17.81 | 21.67 | 15.39 | 14.42 | 13.21 | 18.54 | 17.74 | 24.32 | 9.45 | 9.59 | 9.81 | 8.63 | 9.16 | 8.86 | 10.14 | 10.11 | 10.09


The results show that all models built on feature sets that contain time-lag features (FS10-FS18) perform better than models built on feature sets without time-lag features (FS1-FS9). This is because buildings have thermal inertia; the historical cooling load and outdoor temperature affect the target cooling load. One reason why this improvement is so large could be the low dimensionality of the original data set.

We can also see that the binary representation of the calendar features is preferred by SVR, MLP and TSE, while RF and KNN prefer the decimal representation. The differences in accuracy between the two representations are small. One possible explanation could be that the cooling loads are very small around midnight, where hour 23 turns into hour 0 of the following day, so the discontinuity in the decimal representation has little effect on the predictions.

The top performing models for each of the algorithms are SVR-FS13, KNN-FS15, RF-FS14, MLP-FS13 and TSE-FS15. The top performing models built using SVR, KNN, MLP and TSE all use the binary representation of the calendar features, while the best RF model is built using a feature set with the decimal representation. The best KNN, RF and TSE models have in common that their feature sets have the features month and day removed, whereas the best SVR and MLP models use feature sets with only month removed. This is likely to be specific to the training and test set that is used in this test. The two algorithms that create nonlinear models, SVR and MLP, share the same preferred feature set.

The models built using SVR and MLP perform slightly better than the top performing models created using the other algorithms, but the differences in accuracy between these five models are marginal. They all have quite similar results, which suggests that all algorithms are capable of capturing the relationships in the data, likely owing to the simplicity of the data.



Figure 7: Prediction results using the top performing models from a two week period

The cooling load is low at the beginning and end of the day and spikes in the middle of the day. The results in Table 5 and Figure 7 show that the models generally perform well on the test set but predict quite poorly on certain days.

Table 6: Average prediction errors from the forward-walk prediction

Metric | SVR-FS13 | KNN-FS15 | RF-FS14 | MLP-FS13 | TSE-FS15
MAE (kW) | 413 | 609 | 488 | 411 | 374
RMSE (kW) | 627 | 808 | 688 | 595 | 570
CV-RMSE (%) | 20.63 | 26.41 | 21.72 | 18.20 | 17.38


The KNN model is clearly the worst performing of these five, with the largest errors for all three metrics. The SVR-FS13, RF-FS14 and MLP-FS13 models all perform quite similarly, although the MLP-FS13 model performs slightly better than the other two. The best model in this test is clearly TSE-FS15, with the lowest errors for all three metrics.

Models that can predict with a CV-RMSE of less than 30 percent are considered good enough for engineering purposes [6], and all of the models fall below that threshold. Existing research using similar features as in this study has created SVR models with a CV-RMSE of 23.0 percent and RF models with a CV-RMSE of 31.6 percent on similar data sets [6]. Studies where the other algorithms have been used for hourly cooling load prediction are very limited or non-existent.

4.2 Computation time

All computations in this study were performed on a MacBook Pro (13-inch, Late 2011) with a 2.4 GHz Intel Core i5 processor. The computation times for building the models and for predicting with them during the forward-walk prediction are shown in Figure 8 and Figure 9.

Figure 8: Computation time for building the forward-walk models


Figure 9: Computation time for predicting using the forward-walk models

The results clearly show that the models using KNN are the quickest to build, while the SVR and RF models take the longest time to build. TSE and MLP both predict in almost no time at all. The prediction times of the models built using the other algorithms are also very short, considering that each model predicts the cooling load for a 30-day period. All models can be developed and can predict in only a matter of seconds, and can therefore be used for predicting the cooling load hourly. If one instead wants to predict the cooling load with a finer time granularity, say every second, then a more careful selection of the algorithms would be needed.

5 Conclusions

Cooling load predictions are essential for improving the efficiency of HVAC systems in buildings. This paper studies the performance of five different machine learning algorithms for predicting the building cooling load. Each algorithm has its own qualities and performs better on certain types of data sets than on others. The algorithms are analyzed using one-year operational data from a building in Shenzhen, China.


The results show that introducing time-lag features into the data set greatly improves the prediction accuracy, and that using a binary representation instead of a decimal representation in a data set can improve the performance of the predictive models for certain machine learning algorithms.

All algorithms in this study perform similarly, which suggests that they are all capable of capturing the relationship between the cooling load and the other variables in this data. The two non-linear algorithms SVR and MLP perform slightly better than the other algorithms in the first test, and the linear algorithm TSE performs slightly better in the second test. The MLP has good performance in both tests and a very low computational load, and would therefore be a good choice of algorithm for a cooling load prediction application. Most of the models developed in this study have a CV-RMSE of less than 30 percent, which is good. In order to improve the accuracy further, a more complex raw data set is required.

For further research it would be interesting to see whether the accuracy of the models changes when the sizes of the training and test sets are changed. All performance evaluation metrics in this study evaluate the performance on the whole test set. It would also be valuable to look at the performance of each day independently, in order to investigate why the algorithms perform poorly on certain days.


References

[1] International Energy Agency (IEA), Global Energy & CO2 Status Report. IEA Publications, 2019.

[2] International Energy Agency (IEA), 2018 Global Status Report. United Nations Environment Programme, 2018.

[3] X. Li and J. Wen, “Review of building energy modeling for control and operation,” Renewable and Sustainable Energy Reviews, vol. 37, pp. 517 – 537, 2014.

[4] C. Fan, F. Xiao, and S. Wang, “Development of prediction models for next-day building energy consumption and peak power demand using data mining techniques,” Applied Energy, vol. 127, pp. 1 – 10, 2014.

[5] V. V. Belle and P. Lisboa, “White box radial basis function classifiers with component selection for clinical prediction models,” Artificial Intelligence in Medicine, vol. 60, no. 1, pp. 53 – 64, 2014.

[6] C. Fan, F. Xiao, and Y. Zhao, “A short-term building cooling load prediction method using deep learning algorithms,” Applied Energy, vol. 195, pp. 222 – 233, 2017.

[7] L. Zhang and J. Wen, “A systematic feature selection procedure for short-term data-driven building energy forecasting model development,” Energy and Buildings, vol. 183, pp. 428 – 442, 2019.

[8] Q. Li, Q. Meng, J. Cai, H. Yoshino, and A. Mochida, “Applying support vector machine to predict hourly cooling load in the building,” Applied Energy, vol. 86, no. 10, pp. 2249 – 2256, 2009.

[9] K. Amasyali and N. M. El-Gohary, “A review of data-driven building energy consumption prediction studies,” Renewable and Sustainable Energy Reviews, vol. 81, pp. 1192 – 1205, 2018.

[10] D. Basak, S. Pal, and D. C. Patranabis, “Support vector regression,” Neural Information Processing – Letters and Reviews, vol. 11, no. 10, 2007.

[11] C. Rudin, “K-NN.” MIT 15.097 Lecture 6, 2012.

[12] A. C. Müller and S. Guido, Introduction to Machine Learning with Python. Sebastopol, CA: O’Reilly Media, Inc., 1st ed., 2016.
