Fault Detection of Hourly Measurements in District Heat and Electricity Consumption


Institutionen för systemteknik

Department of Electrical Engineering

Master's thesis

Fault Detection of Hourly Measurements in District Heat and Electricity Consumption

Master's thesis carried out in Automatic Control at Linköping Institute of Technology

by

Andreas Johansson

LITH-ISY-EX-3637-2005 Linköping 2005

Department of Electrical Engineering, Linköpings universitet

Linköpings tekniska högskola, Linköpings universitet


Fault Detection of Hourly Measurements in District Heat and Electricity Consumption

Master's thesis carried out in Automatic Control at Linköping Institute of Technology

by

Andreas Johansson

LITH-ISY-EX-3637-2005

Supervisors: David Törnqvist, ISY, Linköpings universitet

Cecilia Malm, Tekniska Verken i Linköping AB

Examiner: Torkel Glad, ISY, Linköpings universitet

Linköping, 18 February, 2005


Division, Department: Division of Automatic Control, Department of Electrical Engineering, Linköpings universitet, S-581 83 Linköping, Sweden

Date: 2005-02-18

Language: English

Report category: Examensarbete (Master's thesis)

URL for electronic version: http://www.control.isy.liu.se

ISBN: -

ISRN: LITH-ISY-EX-3637-2005

Title of series, numbering, ISSN: -

Title (Swedish): Feldetektion av Timinsamlade Mätvärden i Fjärrvärme- och Elförbrukning

Title: Fault Detection of Hourly Measurements in District Heat and Electricity Consumption

Author: Andreas Johansson

Abstract

Within the next few years, the amount of consumption data will increase rapidly as old meters are exchanged in favor of meters with hourly remote reading. A new, refined supervision system must be developed. The main objective of this thesis is to investigate mathematical methods that can be used to find incorrect hourly measurements in district heat and electricity consumption, for each consumer.

A simulation model and a statistical model have been derived. The model parameters in the simulation model are estimated by using historical data of consumption and outdoor temperature. By using the outdoor temperature as input, the consumption can be simulated and compared to the actual consumption. Faults are detected by using a residual with a sliding window. The second model uses the fact that consumers with similar consumption patterns can be grouped into a collective. By studying the correlation between the consumers, incorrect measurements can be found.

The performed simulations show that the simulation model is best suited for consumers whose consumption is mostly affected by the outdoor temperature. These consumers are district heat consumers and electricity consumers that use electricity for space heating. The fault detection performance of the statistical model is highly dependent on finding a collective that is well correlated. If these collectives can be found, the model can be used on district heat consumers as well as electricity consumers.


Acknowledgements

This thesis could not have been carried out without the help of a number of people, to whom I owe a great debt of gratitude and whom I would like to thank for their valuable contributions.

First and foremost, my thanks go to all employees at Tekniska Verken i Linköping AB, who have provided me with smiles and laughter during the lunch breaks. I would especially like to thank my supervisor Cecilia Malm, who has guided me through the jungle of measurements and system databases. Most of all, thanks to my supervisor David Törnqvist at Linköpings universitet. Without your commitment and valuable ideas, this report would not exist today. Special thanks to Sofia Pettersson for valuable proofreading. Last, but not least, I would like to thank all my friends in Lusen Big Band for all the love and support during my studies. Without you guys, I would never have made it this far. Thanks to all for your support and help!

Linköping, February 2005

Andreas Johansson


Chapter 1

Introduction

1.1 Background

Today Tekniska Verken i Linköping AB has over 100 000 meters for electricity, district heating, district cooling and water. Measurements are collected at intervals ranging from once an hour to once a year. Within the next few years the share of hourly remote reading will increase, as the old meters with yearly or monthly manual readings are exchanged in favor of new meters with hourly measurement readings.

Today's methods for checking the correctness of a collected measurement are sparse. The collected measurement is compared to a precalculated yearly consumption according to the consumer's consumption pattern. The measurements that fail the test are sent to a fault file for manual observation and correction. As manual reading is replaced by remote reading, the amount of data will increase rapidly. This yields new possibilities as well as new problems. With the methods used today, the number of measurements that fail the check will increase as the amount of data increases. Different types of temporary measurement faults, not caught by the present system, will be visible with hourly reading. No longer inspecting the meters with the human eye is also a risk. Meters that are broken, or have been damaged in some way, may be missed without a refined automated supervision system. The possibilities lie in the new amount of data together with an automated supervision system that can detect incorrect measurements without manual inspection.

1.2 Problem Description

Research on finding incorrect measurements in consumption data has been done by [1], [6] and [5]. In these studies, historical data have been used to predict the total energy demand in a certain region. The consumption pattern varies for each consumer. If district heating or electricity is used for space heating, the consumption pattern shows seasonal variations, corresponding to temperature changes. For some consumers, especially industries, the consumption is higher during weekdays, while other consumers have the same consumption irrespective of the day or time of the year. Stochastic behavior, such as social factors, also affects the consumption. The influence of a specific consumer does not affect the total demand that much. Fault detection of the total energy demand can therefore be seen as an easier problem. Since the task in this thesis is to detect faults in the consumption for each consumer, the problem is to find a general model that can be used on all types of consumers, both district heat and electricity consumers.

1.3 Objectives

The purpose of this thesis is to investigate mathematical methods that can be used to detect incorrect measurements in consumption data. To make these investigations, mathematical models should be developed. Simulations should be performed to verify the fault detection performance of the derived models.

1.4 Delimitation

The behavior of district heat and electricity consumption is quite a complex process. Covering everything during the period of a master's thesis is impossible, so some delimitations have to be made.

• The study is limited to investigating fault detection of measurements in district heat and electricity consumption.

• The day mean value of district heat and electricity consumption is used in all modeling.

• The methods cannot be evaluated on all consumers. A number of consumers, which can be seen as good representatives, will be used.

1.5 Thesis Outline

Chapter 2 is a review of how a measurement is generated, collected and stored. The common faults are also presented. In chapter 3 a simulation model and a statistical model are derived. These models are used in chapter 4 to simulate and detect faults in district heat and electricity consumption. The report ends with conclusions and suggestions for further studies in chapter 5. To make the report easier to read, most of the figures from the performed simulations can be found in appendices A and B.


Chapter 2

System Overview

This chapter describes how a measurement reading is generated, collected, stored and treated by the system implemented today. Tekniska Verken i Linköping AB has over 100 000 meters for district heating, electricity, district cooling and water consumption. There are different types of meters depending on what is to be measured. Since this thesis focuses on district heating and electricity consumers, meters and measurements belonging to these fields are discussed. The different types of faults that can occur are described in section 2.5.

2.1 From Measurement Reading to Database

Figure 2.1 shows a typical installation of a district heating supply system. The hot water from the supplier passes a heat exchanger and is returned to the supplier. Before the heat exchanger, the supply temperature and the volumetric flow are measured. When the hot water has left the heat exchanger, the return temperature is measured. Temperatures and flow are fed into a calculator. The flow is registered as pulses/liter, i.e., when a certain volume has passed the meter a pulse is sent to the calculator. Depending on the consumer, meters with different resolutions are installed. Typical values are 2.5 liters or 25 liters per pulse. There are different types of meters that use propellers or ultrasound to measure the flow. The calculator computes the district heat consumption Q according to the formula

$$ Q = \int_{v_0}^{v_1} k \, \delta T \, dv $$

where $\delta T$ = Supply temp − Return temp, $v$ = Volumetric flow and $k$ is the heat coefficient. $k$ is a function of temperature and pressure. The calculator performs the integration and uses a built-in table with $k$-values for the current temperature and pressure. More on how the calculator works can be found in [9].
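To illustrate the integration performed by the calculator, the integral can be approximated by a sum over flow pulses, where each pulse represents a fixed volume. The following Python sketch is only an illustration under stated assumptions: the function name `heat_energy`, the constant value of k (roughly the volumetric heat capacity of water, about 0.00116 kWh per liter and degree) and the sample temperatures are hypothetical and are not taken from the calculator described in [9], which looks up k in a table.

```python
def heat_energy(pulse_temps, liters_per_pulse, k):
    """Approximate Q = integral of k * dT dv by a sum over flow pulses.

    pulse_temps: list of (supply_temp, return_temp) pairs, one per pulse.
    liters_per_pulse: volume represented by one pulse (e.g. 2.5 or 25).
    k: heat coefficient, assumed constant here (hypothetical simplification);
       the real calculator uses a table depending on temperature and pressure.
    """
    total = 0.0
    for supply, ret in pulse_temps:
        delta_t = supply - ret  # dT = supply temp - return temp
        total += k * delta_t * liters_per_pulse
    return total


# Four pulses of 25 liters each with a 40 degree temperature drop.
energy_kwh = heat_energy([(80.0, 40.0)] * 4, 25.0, 0.00116)
```

With a constant k the sum reduces to k times the temperature drop times the total volume, which is a convenient sanity check for the sketch.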

The principle of an electricity meter is that voltage and current are measured. A calculator performs an integration to obtain the consumed energy.

Figure 2.1. A typical installation of a district heat supply system. (Diagram labels: supply pipe, return pipe, building, heat exchanger, supply temp, return temp, flow.)

The supply and return temperatures, flow and consumption are passed on to a terminal that uses either S0-pulses or a serial protocol called MBUS. If S0-pulses are used, a pulse is generated when a certain consumption has been registered, e.g., one pulse every kWh. With the MBUS-protocol, meter readings are collected without using pulses. The actual function of this terminal, irrespective of whether S0-pulses or the MBUS-protocol is used, is to transmit the information to the Automatic Meter Reading (AMR) system. This is done via different media. The most common media types are IP, radio and power line cables. Figure 2.2 shows a schematic overview of how a measurement is generated, transmitted and stored.

Once a day, during nighttime, the hourly readings are collected by the AMR. These readings are later stored in a database called METER IN. This database contains constants that are used to scale the pulses to actual consumption, temperature etc. After the pulses have been scaled the data are stored in a second database called METER BAS. This database contains consumption in MWh, temperature in degrees Celsius etc. All data used in this thesis come from this database.

2.2 Debit System

The system of today uses the measurements mostly to charge a consumer for its consumption. At the turn of the month an estimate of the yearly consumption is calculated by the system. If the new estimated consumption differs from the present yearly consumption, the system raises an alarm and the consumer-id is sent to a fault file. This fault file is inspected manually to find the explanation for the deviation in consumption. If the changed consumption is assumed to relate to a faulty meter, e.g., data is missing or is ten times as high as normal, the meter will be inspected. If the change in consumption can be related to a change in the consumption pattern, i.e., a change in energy needs, the previous yearly consumption is replaced by the new estimated yearly consumption and the consumer will be charged according to this new yearly consumption.

Figure 2.2. A schematic overview of how the consumption from the calculator is transmitted, registered by the AMR and stored in the DB. (Diagram labels: district heat with supply temp, return temp and flow; electricity with voltage and current; CALC, consumption, transmission, MBUS/S0, AMR, DB.)

2.3 Available Data

The data used in this thesis are the district heat consumption, electricity consumption and outdoor temperature. The outdoor temperature is collected from SMHI as a day mean value. This is why the day mean district heat consumption is used in the simulation model. If the hourly readings are to be used, one has to be able to measure the hourly outdoor temperatures. The day mean value of the electricity consumption has been used in the statistical model, since it is easier to find correlation in day mean values than in hourly measurements.

Depending on whether S0-pulses or the MBUS-protocol is used to collect the measurements, the consumption is stored as actual consumption or as meter readings. Actual consumption means the consumption during the last hour. Meter readings stand for the accumulated consumption from the time the meter was installed.

If the measurements from a consumer are the actual consumption, the day mean consumption is calculated as $\frac{1}{24}\sum_{i=1}^{24} y(i)$, where $y$ is the consumption. If the measurements are meter readings, the day mean consumption is calculated as $\frac{1}{24}\left(y(24) - y(1)\right)$.
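The two ways of computing the day mean can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes exactly 24 hourly values per day with no missing data.

```python
def day_mean_from_consumption(hourly):
    """Day mean when each value is the consumption during the last hour."""
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    return sum(hourly) / 24


def day_mean_from_meter_readings(readings):
    """Day mean when each value is the accumulated meter reading.

    Follows the expression in the text: (y(24) - y(1)) / 24.
    """
    if len(readings) != 24:
        raise ValueError("expected 24 hourly values")
    return (readings[23] - readings[0]) / 24
```

Keeping the two cases in separate functions makes it explicit which storage format a consumer uses before the day means are fed into a model.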


2.4 Data Quality

In all model building one must ensure that the data to be used are free of outliers and other faults. Since the system of today mostly uses the data to charge the consumers, the incorrect measurements in the database are not corrected. As a result, the database contains data that are incorrect. The meaning of an incorrect measurement is also a bit vague. This is why the data must be investigated extra carefully prior to the modeling. Datasets including obvious faults must be weeded out. Another important fact is that one cannot experiment with the measurements to find out more about the system characteristics. The only available data are the historical measurement data stored in the database.

2.5 Faults to be Detected

Since the models in this thesis are to be used in fault detection and the concept incorrect measurement is a bit vague, this has to be defined. Faults that can occur will now be defined as different fault modes and a possible explanation to this fault will be given.

• Fault mode 1 - The measurements suddenly diverge from the assumed consumption.

This fault can arise from an incorrect meter, incorrect constant in the database METER IN, missing data or a sudden change in energy consumption for a consumer.

If any of the connections between the meter and the AMR ceases to function, data will be missing. If the measurements are collected as actual consumption, the consumption data from these time samples will be lost and stored as zeros in the database. If the measurements are collected as meter readings, missing data will not result in lost consumption data. Instead, the zeros will be followed by a peak in consumption when the connection between the meter and the AMR starts to function again.

If the consumption is, for example, ten times as high or low as assumed, the most likely explanation is an incorrect constant in the database METER IN. These faults can occur when a meter has been exchanged. The constants in METER IN are used to scale the pulses from the meters to consumption in MWh, kWh, etc. The constants are handled manually by a system operator. Sometimes the constants are mixed up or in some other way missed in the routines. For district heat meters, the constants differ by a factor of 10. If the constants are mixed up in the routines, the district heat consumption can be ten times as high, or low, as it should be. The constant for an electricity meter can take the values 10, 20, 30, 40, 60, 80, 100 and 120. This means that if the constants 10 and 20 are mixed up, the result is a consumption that is twice, or half, the actual consumption, and so on.

It is often hard to tell whether a deviation in consumption is related to an actual fault or to a change in consumption. If a consumer suddenly decides to turn the consumption on or off, this is not an actual “fault”. Since the two cases cannot be told apart, such a change in energy consumption must nevertheless be considered as a fault by the system.

• Fault mode 2 - The measurements slowly diverge from the assumed consumption.

This fault arises from a faulty meter or a slow change in energy consumption for a consumer.

When the meters have been in use they start to age. Even though meters are collected and calibrated continuously, the mechanical wear can result in a slowly increased or decreased consumption.

A slow change in consumption can also be explained by the fact that the consumer has changed its energy needs. Since it is hard to separate this from a faulty meter this must also be considered as a fault.

There are also cases when the consumer manipulates the meter to reduce its energy costs. These faults can be hard to detect, especially if a model is estimated from a dataset where the consumer has manipulated its meter. If this is the case, this type of fault will be built into the model.

Since the different faults that can occur do not have to be isolated, i.e., the exact type of fault that is present does not have to be pointed out, the classification presented above is suitable for its purpose. To isolate the present fault one must have separate models for each component of the entire district heat supply system. Such models have been investigated in [1], but due to the limited time period for this thesis, this has not been investigated further.


Chapter 3

Modeling

This chapter describes the main modeling work. Two model approaches are derived. The first model uses historical consumption and temperature data to estimate model parameters. This model can be used to simulate the consumption with outdoor temperature as input. In the second model approach, consumers with similar consumption patterns are grouped into a collective. The correlation between the consumers is used to find deviations from the collective.

3.1 District Heat Characteristics

The energy demand for the district heat consumers is mainly due to their need for space heating and hot tap water. The need of district heat for space heating is mostly affected by the outdoor temperature. The need of hot tap water can be explained by other factors [1], e.g., the time of the day.

In Figure 3.1 the district heat consumption for two different consumers is shown together with the outdoor temperature. It is obvious that the district heat consumption and the outdoor temperature are correlated. A low outdoor temperature corresponds to a high consumption and vice versa. During weekends industries often shut down. This is seen in the district heat consumption for the slaughter house, which shows a typical weekday/weekend consumption pattern. The shopping mall, on the other hand, does not show the same weekday/weekend trends since it is open almost 365 days a year.

An identical change in temperature can result in different consumption depending on the absolute value of the temperature. A temperature change of 5 °C corresponds to different consumption depending on the time of the year. This indicates that the system is nonlinear. As seen in Figure 3.2, the district heat consumption is approximately linear up to a certain break point of about 14 °C. Above this break point the district heat consumption is almost zero or at least constant. This phenomenon can be explained by the fact that no extra energy is needed to warm the building above the break point. The break point varies from consumer to consumer, depending on the insulation of the building. The influence of this break point will be discussed in section 4.1.1. For a building with good insulation the break point is lower and vice versa. The deviations in measurements for the slaughter house can be explained by the weekday/weekend trends.

Figure 3.1. Outdoor temperature and district heat consumption for year 2001. The typical weekday/weekend trends can be seen in the consumption for the slaughter house. (Panels: Outdoor Temp 2001 in degrees C; District Heat 2001 - Shopping Mall in MWh; District Heat 2001 - Slaughter House in MWh; horizontal axis: day nr.)

The two consumers mentioned above are just examples of what the consumption pattern can look like. The different consumers on the district heat market consist of big industries, schools, shopping malls, detached houses etc. It is clear that the consumption patterns differ between consumers and consumer types.

The district heat characteristics can be summarized as

• It is time dependent

• It is weather dependent

• It is nonlinear

Figure 3.2. District heat consumption vs. outdoor temperature for the shopping mall and the slaughter house. The district heat consumption is linear up to a break point of about 14 °C. The deviations in measurements for the slaughter house correspond to the weekday/weekend trends. (Panels: District Heat 2001 - Shopping Mall; District Heat 2001 - Slaughter House; axes: outdoor temperature in degrees C vs. MWh.)

3.2 Simulation Model

One can assume that a consumer's consumption pattern, i.e., the base consumption without influence of the outdoor temperature, is the same from one year to another, unless the consumer decides to rebuild its house, a new family moves in, or the energy need is changed in some other way. If this is the case, the system should raise an alarm to inform the district heat supplier that the consumer has changed its energy needs. This assumption leads to the model described in the next section.

3.2.1 Model Approach

In [4] a simulation model for the total hot water demand in Reykjavik is presented. The simulation model presented in this thesis differs in the way that the district heat consumption for a specific consumer is to be modeled, not the total demand as in the article. This model approach, with some modifications, will be used to simulate the district heat consumption.


The consumption is divided into two parts. The first part is a time dependent part that describes the base consumption of district heat for space heating and hot tap water. The second part is the weather dependent part that explains how the district heat consumption is affected by the outdoor temperature. The model can be described as

$$ y(t) = b(t) + w(t) + \text{error} \tag{3.1} $$

Here $b(t)$ is the base consumption corresponding to an outdoor temperature specified by the break point, i.e., approximately 14 °C. $w(t)$ is the weather dependent part and is a function of the outdoor temperature. $y(t)$ is the district heat consumption. The last part is the stochastic error process.

The base consumption can be modeled as

$$ b(t) = b_0 + b_1 I(t) + b_2 \sin(2\pi t/365) + b_3 \cos(2\pi t/365) + b_4 I(t) \sin(2\pi t/365) + b_5 I(t) \cos(2\pi t/365) \tag{3.2} $$

where $I(t)$ is a binary variable that is one for weekdays and zero otherwise. This is to describe the extra consumption during weekdays. The sine and cosine terms describe how the district heat consumption varies throughout the year even if the effect of outdoor temperature is removed.

In Figure 3.3 the base adjusted district heat consumption for the slaughter house is plotted versus the outdoor temperature. This illustrates the use of the base consumption in the model. The previous deviations in measurements corresponding to the weekday/weekend trends are now removed. With the base adjusted consumption, a weather dependent part can be described as

$$ w(t) = b_6 z_1 + b_7 z_2 \tag{3.3} $$

where

$$ z_1 = \max\{BP - u(t),\, 0\} \quad \text{and} \quad z_2 = \max\{u(t) - BP,\, 0\} $$

Here $BP$ is the break point and $u(t)$ is the day mean outdoor temperature.
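As an illustration, the model (3.1)-(3.3) can be evaluated for one day as in the Python sketch below. The function names `regressors` and `simulate`, the default break point of 14 degrees and the parameter values in the usage example are assumptions made for the sketch; in practice the break point varies between consumers and the parameters are estimated from historical data.

```python
import math


def regressors(t, is_weekday, u, break_point=14.0):
    """Return the base functions of the model for day t.

    t: day number, is_weekday: True on weekdays, u: day mean outdoor
    temperature, break_point: BP in degrees C (assumed default).
    Order: 1, I, sin, cos, I*sin, I*cos, z1, z2 (matching b0..b7).
    """
    i = 1.0 if is_weekday else 0.0
    s = math.sin(2 * math.pi * t / 365)
    c = math.cos(2 * math.pi * t / 365)
    z1 = max(break_point - u, 0.0)  # active below the break point
    z2 = max(u - break_point, 0.0)  # active above the break point
    return [1.0, i, s, c, i * s, i * c, z1, z2]


def simulate(theta, t, is_weekday, u, break_point=14.0):
    """Simulated consumption b(t) + w(t) for parameters theta = [b0..b7]."""
    phi = regressors(t, is_weekday, u, break_point)
    return sum(b * x for b, x in zip(theta, phi))


# Hypothetical parameters: constant base of 1.0 and 0.1 per degree below BP.
theta_example = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0]
y_example = simulate(theta_example, 365, True, 4.0)
```

Splitting the temperature into z1 and z2 keeps the model linear in the parameters even though the consumption is a nonlinear function of the temperature.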

Historical data for a consumer's district heat consumption and the outdoor temperature are used to estimate all parameters in a least squares sense. See section 3.2.2 for details. This yields the predicted district heat consumption $\hat{y}(t) = \hat{b}(t) + \hat{w}(u(t))$, where $\hat{b}(t)$ is the predicted base consumption and $\hat{w}(u(t))$ is the predicted weather dependent part. The simulated district heat consumption then becomes $\hat{y}_{sim}(t) = \hat{b}(t) + w_{sim}(u(t))$, where $u(t)$ is the outdoor temperature from the period that is to be simulated. The simulated consumption can be compared to the actual consumption and then be used for fault detection.

3.2.2 Parameter Estimation

Figure 3.3. Illustration of the base demand part in the model. The base consumption b(t) has been subtracted from the measured consumption. The base consumption part takes care of the weekday/weekend trends as well as the seasonal variations. (Panels: District Heat 2001 - Slaughter House; Base Adjusted District Heat 2001; axes: outdoor temperature in degrees Celsius vs. MWh.)

The parameter estimation procedure follows the principle of minimizing the quadratic sum of the simulation error. Since the simulated consumption $\hat{y}(t)$ is a linear function of the parameters $\theta$, i.e., $\hat{y}(t) = \theta^T \varphi(t)$, the problem can be solved as a linear least squares problem. Here $\theta$ is a vector of the model parameters and $\varphi(t)$ is a vector that consists of the base functions in the model. If the consumption from $n$ time samples is stored in $Y$ and the corresponding values of the base functions are stored in $\Theta$, the least squares problem for the model defined in (3.1) can be formulated as

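$$ \min_{\theta} \left\| Y - \Theta\theta \right\| \tag{3.4} $$

where

$$ Y = \begin{pmatrix} y(1) \\ \vdots \\ y(n) \end{pmatrix}, \qquad \theta = \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \\ b_7 \end{pmatrix} $$

$$ \Theta = \begin{pmatrix} 1 & I(1) & \sin(2\pi \cdot 1/365) & \cos(2\pi \cdot 1/365) & I(1)\sin(2\pi \cdot 1/365) & I(1)\cos(2\pi \cdot 1/365) & z_1(1) & z_2(1) \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & I(n) & \sin(2\pi n/365) & \cos(2\pi n/365) & I(n)\sin(2\pi n/365) & I(n)\cos(2\pi n/365) & z_1(n) & z_2(n) \end{pmatrix} $$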

According to [3], the solution to (3.4) is given by the normal equations $\Theta^T \Theta \theta = \Theta^T Y$. Finally, the estimated parameters are given by

$$ \hat{\theta} = (\Theta^T \Theta)^{-1} \Theta^T Y \tag{3.5} $$

When calculating the matrix inverse $(\Theta^T \Theta)^{-1}$, numerical problems may occur. This can be handled by using the Matlab-command pinv, which computes the pseudo inverse using singular value decomposition. More on numerical aspects for matrices and matrix inverses can be found in [3].
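The estimation step can be sketched in Python using only the standard library. Note that this sketch solves the normal equations by Gaussian elimination with partial pivoting, which is a simpler technique than the singular value decomposition behind pinv and will fail on a singular system instead of returning a pseudo inverse solution. The function names `solve_linear` and `least_squares` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(rows, y):
    """Estimate theta from (Theta^T Theta) theta = Theta^T Y.

    rows: list of regressor vectors, one per time sample (rows of Theta).
    y: list of measured consumption values (the vector Y).
    """
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(p)]
    return solve_linear(ata, aty)


# Toy example with two parameters: y = 1 + 2 x, without noise.
xs = [0.0, 1.0, 2.0, 3.0]
theta_hat = least_squares([[1.0, x] for x in xs], [1.0 + 2.0 * x for x in xs])
```

For the eight-parameter model, each row passed to `least_squares` would be the vector of base functions for one day.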

3.2.3 Validation

When a model has been derived it must be validated to find out whether the model is suitable for its purpose. Since the purpose is to use the models for fault detection, the fault detection itself is a kind of validation. If the faults can be detected by the chosen model, the model can be seen as “good enough”. On the other hand, there are several ways to validate a model [7]. An important factor is the prediction error, which is defined as

$$ \varepsilon(t) = y(t) - \hat{y}(t) \tag{3.6} $$

In the ideal case the simulation error should be white noise, i.e., $\varepsilon(t) \in N(0, \sigma)$.

When a model has been simulated, a histogram plot or a normal probability plot can be used to verify if the simulation error is white noise or not.

To find out if all system dynamics are caught by the model, the auto correlation sequence of the simulation error can be studied. The auto correlation sequence is defined as

$$ R_y(t) = \frac{1}{N} \sum_{k=1}^{N-t} y(k)\, y(k+t), \qquad t \geq 0 \tag{3.7} $$

The simulation error can then be used to create an estimate of the standard error of the model. This measure can be used to compare different models. The estimated standard error, according to [4], is defined as

$$ \hat{\sigma} = \frac{\sqrt{\frac{1}{N} \sum_{t=1}^{N} \varepsilon^2(t)}}{\langle y \rangle} \tag{3.8} $$

where $\langle y \rangle$ is the mean consumption during the period that is studied.
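The two validation measures can be sketched in Python as below. The function names are hypothetical, and the sketch assumes that (3.8) is read as the root mean square simulation error divided by the mean consumption. The auto correlation function is applied to whatever sequence it is given, typically the simulation error.

```python
import math


def autocorrelation(e, lag):
    """R(lag) = (1/N) * sum over k of e(k) * e(k + lag), for lag >= 0."""
    n = len(e)
    return sum(e[k] * e[k + lag] for k in range(n - lag)) / n


def relative_standard_error(y, y_hat):
    """RMS simulation error divided by the mean consumption.

    This is one reading of (3.8) and is labeled as an assumption.
    """
    n = len(y)
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n
    return math.sqrt(mse) / (sum(y) / n)
```

For a white noise error, the auto correlation should be close to zero for all lags other than zero, so large values at, for example, lag 7 would hint at weekly dynamics that the model has missed.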

Another approach is to validate the model with data from separate periods. If the results are consistent, this implies that the model approach is correct.

These measures will be used and discussed further in section 4.1.2.

3.2.4 Fault Detection

To be able to detect the faults described in section 2.5, a residual and a threshold must be calculated. One approach is to use the prediction error (3.6). A residual should be zero, or small, when no faults are present and above a certain threshold when faults are present. Using the absolute prediction error as a residual is not suitable, since the residual would be too sensitive to isolated incorrect measurements. Instead, the residual can be seen as a model validity measure. A common method is to determine how well the model can simulate the data over a certain time period or time window. The residual can then be calculated as

$$ r_1(t) = \frac{1}{L} \sum_{k=t-L+1}^{t} \varepsilon_k^2 \tag{3.9} $$

where $L$ is the length of the time window. The residual will be small when no faults are present, i.e., when the difference between the modeled consumption $\hat{y}$ and the measured consumption $y$ is small. If the residual exceeds a threshold, the measurement can be seen as faulty and the system should raise an alarm.
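A sliding window residual of the kind in (3.9) can be computed with a running sum, as in the following Python sketch. The function name is hypothetical, and the sketch returns values only for time samples where a full window of L errors is available.

```python
def windowed_mean_square_residual(errors, window):
    """Mean of the squared simulation errors over the last `window` samples.

    Returns one value per time index t = window-1, ..., len(errors)-1.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(errors) < window:
        return []
    squares = [e * e for e in errors]
    running = sum(squares[:window])
    out = [running / window]
    for t in range(window, len(squares)):
        # Slide the window: add the newest square, drop the oldest one.
        running += squares[t] - squares[t - window]
        out.append(running / window)
    return out
```

A longer window makes the residual less sensitive to a single incorrect measurement, at the price of a slower reaction to a fault.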

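Since $\varepsilon \in N(0, \sigma)$, the residual scaled by $L/\sigma^2$ is, according to [2], $\chi^2(L)$ distributed, i.e.,

$$ \frac{L}{\sigma^2}\, r_1(t) = \frac{1}{\sigma^2} \sum_{k=t-L+1}^{t} \varepsilon_k^2 \in \chi^2(L) \tag{3.10} $$

A threshold can be calculated as

$$ J = \frac{\sigma^2}{L} F^{-1}(p \mid L) \tag{3.11} $$

where $F^{-1}$ is the inverse $\chi^2$ cumulative distribution function and $p$ is the probability of a false alarm.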

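Instead of using the squared simulation error $\varepsilon^2$, as in (3.9), a residual based on the simulation error itself can be written as

$$ r_2(t) = \frac{1}{L} \sum_{k=t-L+1}^{t} \varepsilon_k \tag{3.12} $$

Since $\varepsilon \in N(0, \sigma)$, this residual is normally distributed,

$$ r_2(t) = \frac{1}{L} \sum_{k=t-L+1}^{t} \varepsilon_k \in N\!\left(0, \frac{\sigma}{\sqrt{L}}\right) \tag{3.13} $$

A threshold can be calculated as

$$ J = F^{-1}\!\left(p \,\Big|\, 0, \frac{\sigma}{\sqrt{L}}\right) \tag{3.14} $$

where $F^{-1}$ is the inverse normal cumulative distribution function and $p$ is the probability of a false alarm.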

In the fault free case, the false alarm probability is defined as

$$ p = P(|r| > J) \tag{3.15} $$

The residual and threshold are calculated from a dataset containing no faults. The residual from the simulated data is compared to the threshold. If the residual exceeds the threshold the measurement is considered to be incorrect, i.e.,

$$ |r| > J \Rightarrow \text{Fault} \quad \text{or} \quad \frac{|r|}{J} > 1 \Rightarrow \text{Fault} \tag{3.16} $$

The residuals (3.9) and (3.12) will be discussed further in section 4.1.3.
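As a sketch of how the residual (3.12) and a threshold of the kind in (3.14) could be combined into a detector, consider the Python code below. The function names, the default false alarm probability and the example data are hypothetical. The sketch also makes one explicit assumption: to obtain P(|r| > J) = p for the two-sided test in (3.15), the normal quantile is evaluated at 1 − p/2.

```python
import math
from statistics import NormalDist


def windowed_mean_residual(errors, window):
    """Mean of the last `window` simulation errors, for t >= window - 1."""
    return [sum(errors[t - window + 1:t + 1]) / window
            for t in range(window - 1, len(errors))]


def threshold(sigma, window, p_false_alarm):
    """Two-sided threshold J such that P(|r| > J) = p in the fault free case.

    Assumes the residual is N(0, sigma / sqrt(window)); the quantile level
    1 - p/2 is an assumption of this sketch.
    """
    dist = NormalDist(0.0, sigma / math.sqrt(window))
    return dist.inv_cdf(1.0 - p_false_alarm / 2.0)


def detect(errors, sigma, window, p_false_alarm=0.01):
    """Return one alarm flag per residual value: True means |r| > J."""
    j = threshold(sigma, window, p_false_alarm)
    return [abs(r) > j for r in windowed_mean_residual(errors, window)]


# Fault free errors followed by a sudden offset of 1.0 (hypothetical data).
flags = detect([0.0] * 10 + [1.0] * 5, sigma=0.2, window=5)
```

In this example the alarm is raised once two faulty samples have entered the window, which illustrates the delay introduced by averaging over L samples.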

3.3 Statistical Model

Tekniska Verken i Linköping AB has more than just district heat consumers. If the measurements from an electricity or water consumer are to be used in fault detection, the simulation model is not suitable, since the electricity or water consumption is not affected by the outdoor temperature in the same way as the district heat consumption. While district heating is mostly used for space heating, the use of electricity is more versatile. The electricity consumption is, e.g., affected by social factors that are difficult to describe with physical relations. On the other hand, one can assume that there exist consumers that have similar consumption patterns. If the consumers with the same consumption pattern can be found, they could constitute a collective. The correlation between the consumers in the collective can be used to find deviations from the collective. The deviations indicate that a fault is present. If collectives can be found for all consumer types, district heating consumers, electricity consumers, water consumers etc., the model approach can be used on all kinds of consumers. This is the main motive for the model derived in this section.

3.3.1 Model Approach

Assume that there exist a number of consumers that have similar consumption patterns. In the fault free case the consumers' consumption should follow the same pattern, which is the same as saying that the consumers are well correlated. The coefficient of correlation can be used to evaluate the correlation between the consumers in a collective. According to [2], the coefficient of correlation between two stochastic variables, X and Y, is defined as

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{D(X)\,D(Y)}, \qquad \rho \in [-1, 1] \qquad (3.17)$$

where D denotes the standard deviation. If ρ = 0, X and Y are uncorrelated, and ρ = 1 means that X and Y are perfectly correlated. A negative value means negative correlation.

The Matlab command corrcoef is used to calculate the correlation coefficient for n consumers as

[R, P] = corrcoef(M) (3.18)

M is an n × m matrix, with n consumers and m observations of the consumption. R is an n × n matrix containing the correlation coefficients between the consumers in the collective. P is an n × n matrix containing p-values for testing the hypothesis of correlation, under the assumption that the observations are normally distributed. The hypotheses are defined as

H0: “Correlation” (3.19)

H1: “No Correlation” (3.20)

If a p-value exceeds α, the null hypothesis is rejected with significance α and the alternative hypothesis is accepted. Thus, the P-matrix can be used to detect when the consumption of a consumer deviates from the collective's consumption. More on how the Matlab command corrcoef works can be found in [8].
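For readers without Matlab, the R-matrix in (3.18) can be sketched in plain Python as below. This is only an illustration under stated assumptions: the function names are hypothetical, the data are made up, and M is taken as one list of observations per consumer.

```python
from math import sqrt


def pearson(x, y):
    """Sample correlation coefficient between two equally long series.

    Raises ZeroDivisionError if one of the series is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(M):
    """M holds one consumption series per consumer; returns the n x n matrix R."""
    n = len(M)
    return [[pearson(M[i], M[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Made-up day mean consumption for three consumers.
    M = [
        [1.0, 2.0, 3.0, 4.0],
        [2.1, 3.9, 6.2, 8.0],
        [4.0, 3.0, 2.0, 1.0],
    ]
    for row in correlation_matrix(M):
        print([round(r, 4) for r in row])
```

The third made-up consumer is the mirror image of the first, so the corresponding entry of R is -1.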

3.3.2 Fault Detection

At time t, the coefficient of correlation can be calculated by using a sliding window of length L. This means that the collective should be correlated during L time samples. At each time sample, the R- and P-matrices are updated and the hypothesis of correlation can be evaluated. The fault detection algorithm, which is implemented in Matlab, can be summarized as

• At time t, calculate the R- and P-matrices by using historical consumption data from the L previous samples.

• Search the P-matrix for p-values that exceed α. If one or more p-values in the same column exceed α, the null hypothesis is rejected, i.e., the consumption for the consumer corresponding to this column deviates from the collective and a fault is detected.

• Repeat the same procedure for time t + 1.

The number of p-values that must exceed α to reject the null hypothesis can be varied. In the following simulations, the null hypothesis is rejected if one or more p-values in the same column exceed α. The implemented algorithm will be discussed further in section 4.2.2.
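The steps above can be sketched in Python as follows. This is not the Matlab implementation of the thesis: the function names and the parameter min_count are hypothetical, the data are made up, and the p-value is approximated with Fisher's z-transform and a normal distribution instead of the value returned by corrcoef. The p-value is computed under the assumption of no correlation, which is how Matlab documents the p-values of corrcoef, so a p-value above α means that correlation could not be established within the window.

```python
from math import atanh, sqrt
from statistics import NormalDist


def pearson(x, y):
    """Sample correlation coefficient; 0.0 is returned for a constant series
    (a choice made for this sketch)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def p_value(r, L):
    """Approximate two-sided p-value for 'no correlation' (Fisher z, L > 3)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = abs(atanh(r)) * sqrt(L - 3)
    return 2.0 * (1.0 - NormalDist().cdf(z))


def detect_faults(M, L, alpha, min_count=1):
    """Slide a window of length L over M (one series per consumer).

    Returns {t: [indices of flagged consumers]} for every t with a flag.
    A consumer is flagged when at least min_count p-values in its column
    of the P-matrix exceed alpha.
    """
    n, m = len(M), len(M[0])
    alarms = {}
    for t in range(L - 1, m):
        window = [series[t - L + 1:t + 1] for series in M]
        flagged = []
        for j in range(n):
            exceed = sum(
                1 for i in range(n)
                if i != j and p_value(pearson(window[i], window[j]), L) > alpha
            )
            if exceed >= min_count:
                flagged.append(j)
        if flagged:
            alarms[t] = flagged
    return alarms


if __name__ == "__main__":
    # Made-up data: three perfectly correlated consumers, ten samples each.
    A = [float(t) for t in range(10)]
    B = [2.0 * t + 1.0 for t in range(10)]
    C = [3.0 * t for t in range(10)]
    B[7] = 5.0  # injected fault (the fault free value would be 15.0)
    print(detect_faults([A, B, C], L=5, alpha=0.05, min_count=2))
```

Since the P-matrix is symmetric, a single deviating pair raises a p-value in both columns involved. With min_count=1, as in the rule above, the made-up fault therefore flags all three consumers, while min_count=2 isolates the second one.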


3.3.3 How to Choose a Collective

The main condition for this model approach is that the consumers can be divided into collectives. Today the consumers are grouped into categories based on basic data from SCB. These groupings are too imprecise to serve as collectives; hence new refined groupings must be made. The consumers in the collective must be chosen from a measurement-technical point of view.

To divide all consumers into suitable collectives is a huge task. Since time is a limiting factor, the goal is not to find these collectives. Instead, methods for finding a suitable collective will be discussed.

Assume that there exist a number of consumers that can be seen as representatives of a collective. The model itself can be used to find out which consumers can serve as a collective. If fault free data are used, faults must not be detected during L time samples. If a consumer does not belong to the collective, the fault detection algorithm will interpret the consumer's data as incorrect and a fault will be registered. This approach is used in section 4.2.1.


Chapter 4

Simulations and Results

In this chapter simulations are performed for the two models derived in chapter 3. The simulation model is validated using the model validity measures described in section 3.2.3. District heat consumers with incorrect consumption are used to evaluate the fault detection performance. Two collectives are derived and used in simulations for the statistical model.

4.1 Simulation Model

During the work on this master's thesis, several model approaches with different numbers of parameters have been evaluated. By using the validity measures described in section 3.2.3, the proposed model in section 3.2.1 has been derived. To investigate the generality of the model, it has been used in simulations of a number of consumers from different categories, i.e., consumers with different consumption patterns. The model must cope with consumers that have weekday/weekend trends as well as those that have not. To present the performance of the model, two consumers that can be seen as good representatives of the consumers on the district heat market are used to validate the proposed model. The first consumer is little influenced by weekday/weekend trends, while the second consumer is strongly influenced by them. Since the purpose is to detect incorrect measurements, the fault detection can be seen as the actual validation of the model. To evaluate the fault detection performance, four consumers with actual faults are simulated.

4.1.1 Simulation Prerequisites

All data used in the simulations are historical data of the district heat consumption and outdoor temperature. Two datasets for each consumer are collected from the database METER BAS. The first dataset is used to estimate the model parameters and the second to validate the model. Since the model contains a time dependent periodicity of a year, the data used for parameter estimation should also cover a year. Finding datasets from two years that do not contain incorrect measurements is difficult, since it is not unlikely that faults occur during a period of two years. The system with remote reading is also quite new, which further reduces the number of consumers to choose from.

The data used for validation should to the greatest possible extent be free of outliers or other obvious faults. Since the concept of faulty data is a bit vague, it is difficult to point out exactly when a measurement should be interpreted as incorrect. Therefore consumers that have periods containing obvious faults are sorted out.

The breakpoint, discussed in section 3.2.1, is set to 14 °C in all simulations. The optimal breakpoint varies between consumers depending on the insulation of the building. Simulations have shown that it can vary from 10 °C up to 16 °C. If the breakpoint is to be treated as a model parameter, the parameters have to be estimated with a nonlinear parameter estimation procedure. This has not been done, since the standard error of a simulation over a year differs by only ±1-2% in the cases where the optimal and the chosen breakpoint differ the most. A breakpoint of 14 °C can therefore be seen as a good representative of the consumers' breakpoints.

4.1.2 Validation

In Figure 4.1, the district heat consumption of the first consumer, with little influence of weekday/weekend trends, is simulated. The day mean district heat consumption and day mean outdoor temperature from year 2001 are used to estimate the model parameters. The solid line is the measured consumption y and the dotted line is the simulated consumption ŷ. The standard error defined by (3.8) becomes $100\hat{\sigma}_{2001} = 7.46$. With the estimated model parameters and the day mean outdoor temperature from year 2003, the district heat consumption for year 2003 is simulated. The standard error for the simulated period becomes $100\hat{\sigma}_{2003} = 10.66$.

The standard error for year 2001 is lower since the data are used to estimate the model parameters. If the data from year 2003 are used to estimate the model parameters and the data from year 2001 are simulated, the standard errors become $100\hat{\sigma}_{2003} = 10.05$ and $100\hat{\sigma}_{2001} = 8.26$. The results are consistent, which implies that the model approach is correct.

The simulations for the second consumer, with big influence of weekday/weekend trends, are shown in Figure A.1. The standard error becomes $100\hat{\sigma}_{2001} = 15.40$ for year 2001 and $100\hat{\sigma}_{2003} = 16.66$ for year 2003. When the model parameters are estimated with data from year 2003, the standard errors become $100\hat{\sigma}_{2003} = 16.01$ and $100\hat{\sigma}_{2001} = 15.83$. The results are consistent also for the consumer with big influence of weekday/weekend trends.

In the ideal case the simulation error should be white noise, i.e., ε ∈ N(0, σ). Figures A.2 and A.3 show histograms and autocorrelation sequences of the simulation error for the two consumers. As seen, the distributions of the simulation errors differ from the ideal case, especially for the simulation on fresh data. The autocorrelation sequence shows that there are some periodicities in the simulation error that are not captured by the model. How this affects the fault detection performance will be discussed in section 4.1.3.
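A whiteness check of this kind can be sketched as below. The function names are hypothetical, the error sequence is made up, and the band ±1.96/√N is the usual approximate 95% band for the sample autocorrelation of white noise, not a value taken from the thesis.

```python
def autocorrelation(errors, max_lag):
    """Sample autocorrelation of the simulation error for lags 0..max_lag."""
    n = len(errors)
    mean = sum(errors) / n
    centered = [e - mean for e in errors]
    var = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / var
        for k in range(max_lag + 1)
    ]


def whiteness_bound(n):
    """Approximate 95% band for the autocorrelation of white noise."""
    return 1.96 / n ** 0.5


if __name__ == "__main__":
    # Made-up error with a strong period of two samples.
    errors = [1.0, -1.0] * 10
    acf = autocorrelation(errors, 4)
    bound = whiteness_bound(len(errors))
    for lag, value in enumerate(acf):
        marker = "outside band" if lag > 0 and abs(value) > bound else ""
        print(lag, round(value, 3), marker)
```

Values outside the band at lags greater than zero suggest structure, such as a periodicity, that the model has not captured.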

[Figure 4.1: four panels (a) to (d) plotted against day number. Panels (a) and (c) show y and ŷ in MWh; panels (b) and (d) show the simulation error in MWh.]

Figure 4.1. Consumer with little influence of weekday/weekend trends. Data from year 2001 used for estimation (a). The solid line is the measured consumption and the dotted line is the simulated consumption. Simulation error for the estimation data (b). Data from year 2003 used for simulation (c). The solid line is the measured consumption and the dotted line is the simulated consumption. Simulation error for the validation data (d).

4.1.3 Fault Detection

Thresholding

The two residuals derived in section 3.2.4 are tested on the first consumer, with little influence from weekday/weekend trends. Data from the years 2001 and 2003, the same as used for estimation and validation, are considered to be correct. If a threshold is calculated from the dataset used to estimate the model parameters, the threshold will be too low since the variance for 2001 is lower than the variance for 2003. A fairer method is to calculate the threshold from the fresh data from year 2003. The drawback is that correct data from two separate years must be available.

In Figure 4.2 the residual (3.9) is used. The threshold is calculated with the dataset from year 2003. The length of the time window is set to L = 30 and the false alarm probability to p = 0.01. With a false alarm probability of p = 0.01, no more than 1% of the values from a correct dataset should be over the threshold. With the calculated threshold (3.11), 30 measurements of 364, i.e., 8.2%, are considered as incorrect. This indicates that the threshold is too low, which means that too many false alarms are generated. The explanation is that the simulation error is not sufficiently normally distributed. This leads to an even worse approximation of the χ² distribution, especially for large windows L. If a time window of L = 5 is used, 4.7% of the measurements are considered as incorrect. The threshold is still too low.
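The comparison between the nominal and the observed false alarm probability can be sketched as below. The function names are hypothetical. The second function is not the method of the thesis: it is one possible data-driven alternative, assumed here only for illustration, that picks the threshold directly from fault free residuals.

```python
def false_alarm_rate(residuals, J):
    """Fraction of fault free residual values whose magnitude exceeds J."""
    return sum(1 for r in residuals if abs(r) > J) / len(residuals)


def empirical_threshold(residuals, p):
    """Smallest observed magnitude J with at most a fraction p above it.

    Assumption of this sketch: the residuals come from fault free data.
    """
    magnitudes = sorted(abs(r) for r in residuals)
    k = int(p * len(magnitudes))  # number of values allowed above J
    return magnitudes[len(magnitudes) - 1 - k]


if __name__ == "__main__":
    # The figure quoted in the text: 30 of 364 values above the threshold.
    print(round(100 * 30 / 364, 1), "% observed, 1 % nominal")
    # Made-up residuals to show the empirical alternative.
    residuals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    J = empirical_threshold(residuals, p=0.1)
    print(J, false_alarm_rate(residuals, J))
```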

[Figure 4.2: two panels plotted against day number: (a) residual with L = 30, (b) residual with L = 5.]

Figure 4.2. The residual (3.9) with L = 30 (a). The false alarm probability is set to p = 0.01. 8.2% of the measurements in the fault free case are considered as incorrect, i.e., the threshold is too low. If the window length is reduced to L = 5, as in (b), 4.7% of the measurements are considered as incorrect. The threshold is still too low.

The reason to use residual (3.12) is that the calculations of the thresholds should be less sensitive towards the normal approximation of the simulation error ε. However, when simulations are performed the thresholds are too low even with [...] performance, the residual (3.9) will be used in the following simulations.

One approach to make the simulation error more normally distributed is to estimate an AR-model to the simulation error. This has been tested, but the thresholds are still too low even with this approach.

In Figure A.4, the consumption for year 2004 is simulated and the residual (3.9) is used. With the threshold calculated from the dataset from year 2003, faults are detected around day 130 and day 300. The fault at day 300 is an actual fault and should be registered. It is not obvious that the fault detected around day 150 should be registered as a fault. This kind of fault can be explained by the consumer using more or less district heat than can be related to a change in outdoor temperature. If such faults should not be detected, theoretical thresholds cannot be used. Instead thresholds based on experience can be used. If the threshold is set too high, there is always a risk that actual faults are missed. The value of the threshold is a balance between generating false alarms and missing actual faults.

Simulations

In the following simulations the residual (3.9) is used to detect actual faults as described in section 2.5. Two simulations are performed for each consumer. The first simulation uses a threshold calculated as in (3.11) with p = 0.01, while thresholds based on experience are used in the second simulation. The length of the time window is set to L = 10 in all simulations. Figures of the simulations can be found in appendix A, Figures A.5-A.12.

The main results from the simulations can be summarized as follows

• The residual (3.9) can detect the faults defined in section 2.5.

• If the threshold is calculated as in (3.11), false alarms will be registered. By using a threshold based on experience, the number of false alarms can be reduced without missing the actual faults.

• The simulations are performed on consumers from different categories with more or less influence of weekday/weekend trends. Faults are detected irrespective of the consumer category.

4.2 Statistical Model

4.2.1 Finding the Collective

Fault free district heat consumption for five consumers is shown in Figure 4.3. These consumers have been chosen since they have a similar consumption pattern and can therefore be seen as possible candidates of a collective. If all observations of the consumption are used, the correlation matrix for the five consumers becomes

$$R = \begin{pmatrix}
1.0000 & 0.9563 & 0.9519 & 0.9514 & 0.9823 \\
0.9563 & 1.0000 & 0.9167 & 0.9550 & 0.9732 \\
0.9519 & 0.9167 & 1.0000 & 0.9405 & 0.9593 \\
0.9514 & 0.9550 & 0.9405 & 1.0000 & 0.9743 \\
0.9823 & 0.9732 & 0.9593 & 0.9743 & 1.0000
\end{pmatrix}$$

To find the consumer that is least correlated, the R-matrix can be studied. The minimal column sum corresponds to the consumer that is least correlated. In this case, with all observations used, consumer three is least correlated.
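The column sum check can be reproduced with a few lines of Python. The function name is hypothetical; the matrix is the R-matrix for the five district heat consumers given above.

```python
R = [
    [1.0000, 0.9563, 0.9519, 0.9514, 0.9823],
    [0.9563, 1.0000, 0.9167, 0.9550, 0.9732],
    [0.9519, 0.9167, 1.0000, 0.9405, 0.9593],
    [0.9514, 0.9550, 0.9405, 1.0000, 0.9743],
    [0.9823, 0.9732, 0.9593, 0.9743, 1.0000],
]


def least_correlated(R):
    """Return (column index with the smallest sum, list of all column sums)."""
    sums = [sum(row[j] for row in R) for j in range(len(R))]
    return sums.index(min(sums)), sums


if __name__ == "__main__":
    index, sums = least_correlated(R)
    print("column sums:", [round(s, 4) for s in sums])
    print("least correlated: consumer", index + 1)
```

The smallest column sum, about 4.7684, belongs to the third column, in agreement with the text.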

[Figure 4.3: district heat consumption in MWh for consumers 1 to 5, one panel each, and a sixth panel showing detected faults per consumer number.]

Figure 4.3. Fault free district heat consumption for five consumers that can be seen as candidates of a collective. The fault detection algorithm, with L = 50 and α = 0.05, detects faults at the season change between summer/autumn for consumer four.

If the fault detection algorithm with L = 50 and α = 0.05 is used, faults are detected for consumer four. The faults are registered during the change of summer/autumn. As seen in Figure 4.3, consumer four has a different consumption pattern during the warmer period of the year. The consumption is suddenly lowered at the break of spring/summer and suddenly raised at the break of summer/autumn. Such behavior is quite common for district heat consumers. If a window length of L = 50, or lower, is going to be used, consumer four cannot be a member of the collective, since false alarms would then occur. It is also possible to raise the significance value α, with the risk of missing actual faults. Consumer four must therefore be a member of a collective whose members share the pattern of suddenly changing consumption at season breaks. The members of such a collective should have these changes in consumption at approximately the same date. If it turns out to be difficult to find such collectives, one must accept false alarms during season changes.

If consumer four is excluded from the collective and L is reduced to 40, faults are detected for consumer three. This result is not surprising, since the method of checking the minimal column sum of the R-matrix, also indicated that consumer three was least correlated among the collective members. If L = 45 and consumer one is excluded from the collective, the remaining four consumers can constitute the collective.

The same procedure as above is used to find a collective of electricity consumers. Figure B.1 shows fault free electricity consumption for five consumers that are candidates of the collective. The correlation matrix, calculated with all observations, becomes

$$R = \begin{pmatrix}
1.0000 & 0.7653 & 0.7108 & 0.7655 & 0.7665 \\
0.7653 & 1.0000 & 0.8881 & 0.8851 & 0.9175 \\
0.7108 & 0.8881 & 1.0000 & 0.8351 & 0.8947 \\
0.7655 & 0.8851 & 0.8351 & 1.0000 & 0.8571 \\
0.7665 & 0.9175 & 0.8947 & 0.8571 & 1.0000
\end{pmatrix}$$

As seen, the consumers in the electricity collective are not as correlated as the consumers in the district heat collective. The consumption pattern for electricity consumers is often more stochastic, since the electricity consumption is also affected by social factors. If the consumer uses district heat for space heating, the stochastic behavior of the consumer's electricity consumption is of extra importance. Even though it is more difficult to find electricity consumers that are correlated, it is most likely that there exist electricity consumers that can constitute a collective. Despite the low correlation, these consumers will be used to evaluate the model on the electricity collective.

The minimal column sum corresponds to column one in the R-matrix. This indicates that consumer one is least correlated. The consumption pattern for this consumer shows more seasonal variation than the rest of the consumers. The fault detection algorithm, with L = 25 and α = 0.05, detects faults for consumer one. Consumer one is therefore excluded from the collective. The four remaining consumers will constitute the collective.

4.2.2 Fault Detection

The fault detection algorithm, described in section 3.3.2, is used on the two collectives derived in the previous section. Since no datasets with actual faults have been found for the consumers in the two collectives, faults are introduced into the measured consumption. The significance level is set to α = 0.05 in all simulations.

District Heat Collective

In Figure B.2, a fault that corresponds to a constant error of a factor ten in METER BAS is introduced for consumer two. At time t = 400 to 401, the consumption is ten times as high as it should be. The fault detection algorithm, with L = 45, is able to detect the fault. If the same fault is introduced, but with a duration of more than one sample, the fault is still detected. The difference is that the fault is no longer detected L samples after it has occurred. This is illustrated in Figure B.3. The basic idea is that the fault detection algorithm is updated in real time. Hence the important fact is that a change in consumption is detected. Since the fault was detected at time t = 400, the system is informed that the consumption for consumer two is incorrect, and the missed detections L samples later are of little importance.

A fault that corresponds to a constant error in METER BAS can also result in a consumption that is one tenth of what it should be. If such a fault appears during the warmer period of the year, it will not be detected for the chosen district heat collective. The consumption of the consumers in the collective is then too low for a reduction to one tenth to be detected. This is illustrated in Figure B.4, where such a fault is introduced to consumer two. If the fault occurs during the colder period of the year, when the consumption is higher, the fault will be detected. This is illustrated in Figure B.5.

In Figure B.6, zeros are introduced to consumer three at time t = 550 to 551. With a window length of L = 45, the fault is missed. The fault is difficult to detect, since the consumption is almost zero even in the fault free case. If this kind of small, transient fault is to be detected, a shorter window length must be used. A shorter window will, however, result in false alarms for consumer three. If zeros are introduced at time t = 550 to 638, the fault is detected 20 days after it has occurred. This is illustrated in Figure B.7.

A slow change in consumption, corresponding to an incorrect meter, can be modeled with a linear trend of 10% increase per day. In Figure B.8, this fault is introduced to consumer three. At time t = 450, the consumption slowly starts to increase with 10% per day. With L = 45, the fault is detected at t = 470.
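The three kinds of introduced faults (a wrong constant, zeros and a slow trend) can be sketched as simple transformations of a consumption series. The function names are hypothetical, and the trend is assumed here to scale sample t by 1 + rate · (t - start), which is one possible reading of a linear trend of 10% increase per day.

```python
def scale_fault(series, start, stop, factor):
    """Multiply samples start..stop-1 by factor, e.g. 10 or 0.1 for a wrong constant."""
    return [v * factor if start <= t < stop else v for t, v in enumerate(series)]


def zero_fault(series, start, stop):
    """Replace samples start..stop-1 with zeros."""
    return [0.0 if start <= t < stop else v for t, v in enumerate(series)]


def trend_fault(series, start, rate):
    """Slow drift: from `start` on, sample t is scaled by 1 + rate * (t - start)."""
    return [
        v * (1.0 + rate * (t - start)) if t >= start else v
        for t, v in enumerate(series)
    ]


if __name__ == "__main__":
    # Made-up constant consumption of 0.02 MWh per day.
    consumption = [0.02] * 8
    print(scale_fault(consumption, 2, 4, 10.0))
    print(zero_fault(consumption, 2, 4))
    print([round(v, 4) for v in trend_fault(consumption, 3, 0.10)])
```

The corrupted series can then be passed to the fault detection algorithm together with the unchanged series of the other collective members.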

Electricity Collective

As described in section 2.5, the constants in METER IN that are used for electricity consumers can be exchanged with the factors 10, 20, 30, 40, 60, 80, 100 and 120. The most difficult case is to detect small deviations in consumption. If the constants 100 and 120 are mixed up, the consumption can be 120/100 or 100/120 times the actual consumption. Such small deviations in consumption have not been detected in simulations for the electricity collective. Faults are detected first when the consumption is a factor 4, or 1/4, of the actual consumption. A change of constant that results in a smaller change in consumption will not be detected for this collective. In Figure B.9, a constant error with a factor 4 is introduced in consumer two at time t = 150. The fault is detected immediately. If the constant factor is set to 1/4, the fault is detected at time t = 160. This is illustrated in Figure B.10. If the same fault is introduced to consumer three, the fault is almost missed. This can be explained by the fact that consumer three is least correlated in the collective.

When zeros are introduced, the same results as for the district heat collective are achieved. If a zero lasts for just one sample, the fault is not detected. If the fault is durable, it will be detected. Simulations of these two examples can be found in Figure B.11 and Figure B.12.
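To get a feeling for how many constant mix-ups reach the factor 4 reported above, the possible pairs can be enumerated. The function name is hypothetical, and the sketch assumes that the limit of 4 or 1/4 found for this particular collective applies uniformly to every pair of constants, which the simulations do not establish.

```python
from itertools import permutations

CONSTANTS = [10, 20, 30, 40, 60, 80, 100, 120]


def large_mixups(constants, min_factor=4):
    """Ordered pairs (correct, wrong) whose ratio wrong/correct is at least
    min_factor or at most 1/min_factor."""
    return [
        (c, w)
        for c, w in permutations(constants, 2)
        if w >= min_factor * c or c >= min_factor * w
    ]


if __name__ == "__main__":
    pairs = large_mixups(CONSTANTS)
    total = len(CONSTANTS) * (len(CONSTANTS) - 1)
    print(len(pairs), "of", total, "ordered mix-ups change the consumption by a factor 4 or more")
    print((100, 120) in pairs)  # the 120/100 case discussed in the text
```

Under this assumption most of the possible mix-ups, including the 100/120 case, fall below the limit.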

The fault detection algorithm is not able to detect a linear trend of 10% increase per day on any of the consumers in the collective. The fault is detected when the linear trend has been increased to 50% per day. In Figure B.13, a linear trend of 50% per day has been added to consumer four. The fault is detected with a window length of L = 20. If the window length is decreased, the fault will be missed. A slow change in consumption should be easier to detect with a longer time window. No such results have been found for this collective. The fault is still detected with a window length of L = 100, but the fault is not detected as quickly as with the shorter time window. If a collective that is more correlated can be found, linear trends around 10% increase per day should be detected.

Summary

The fault detection algorithm has only been evaluated on the two collectives presented above. Additional simulations must be done to find out whether and how this model approach can be used. The main results from the performed simulations can be summarized as

• The ability to detect a fault is highly dependent on the correlation of the collective and the choice of the window length L. As for the district heat collective, sudden changes in consumption during season changes will be registered as a fault with a short time window, and neglected with a long time window. The collective must therefore be correlated during L samples.

• If a long window length is used, transient faults will be missed. On the other hand, it should be easier to detect a slow change in consumption with a long window length. A method for finding the optimal window length has not been found.

• Zeros are difficult to detect, especially for consumers with low consumption. Zero consumption arises mostly if the AMR ceases to function. It is not likely that the connection fails for several hours or days. Since the day mean consumption is used in this model, a zero value for one or more hours will not have a large effect on the day mean value. An alternative method could be used to detect zero consumption. Zeros could be detected with a separate system function before the statistical model is used.

• Not all faults that correspond to an incorrect constant in METER IN have been detected. For the district heat collective, such faults are missed if they occur during the warmer period of the year. For the electricity collective, faults are detected first when the consumption is a factor 4, or 1/4, of the actual consumption.

• A slow increase in consumption of 10% per day has only been detected for the district heat collective.

• How the number of consumers affects the collective has not been investigated. It is not likely that the same fault occurs for several consumers at the same time. If this is the case, a collective with several consumers that are well correlated should be less sensitive towards multiple faults.

Chapter 5

Summary

5.1 Conclusions

In this thesis, two models that can be used to detect incorrect measurements have been derived. The ﬁrst model uses historical data of consumption and outdoor temperature to estimate model parameters. When the model parameters have been estimated, the consumption can be simulated by using the outdoor temperature. The simulated consumption can be compared to the actual consumption to detect deviations in consumption. The second model is a more statistical approach that compares the correlation between consumers that can be grouped into a collective. If measurements from a consumer in the collective are incorrect, this consumer will be less correlated towards the other consumers in the collective. By studying the correlation between the consumers in the collective, incorrect measurements can be found. The conclusions will be divided for each model.

5.1.1 Simulation Model

• The model is best suited for consumers whose consumption is mostly affected by the outdoor temperature. These consumers are district heat consumers and consumers that use electricity for space heating. Consumption that does not correspond to a change in outdoor temperature will be interpreted as a fault. Explanations for a change in consumption other than a change in temperature may exist. The number of sun hours and the wind speed most likely affect the consumption.

• The model can be used both on consumers that have an increased consumption during weekdays and on those that have a similar consumption throughout the week.

• A residual with a sliding window has been used to detect incorrect measurements. A sudden change in consumption is easier to detect than a slow change. The choice of the window length is of importance. A long window length makes it easier to detect a slow change in consumption, but transient faults can be missed, and vice versa.

• When theoretical thresholds have been used, the thresholds are too low, especially for large time windows. This can be explained by the fact that the simulation error is not sufficiently normally distributed. A low threshold will result in false alarms. The false alarms often occur at small changes in consumption that cannot be explained by a change in temperature. If these small deviations should not be registered as faults, theoretical thresholds cannot be used. Simulations have shown that a threshold based on experience can be used to reduce the number of false alarms without missing the actual faults.

• To estimate the model parameters, correct measurements from one year must be available. Since the measurements are not corrected, the measurement database contains incorrect data of the historical consumption. If this model is going to be used in practice, the incorrect measurements must be corrected.

5.1.2 Statistical Model

• The fault detection performance is highly dependent on finding a collective that is well correlated. If such collectives can be found, the model can be used on district heat consumers as well as electricity consumers.

• Today, the consumers are divided into groups according to basic data from Statistics Sweden (SCB). These groupings are too imprecise to serve as collectives. New refined groupings must be made. The model itself can be used to find consumers that can constitute a collective. In the fault free case, faults must not be detected. If a consumer does not belong to the collective, the model will interpret the consumer's consumption as incorrect and a fault will be registered.

• Not all faults that can occur have been detected. As for the district heat collective, small deviations during the warmer periods of the year are difficult to detect. For the electricity collective, faults were detected first when the consumption was 4 or 1/4 times the actual consumption. A slow increase in consumption, corresponding to a trend, has not been detected for the electricity collective. If collectives that are more correlated are used, these faults should be detected.

• Zeros are difficult to detect, especially for consumers with low consumption. Zero consumption arises mostly if the AMR ceases to function. It is not likely that the connection fails for several hours or days. Since the day mean consumption is used in this model, a zero value for one or more hours will not have a large effect on the day mean value. An alternative method could be used to detect zero consumption. Zeros could be detected with a separate system function before the statistical model is used.

5.2 Suggestions to Further Studies

The behavior of district heat and electricity consumption is a complex process. A lot of parameters can be varied in the derived models, and the parameters can vary for each consumer. Additional simulations and analyses, especially for the statistical model, are needed before any of the models can be implemented in a system. Thus, the work in this thesis has laid a foundation for further studies.

• Previous studies, [1], [6] and [5], have used the outdoor temperature, number of sun hours and wind speed to create a cooling signal. It would be interesting to introduce these factors in the simulation model to see if this would make the simulation error more normally distributed. If this is the case, the theoretical thresholds could be used.

• The optimal window length for the residual used to detect faults with the simulation model has not been investigated. The window length is a balance between detecting transient faults and missing slow trends. One approach could be to use several residuals, each sensitive towards a certain fault. The optimal window length used in the fault detection algorithm in the statistical model must also be investigated further.

• The day mean value of the consumption has been used in all models. If the hourly measurements are to be used explicitly, one must be able to measure the hourly outdoor temperature. It will also be more difficult to find correlation between the consumers' hourly measurements, since the consumption pattern during a day varies for each consumer. If the day mean consumption is used, occasional incorrect measurements can be missed. This could be solved by checking the min/max values of the hourly measurements before the day mean value is calculated. For electricity consumers, the size of the fuse etc. could be used to calculate the maximal physically possible consumption.

• The statistical model has only been used on two collectives. To investigate this model approach, several collectives, both for district heat and electricity consumers, must be derived.

• How the number of consumers affects the collective has not been investigated. It is not likely that the same fault occurs for several consumers at the same time. If this is the case, a collective with several consumers that are well correlated should be less sensitive towards multiple faults.
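The suggested check of hourly values before the day mean is calculated could look roughly like the sketch below. The function names and the example values are hypothetical, and the upper limit assumes a balanced three-phase connection at 400 V with unity power factor, which is an assumption of this sketch and not a figure from the thesis.

```python
from math import sqrt


def max_hourly_energy_kwh(fuse_ampere, line_voltage=400.0):
    """Upper bound on the energy used in one hour, in kWh.

    Assumptions: balanced three-phase supply, line voltage 400 V,
    unity power factor, i.e. P = sqrt(3) * U * I.
    """
    return sqrt(3.0) * line_voltage * fuse_ampere / 1000.0


def precheck(hourly_kwh, fuse_ampere):
    """Return the hours with zero values and the hours outside [0, limit]."""
    limit = max_hourly_energy_kwh(fuse_ampere)
    zeros = [h for h, v in enumerate(hourly_kwh) if v == 0.0]
    out_of_range = [h for h, v in enumerate(hourly_kwh) if v < 0.0 or v > limit]
    return {"zero": zeros, "out_of_range": out_of_range}


if __name__ == "__main__":
    # Made-up hourly readings in kWh for a consumer with a 16 A fuse.
    readings = [1.2, 0.0, 0.9, 14.5, 1.1]
    print(round(max_hourly_energy_kwh(16), 2), "kWh per hour at most")
    print(precheck(readings, fuse_ampere=16))
```

Hours flagged by such a check could be handled separately, so that the day mean passed on to the models is not distorted by them.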
