Improving Cyber-Security of Power System State Estimators


MARTINA GIANNINI

Master’s Degree Project

Stockholm, Sweden February 2014


Abstract

During the last century, technological advances have deeply renewed many critical infrastructures, such as transportation networks and power systems.

In fact, the strong interconnection between physical processes, communication channels, and control systems has led to the new concept of cyber-physical systems.

Alongside countless new advantages, these systems unfortunately also have new weaknesses. One example is cyber-attacks: malicious intrusions into the communication channel aimed at manipulating data.

In this thesis the considered cyber-physical system is a power network where hundreds of field devices are connected to a control center, which collects data and controls the whole system.

A cyber-attack where the adversary model is based on the attacker’s knowledge of the network topology and line parameters is considered.

This work is focused on one of the features of the control center: the state estimator. After a preliminary analysis of the conventional state estimators with respect to cyber attacks constructed according to this adversary model, new ideas for improving the security of the system are presented.

The aim of this thesis is to propose novel state estimators that are both accurate under no cyber-attack, and at the same time able to detect attacks that are undetectable by the conventional state estimator.

This mainly involves introducing additional information about the system as constraints in the state estimator, under the assumption that the new information is not available to the adversary.

At the end of the analysis of the new mathematical model of the state estimators, a new definition of undetectable attack is proposed. The functionality of the novel state estimators is demonstrated in numerical experiments, which have been performed on different benchmark power networks.


List of Figures

1.1 Network Organization

1.2 Device hierarchy

2.1 Example of network

2.2 Power measurements

2.3 Example of anomaly

2.4 Residuals

2.5 Probability False Alarm

3.1 Example of attack

3.2 Risk analysis

4.1 OPF Control loop

4.2 Example with new formulation

4.3 First measurement attack with model 1

4.4 Second measurement attack with model 1

4.5 Third measurement attack with model 1

4.6 Fourth measurement attack with model 1

4.7 Fifth measurement attack with model 1

4.8 Residual 2-norm of model 1

5.1 Simulation model 1 large network

List of Tables

3.1 Index α

5.1 Risk analysis: the index α

6.1 Results of model 1: first measurement attack

6.2 Results of model 1: second measurement attack

6.3 Results of model 1: third measurement attack

6.4 Results of model 1: fourth measurement attack

6.5 Results of model 1: fifth measurement attack

6.16 Simulation of the first state estimator: large network

6.17 Simulation of the second state estimator: large network

6.18 Simulation of the third state estimator: large network

Chapter 1 Introduction

1.1 Power systems

A power network is an interconnected network for delivering electricity from suppliers to consumers. An example of an interconnected power network is shown in Figure 1.1.

The history of these systems began with the discovery of the fundamental principles of electricity generation during the 1820s and early 1830s by the British scientist Michael Faraday. Together with the contribution of scientists like Thomas Edison and Nikola Tesla, the technological revolution in energy generation led electric power to become the engine of the modern world: this phenomenon is called Electrification [3].

Figure 1.1: Example of organization of a modern Power Network.

Entire continents, like Europe, are completely connected by these complex networks. They bring electricity to every building, ensuring the well-being of billions of people. Therefore it is crucial that the power network operates safely and reliably at all times.

In such systems there is a strong interconnection between all the elements involved in the process of generating, transmitting, and distributing electrical energy. A modern network is composed of different types of devices, each designed for a specific task.

The whole physical process is monitored and controlled by a control system.

The role of the control system is to check that the system is satisfying the operational goals and to ensure the reliability of the power network performance. This is obtained by computing control actions that may correct critical situations. The communication infrastructure used to collect measurements and transmit control actions to the actuators of these systems is called SCADA, Supervisory Control and Data Acquisition.

There are different layers of control in power systems, such as physical level, communication level and application level, as shown in Figure 1.2. The SCADA controls at a high level, for example by providing set points to lower-level power plant controllers.

Figure 1.2: Hierarchy between devices.

The control system is composed of a set of networked devices, consisting of sensors, actuators, control processing units and communication devices, all organized in a hierarchical structure [2], as depicted in Figure 1.2.

The devices involved in the control system’s processes are:

• SCADA system:

it is an infrastructure, supporting the supervisory control level. It is multi-tasking and based upon a real-time database located in one or more servers [4].

• RTUs (Remote Terminal Units) and substation:

RTUs are field devices which collect data acquired by the sensors and send them to the control center. A substation can be regarded as a point of electrical connection where the transmission lines, transformers, generating units, system monitoring and control equipment are connected together [10].

• Traditional sensors and PMUs (Phasor Measurement Units):

instruments embedded in the environment, in contact with the physical process. The main difference between these devices is that PMUs are recent and rely on GPS signals to measure phase angles, voltages, and frequency at a high sampling rate [19]. Conventional sensors do not have access to a global time clock such as GPS; they measure voltages, currents, and power, but at lower sampling rates, and they cannot measure phase angles directly. The metered quantities characterize the system, as shown in Section 2.1.

• Actuators: PLCs (Programmable Logic Controllers):

they apply instructions coming from the control center, making changes to the physical infrastructure.

The common services provided by the SCADA system are data acquisition and control, storage of information, execution of dedicated applications, providing user interface and communication gateways [5].

Specific features of the control center for a power network are:

• Energy management system is a software package designed to improve the operation of power generation and transmission. Its main objective is to assist operators in supplying the power demand in a reliable and cost-effective way [5].

• Automatic generation control is concerned with fine tuning the system frequency. It relies on tie-lines and frequency measurements [19].

• Optimal power flow analysis provides optimal set-points of the generated power with respect to the costs of generation and transmission, as presented in Section 4.2.

• Contingency analysis determines whether steady-state operating limits would be violated by the occurrence of credible contingencies [17].

• State estimator and bad data detector provide an estimate of the measurements of the network, and identify and remove errors due, for example, to sensor inaccuracy (Section 2.1).

These systems rely on computer and communication networks to manage power generation and facilitate communications between users, suppliers and operators. The connections used by such systems are heterogeneous: they may consist of fibre optics, satellite and microwave links, but also office LANs that make the systems accessible from the Internet.

As data must travel along different channels, they are often sent without encryption, particularly data from legacy equipment. This is also due to the geographical configuration of the system: encryption key management is still difficult within a dispersed wide-area network [19].

In the near future the next step in the power network sector will be smart grids: they will improve the current grid's capabilities by increasing the number of measurements, increasing the interconnection between producers and customers, and integrating the end users in the power grid operations. Moreover, they will better utilize the fluctuating power supply from renewable energy sources such as photovoltaic and wind power. Due to the increased dependency of these new technologies on cyber resources, the network becomes more vulnerable to cyber-attacks. For instance, one such cyber-attack may be performed by an adversary that has gained access to the communication network and is able to corrupt the communicated data.

Next to conventional requirements, such as power quality, cyber-security is becoming one of the most important research topics.

A risk management framework has to be introduced [12], [19], [20] to provide a vulnerability analysis and to tackle cyber-attacks. It is based on two phases: risk analysis and risk mitigation. Risk analysis helps to identify vulnerabilities and to determine possible impacts on the applications and on the physical systems. Risk mitigation involves the deployment of more robust supporting infrastructure, power applications, and possible countermeasures [19].

1.2 Problem formulation and contribution

The conventional features of the control center, such as the state estimator and the bad data detector, are not always resilient with respect to cyber-attacks directed at the measurements. In the literature, the problem of power network cyber vulnerabilities is well known, and many papers show how an attack may not be detected by the control center, for example [16], [6], [9].

In the considered scenario the cyber vulnerability is the ability of the attacker to corrupt any measurement without triggering alarms, assuming that the adversary has complete knowledge of the topology and parameters of the electrical network.

Risk analysis characterizes the attacks that are able to bypass the bad data detector and identifies the vulnerability of each measurement to such attacks.

Given the first information from risk analysis, the main aim of this thesis is to improve the risk mitigation at the application layer against adversaries that only know the topological properties of the grid. The proposed solution consists of an extension of the conventional state estimator.

The novel state estimator requires more model knowledge regarding the power system. This is achieved by considering information about the network that is supposed to be already available to the operators.

The main idea is that this enhanced knowledge imposes additional constraints on the state estimator. To bypass the improved model of state estimation, the attacker must also satisfy these additional constraints, which requires model knowledge that is not immediately available to the attacker. Therefore the new information introduced into the model restricts the attacks that may bypass the traditional state estimator.

Although the novel state estimators may be effective against adversaries knowing only the topology of the grid, their vulnerability against omniscient adversaries is also characterized, and the results are compared to the vulnerabilities of the conventional method.

1.3 Thesis organization

In Chapter 2 the model of the power network, the state estimation procedure and the bad data detection system are introduced. State estimation and bad data detection are features of the control center which can detect erroneous or malicious data.

In Chapter 3 the structure of a particular class of cyber-attacks is presented.

In particular, risk analysis for omniscient adversary will be performed, which computes the vulnerability of each measurement to undetectable attacks. It will be shown how such attacks can bypass the bad data detection system of the control center.

In Chapter 4 the introduction of new information to the state estimation is studied. The analysis begins from the simplest situation, where the information consists of accurate measurements coming from trustworthy sources, which are used as constraints. The analysis then proceeds to the case where measurements are both secure and accurate and additional models are introduced: the power balance equation, the control scheme, and the relation between the state estimator and the trustworthy sources of the new data.


In Chapter 5 a large case study with the results of the simulations is presented.

In Chapter 6 the conclusions of this thesis are drawn, mainly in the form of a final overview of the work.

Chapter 2 Modeling

2.1 Network model

The physical model of power networks is based on analog and logical data.

Analog data are the available measurements of the electrical power, voltage, and current. Logical data are the statuses of switching devices.

Other useful information is given by the static parameters. These characterize the branches that connect buses to each other: resistance R, reactance X, conductance G, susceptance B, impedance Z and length l.

The analog measurements involved will be active and reactive power flows, active power injections, power loads and voltages [13].

A power grid is composed of buses, nodes of the network, interconnected to each other [13]. A simple example of topology of a power network is shown in Figure 2.1:

Figure 2.1: Example of network with 4 buses, each one identified by an index


There are four variables associated to each bus:

• V_i: voltage magnitude;

• δ_i: voltage angle;

• P_i: net active injection power (sum of generation and load);

• Q_i: net reactive injection power (sum of generation and load).

Depending on which of these variables are given and which ones are to be calculated, three types of buses can be defined:

• PQ-bus: P_i and Q_i are known; δ_i and V_i are unknown.

• PV-bus: P_i and V_i are known; δ_i and Q_i are unknown.

• Vδ-bus: δ_i and V_i are known; P_i and Q_i are unknown.

A PQ-bus is used to represent load buses without voltage control, and a PV-bus to represent generation buses with voltage control. The Vδ-bus, called reference bus or slack bus, serves as the voltage angle reference, and its active power is used to balance generation, load and losses.

2.1.1 Network equations

All electrical networks consist of interconnected transmission lines and transformers: for any node i, the current injection at the node is

$$I_i = Y_{ii} V_i + \sum_{k=1,\,k\neq i}^{n_{bus}} Y_{ik} V_k \qquad (2.1)$$

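where n_bus is the number of nodes. These individual AC-models are combined [10] to represent the whole network by forming the nodal network equations:

$$\begin{bmatrix} I_1 \\ \vdots \\ I_i \\ \vdots \\ I_{n_{bus}} \end{bmatrix} = \begin{bmatrix} Y_{11} & \cdots & Y_{1i} & \cdots & Y_{1 n_{bus}} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ Y_{i1} & \cdots & Y_{ii} & \cdots & Y_{i n_{bus}} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ Y_{n_{bus} 1} & \cdots & Y_{n_{bus} i} & \cdots & Y_{n_{bus} n_{bus}} \end{bmatrix} \begin{bmatrix} V_1 \\ \vdots \\ V_i \\ \vdots \\ V_{n_{bus}} \end{bmatrix} \qquad (2.2)$$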

The indices i, k stand for node numbers, so that V_i ∈ C is the voltage phasor at node i, V_i = |V_i| e^{jδ_i}, I_i ∈ C is the current phasor at node i, I_i = |I_i| e^{jβ_i}, and Y_ik = |Y_ik| e^{jθ_ik} ∈ C is the mutual admittance between nodes i and k. The matrix Y is called the nodal admittance matrix. Its inverse Z = Y⁻¹ is called the nodal impedance matrix and it can be used to rewrite the previous relation as:





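$$\begin{bmatrix} V_1 \\ \vdots \\ V_i \\ \vdots \\ V_{n_{bus}} \end{bmatrix} = \begin{bmatrix} Z_{11} & \cdots & Z_{1i} & \cdots & Z_{1 n_{bus}} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ Z_{i1} & \cdots & Z_{ii} & \cdots & Z_{i n_{bus}} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ Z_{n_{bus} 1} & \cdots & Z_{n_{bus} i} & \cdots & Z_{n_{bus} n_{bus}} \end{bmatrix} \begin{bmatrix} I_1 \\ \vdots \\ I_i \\ \vdots \\ I_{n_{bus}} \end{bmatrix} \qquad (2.3)$$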

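The complex power injected at any node i is:

$$S_i = V_i I_i^{*} = P_i + jQ_i = V_i^2 Y_{ii} e^{-j\theta_{ii}} + V_i \sum_{k=1,\,k\neq i}^{n_{bus}} V_k Y_{ik} e^{j(\delta_i - \delta_k - \theta_{ik})} \qquad (2.4)$$

where, on the right-hand side, V_i, V_k and Y_ik denote magnitudes. Separating the real and imaginary parts:

$$P_i = V_i^2 Y_{ii} \cos\theta_{ii} + \sum_{k=1,\,k\neq i}^{n_{bus}} V_i V_k Y_{ik} \cos(\delta_i - \delta_k - \theta_{ik}) \qquad (2.5)$$

$$Q_i = -V_i^2 Y_{ii} \sin\theta_{ii} + \sum_{k=1,\,k\neq i}^{n_{bus}} V_i V_k Y_{ik} \sin(\delta_i - \delta_k - \theta_{ik}) \qquad (2.6)$$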
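As a numerical check of (2.5) and (2.6), the following sketch compares the polar expressions with a direct evaluation of the complex power. It assumes the phasor convention V_i = |V_i| e^{jδ_i} and S_i = V_i I_i^*, and it uses a hypothetical three-bus network whose impedances, voltage magnitudes and angles are invented for illustration; shunt elements are neglected.

```python
import cmath
import math

# Hypothetical 3-bus network: (from_bus, to_bus, series impedance in p.u.).
BRANCHES = [(0, 1, complex(0.02, 0.10)),
            (0, 2, complex(0.01, 0.08)),
            (1, 2, complex(0.03, 0.12))]
N_BUS = 3


def build_admittance_matrix(branches, n_bus):
    """Nodal admittance matrix Y of (2.2), shunt elements neglected."""
    y_bus = [[0j] * n_bus for _ in range(n_bus)]
    for i, k, z in branches:
        y = 1 / z
        y_bus[i][i] += y
        y_bus[k][k] += y
        y_bus[i][k] -= y
        y_bus[k][i] -= y
    return y_bus


def injections_from_phasors(y_bus, v):
    """S_i = V_i * conj(I_i), with the currents I = Y V as in (2.1)."""
    n = len(v)
    currents = [sum(y_bus[i][k] * v[k] for k in range(n)) for i in range(n)]
    return [v[i] * currents[i].conjugate() for i in range(n)]


def injections_polar(y_bus, v_mag, delta):
    """P_i and Q_i from the polar expressions (2.5) and (2.6)."""
    n = len(v_mag)
    p, q = [], []
    for i in range(n):
        y_ii, th_ii = cmath.polar(y_bus[i][i])
        p_i = v_mag[i] ** 2 * y_ii * math.cos(th_ii)
        q_i = -v_mag[i] ** 2 * y_ii * math.sin(th_ii)
        for k in range(n):
            if k == i:
                continue
            y_ik, th_ik = cmath.polar(y_bus[i][k])
            angle = delta[i] - delta[k] - th_ik
            p_i += v_mag[i] * v_mag[k] * y_ik * math.cos(angle)
            q_i += v_mag[i] * v_mag[k] * y_ik * math.sin(angle)
        p.append(p_i)
        q.append(q_i)
    return p, q


v_mag = [1.00, 0.98, 1.02]   # assumed voltage magnitudes (p.u.)
delta = [0.0, -0.05, 0.03]   # assumed voltage angles (rad)
y_bus = build_admittance_matrix(BRANCHES, N_BUS)
v = [cmath.rect(m, a) for m, a in zip(v_mag, delta)]
s = injections_from_phasors(y_bus, v)
p, q = injections_polar(y_bus, v_mag, delta)
```

Both routes should give the same injections up to rounding, and the sum of the active injections equals the losses in the series resistances.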

2.1.2 Linearized model

The real and reactive power injection at each node is a nonlinear function of the system voltages, that is P = P (V, δ) and Q = Q(V, δ). It is often convenient and necessary to linearize these functions in the vicinity of the operating point, which describes the present state of the system and satisfies the equations of the network (2.5), (2.6). This linearization is carried out using a first-order Taylor expansion [10]. It can be written, using matrix algebra and hybrid network equations, as:

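$$\begin{bmatrix} \Delta P \\ \Delta Q \end{bmatrix} = \begin{bmatrix} H & M \\ N & K \end{bmatrix} \begin{bmatrix} \Delta\delta \\ \Delta V \end{bmatrix} \qquad (2.7)$$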

where ∆P is the vector of the real power changes at all the system nodes, ∆Q is the vector of reactive power changes, ∆V is the vector of voltage magnitude increments and ∆δ is the vector of voltage angle increments.

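The elements of the Jacobian sub-matrices H, M, N, K are the partial derivatives of the functions of active and reactive power, that is:

$$H_{ik} = \frac{\partial P_i}{\partial \delta_k}, \quad M_{ik} = \frac{\partial P_i}{\partial V_k}, \quad N_{ik} = \frac{\partial Q_i}{\partial \delta_k}, \quad K_{ik} = \frac{\partial Q_i}{\partial V_k} \qquad (2.8)$$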
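To make (2.8) concrete, the sketch below evaluates one off-diagonal element H_ik = ∂P_i/∂δ_k in two ways: analytically, by differentiating (2.5) term by term, and numerically, with a central finite difference. The two-bus admittance magnitudes and angles are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical two-bus data: admittance magnitudes Y[i][k] and angles TH[i][k] (rad).
# The off-diagonal angle is the diagonal angle shifted by pi (Y_ik = -y, Y_ii = y).
Y = [[10.0, 10.0], [10.0, 10.0]]
TH = [[-1.4, math.pi - 1.4], [math.pi - 1.4, -1.4]]


def active_power(i, v, delta):
    """P_i from the polar expression (2.5)."""
    p = v[i] ** 2 * Y[i][i] * math.cos(TH[i][i])
    for k in range(len(v)):
        if k != i:
            p += v[i] * v[k] * Y[i][k] * math.cos(delta[i] - delta[k] - TH[i][k])
    return p


def h_analytic(i, k, v, delta):
    """H_ik for k != i: derivative of (2.5) with respect to delta_k."""
    return v[i] * v[k] * Y[i][k] * math.sin(delta[i] - delta[k] - TH[i][k])


def h_numeric(i, k, v, delta, step=1e-6):
    """Central finite difference of P_i with respect to delta_k."""
    up = list(delta)
    down = list(delta)
    up[k] += step
    down[k] -= step
    return (active_power(i, v, up) - active_power(i, v, down)) / (2 * step)


v = [1.0, 0.97]        # assumed voltage magnitudes (p.u.)
delta = [0.0, -0.08]   # assumed voltage angles (rad)
h01_exact = h_analytic(0, 1, v, delta)
h01_approx = h_numeric(0, 1, v, delta)
```

The same finite-difference pattern can be used to check the other sub-matrices M, N and K.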

2.1.3 DC network model

Real and reactive power flows are connected in AC transmission networks.

Real power is mainly affected by the load angle while reactive power flow is mainly affected by the voltage magnitudes. As reactive power cannot be transmitted over long distances, its control is a local problem: this function is exploited by local voltage regulators [10].

Control of real power flows is executed using the remaining two quantities, load angle and series reactance X. This thesis considers the lossless DC-model of a power system that has reached the steady state. The DC-model (Direct Current model) is obtained by linearizing the AC-model, assuming that the phase angle at the operating point is equal to zero, that the resistances R of all the branches are negligible, and that the angle difference between two nodes is very small (sin(δ_i − δ_k) ≈ δ_i − δ_k).

The linear dependence between power and voltage angles is obtained from relation (2.7). To simplify the notation, the variables ∆P and ∆δ will be denoted by P and δ, respectively:

P = Hδ (2.9)

A bus is represented as a node of the network. If k is a node adjacent to i and N_i is the set of nodes adjacent to i, the power injection at bus i is:

$$P_i = \sum_{k \in N_i} P_{ik} \qquad (2.10)$$

and, assuming that the resistance in the transmission line is small compared to its reactance, the power flow from bus i to bus k is:

$$P_{ik} = \frac{V_i V_k}{X_{ik}} (\delta_i - \delta_k) \qquad (2.11)$$
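The DC relations (2.9) to (2.11) can be illustrated on a small example. The sketch below assumes all voltage magnitudes equal to 1 p.u., so that P_ik = (δ_i − δ_k)/X_ik, and uses a hypothetical four-bus topology with invented reactances; it is not the network of Figure 2.1.

```python
# Hypothetical 4-bus DC network: branch (i, k) -> reactance X_ik in p.u.
BRANCHES = {(0, 1): 0.10, (0, 2): 0.20, (1, 3): 0.25, (2, 3): 0.10}
N_BUS = 4


def branch_flows(delta):
    """P_ik = (delta_i - delta_k) / X_ik, i.e. (2.11) with V_i = V_k = 1 p.u."""
    return {(i, k): (delta[i] - delta[k]) / x for (i, k), x in BRANCHES.items()}


def injections(delta):
    """P_i as the sum of the flows leaving bus i, as in (2.10)."""
    p = [0.0] * N_BUS
    for (i, k), flow in branch_flows(delta).items():
        p[i] += flow   # flow leaves bus i
        p[k] -= flow   # and enters bus k
    return p


def injection_matrix():
    """Matrix H such that P = H delta, as in (2.9)."""
    h = [[0.0] * N_BUS for _ in range(N_BUS)]
    for (i, k), x in BRANCHES.items():
        b = 1.0 / x
        h[i][i] += b
        h[k][k] += b
        h[i][k] -= b
        h[k][i] -= b
    return h


delta = [0.0, -0.02, -0.03, -0.05]   # assumed angles (rad), bus 0 as reference
p_direct = injections(delta)
h = injection_matrix()
p_matrix = [sum(h[i][k] * delta[k] for k in range(N_BUS)) for i in range(N_BUS)]
```

Because the model is lossless, the injections sum to zero, which is a convenient sanity check on any DC measurement set.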


2.2 State estimation

The state estimator is a data-processing method that provides a representation of the current conditions in a power network from different sources of information, such as measurements of system variables, the mathematical model of the system, instrumentation, and prior knowledge of various system inputs and outputs. The latter includes pseudomeasurements, which are based on load prediction and generation scheduling for parts of the network that are not directly metered but still have to satisfy physical laws [13].

The observability of the network is related to the number of measurements and their distribution throughout the network. A contiguous part of the network is called an observable island if it is possible to calculate all the branch flows from the available measurements [13].

With an adequate redundancy level, state estimation can eliminate the effect of some bad data and allow for the temporary loss of measurements without affecting the estimated values.

Bad data consist of errors, noise and incongruent quantities which are due to instrumentation and communication inaccuracy. State estimators are used to filter redundant data, to eliminate incorrect measurements, to produce reliable state estimates and to determine the power flows in parts of the network that are not directly metered.

The output of the state estimator approximates the true state of the system.

Errors are due to noise in instruments and communication channels, incomplete knowledge, delayed measurements, erroneous pseudomeasurements and inaccurate network parameters.

Hence the optimal estimator not only generates a state estimate but also the 'error' associated with it. This error is analyzed by the bad data detection system.

2.2.1 State variables and measurements

State variables are defined as a non-redundant set of variables that entirely characterize the system. According to the DC-model of power networks, the state variables are phase angles, δ.

The set of state variables is denoted with x and the number of state variables as n. The quantities that can be measured are considered known functions of the state variables. They are called measured variables.

In practical state estimators many measurements are required, such as voltage magnitude, active power (branch flow, branch-group flows, bus injections and pseudomeasurements), reactive power (branch flow, branch-group flows, bus injections and pseudomeasurements), current magnitude flow and current injection.

The set of measured variables is denoted as z, the number of measurements as m. Each measurement is corrupted by noise, assumed to be Gaussian with zero mean and known covariance matrix R. R is a diagonal matrix with diagonal terms:

$$R_{ii} = \sigma_i^2$$

where σ_i is the RMS value (standard deviation) of the noise on the i-th measurement. These values characterize the accuracy of the individual measurements and are fundamental in computing an optimum estimate.

2.2.2 State estimation formulation

Given a set of power flow measurements, the nonlinear model is [13]:

z = h(x) + e (2.12)

where z ∈ R^m is the vector of power flow measurements, x ∈ R^n is the state vector, h(·) is a nonlinear vector function relating the measurements to the states, and e ∈ R^m is the measurement noise vector, which is assumed to have zero mean and covariance R.

There are m measurements, corresponding to the number of sensors in the network, and n state variables, corresponding to the number of buses in the network. Usually n < m, thus it is an overdetermined system.

The primary output of an optimal estimator is a value of the state variables called the optimum state estimate and denoted as x̂. It minimizes some criterion, based on knowledge of the equation relating the measured variables to the state variables, the statistics of the measurement noise and a given set of measurements z.

There are several methods useful to define these minimization criteria [13]:

• Weighted least squares - WLS,

• Maximum likelihood criterion,

• Minimum variance criterion,

• Least Absolute Value Estimator,

• Leverage Points,


• Least Median of Squares Estimators,

• Dynamic Estimation.

In particular the WLS criterion is a quadratic function of the residual. The residual is the difference between the measurements actually made and the measurements that would have been made if the estimate were exact and there were no disturbances.

2.2.3 Weighted Least Squares state estimation

The first clear and concise exposition of the method of least squares was published by Legendre in 1805. The technique is described as an algebraic procedure for fitting linear equations to data, and Legendre demonstrated the new method by analyzing the same data as Laplace for the shape of the earth.

The goal of least squares theory is to solve the problem of data fitting.

The weighted least squares method is a special case of this theory and occurs when all the off-diagonal entries of the correlation matrix of the residuals are zero. It is based on the implicit assumption that the errors are uncorrelated with each other and with the independent variables x; their variances may differ, and the weights account for this.

When fitting data that contain random variations, two important assumptions are usually made about the error [11]:

• The error exists only in the response data, and not in the predictor data.

• The errors are random and follow a normal (Gaussian) distribution with zero mean and constant covariance, R.

The second assumption is often expressed as error e ∼ N (0, R).

To improve the fit, an additional scale factor (the weight) is included in the fitting process. The weights determine how much each response value influences the final parameter estimates. A high-quality data point influences the fit more than a low-quality data point. The weights should transform the response variances to a constant value. If the variance of the measurement error is known, then the weights are given by:

$$w_i = \frac{1}{\sigma_i^2} \qquad (2.13)$$

Given a set of known data, which is the set of measurements z, expressed as functions of the state variables of the system, x, the objective of the WLS formulation is:

$$\min_x J(x) = \frac{1}{2} \sum_{i=1}^{m} \frac{r_i^2}{\sigma_i^2} \qquad (2.14)$$

where

$$r_i = z_i - h_i(x)$$

is the residual, that is, the difference between the observed value and the fitted value provided by the model, considering the error equal to zero. The optimal value of the state variables, x̂, is found by minimizing the objective function.

Power measurements and state variables, namely the phase angles, are related by a nonlinear function; a common approach is therefore to approximate the model by a linear function and to refine the parameters by successive iterations. Using the Gauss-Newton method, the relation between z and x becomes:

z = Hx + e (2.15)

where

$$H = \frac{\partial h}{\partial x}(\hat{x}) \qquad (2.16)$$

is the constant m × n Jacobian matrix of h(x). It represents the DC-model of the network: it describes a linear relation between the power measurements z and the state variables, the phase angles δ, as in equation (2.9).

A necessary assumption is that the differences between state variables are small, hence the linear approximation is accurate. Furthermore it is required that the model is overdetermined, to avoid singularities.

The minimization problem becomes:

$$\min_x J(x) = \frac{1}{2} \sum_{i=1}^{m} \frac{r_i^2}{\sigma_i^2} = \frac{1}{2} (z - Hx)^T R^{-1} (z - Hx) \qquad (2.17)$$

and the optimal value of x is:

$$\hat{x} = (H^T R^{-1} H)^{-1} H^T R^{-1} z \qquad (2.18)$$

It is obtained by equating the gradient of J(x) to zero. This is also the generic solution for the overdetermined case of state estimation. The optimum estimator is also used to generate an improved estimate of the measured variables themselves. This measurement estimate is denoted as ẑ:

$$\hat{z} = H\hat{x} = H (H^T R^{-1} H)^{-1} H^T R^{-1} z = Kz \qquad (2.19)$$

This means that the phase angle estimate is used to estimate the measurement. K is called the hat matrix, the key to the analysis of bad data.
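The estimate (2.18) and the hat matrix of (2.19) can be sketched with NumPy as follows. The measurement matrix describes a hypothetical three-bus DC network (bus 1 as reference, all reactances 0.1 p.u., five flow measurements), and the noise standard deviations are assumed values; none of these numbers come from the thesis.

```python
import numpy as np

# Hypothetical DC model: states are (delta_2, delta_3), bus 1 is the reference.
# Rows correspond to the flows P12, P13, P23, P21, P32 with X = 0.1 p.u.
H = np.array([[-10.0, 0.0],
              [0.0, -10.0],
              [10.0, -10.0],
              [10.0, 0.0],
              [-10.0, 10.0]])
sigma = np.array([0.01, 0.01, 0.02, 0.01, 0.02])  # assumed noise std deviations
R_inv = np.diag(1.0 / sigma**2)


def wls_estimate(H, R_inv, z):
    """State estimate (2.18), solving the normal equations instead of inverting."""
    gain = H.T @ R_inv @ H
    return np.linalg.solve(gain, H.T @ R_inv @ z)


def hat_matrix(H, R_inv):
    """Matrix K of (2.19), mapping measurements to estimated measurements."""
    gain = H.T @ R_inv @ H
    return H @ np.linalg.solve(gain, H.T @ R_inv)


x_true = np.array([-0.02, -0.05])          # assumed true phase angles (rad)
rng = np.random.default_rng(0)
z = H @ x_true + rng.normal(0.0, sigma)    # noisy measurements, as in (2.15)

x_hat = wls_estimate(H, R_inv, z)
K = hat_matrix(H, R_inv)
z_hat = K @ z
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse, which is usually preferable numerically. K is idempotent and satisfies KH = H, so measurements that already fit the model are left unchanged.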

2.3 Bad data detection

The optimal estimator not only generates a state estimate but also the covariance or “error” associated with it [8].

$$E\{(x - \hat{x})(x - \hat{x})^T\} = P_{\hat{x}}$$

The covariance matrix predicts the magnitude of the estimation error and hence provides a measure of confidence and accuracy of the estimate. Another important quantity is the covariance of the estimated measurements,

$$E\{(z - \hat{z})(z - \hat{z})^T\} = P_{\hat{z}}$$

which is useful in many applications.

The predicted and the actual estimation errors should be of the same order of magnitude. Significant differences indicate that a major change, such as a fault, has occurred in the system or in the instrumentation, and an alarm signal can be given. Furthermore, the most likely of these major changes (or faults) can be determined.

The analysis of these differences is carried out by the bad data detection system: the goal is to check that the measurements fit the physical model of the network well.

It could happen that errors affecting state estimates are not compatible with their standard deviations. This is due to the presence of bad data among the measured quantities.

Bad data detection provides the function of detecting, identifying and eliminating measurement errors throughout the system. It is based on hypothesis testing on two quantities associated with the objective function: the largest normalized residual and the J(x̂) performance index.

2.3.1 Largest normalized residual

The normalized residual r_j^n is defined for z_j as the ratio of the residual estimate to the standard deviation of the residual [13]:

$$r_j^n = \frac{\hat{r}_j}{\rho_{jj}} = \frac{z_j - \hat{z}_j}{\rho_{jj}} \qquad (2.20)$$

Assuming that all the measurements except z_j are good data, the largest normalized residual is the one calculated for z_j. Thus, the wrong measurement is responsible for the largest normalized residual.

The relation between the residual and the measurement error can be easily computed:

$$\hat{r} = z - \hat{z} = z - Kz = (I - K)z = (I - K)e = Se \qquad (2.21)$$

where the step from z to e uses z = Hx + e together with (I − K)H = 0, and S = I − K is the so-called sensitivity matrix, which describes the relation between the residual and the measurement error.

The residual r̂_j is normally distributed with mean zero and variance ρ_jj² = R_r̂(j, j), the j-th diagonal element of the covariance matrix of the residuals, so that r̂ ∼ N(0, R_r̂).

The marginal distribution of the normalized residual r_j^n = ρ_jj⁻¹ r̂_j is normal with mean zero and unit variance, that is r_j^n ∼ N(0, 1).
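The following sketch computes the sensitivity matrix of (2.21) and the normalized residuals of (2.20) for a hypothetical three-bus DC measurement set, then injects a single gross error to show that the corrupted measurement has the largest normalized residual. The measurement matrix, the noise levels and the size of the error are illustrative assumptions, and the other measurements are kept noise-free so that the outcome is deterministic.

```python
import numpy as np

# Hypothetical DC model: states (delta_2, delta_3); rows P12, P13, P23, P21, P32.
H = np.array([[-10.0, 0.0],
              [0.0, -10.0],
              [10.0, -10.0],
              [10.0, 0.0],
              [-10.0, 10.0]])
sigma = np.array([0.01, 0.01, 0.02, 0.01, 0.02])  # assumed noise std deviations
R = np.diag(sigma**2)
R_inv = np.diag(1.0 / sigma**2)

gain = H.T @ R_inv @ H
K = H @ np.linalg.solve(gain, H.T @ R_inv)   # hat matrix of (2.19)
S = np.eye(len(sigma)) - K                   # sensitivity matrix of (2.21)
residual_cov = S @ R @ S.T                   # covariance of the residual S e
rho = np.sqrt(np.diag(residual_cov))         # residual standard deviations


def normalized_residuals(z):
    """Normalized residuals (2.20) for a measurement vector z."""
    return (S @ z) / rho


x_true = np.array([-0.02, -0.05])   # assumed true phase angles (rad)
z_clean = H @ x_true                # noise-free measurements
z_bad = z_clean.copy()
z_bad[1] += 0.5                     # gross error on the P13 measurement

r_n = normalized_residuals(z_bad)
suspect = int(np.argmax(np.abs(r_n)))   # index of the largest normalized residual
```

In this measurement set no measurement is critical and no pair of measurements is critical, which is what allows a single gross error to be located by the largest normalized residual.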

2.3.2 The J(x̂) performance index

J(x̂) is a function of r̂ and, by substitution, of r̂^n:

$$\min_x J(x) = \frac{1}{2} \sum_{i=1}^{m} \frac{r_i^2}{\sigma_i^2} = \frac{1}{2} \sum_{i=1}^{m} \left( \frac{r_i}{\sigma_i} \right)^2 = \frac{1}{2} \sum_{i=1}^{m} \left( \frac{r_i^n \rho_{ii}}{\sigma_i} \right)^2 \qquad (2.22)$$

Recalling that the r_i^n are m normally distributed variables with zero mean and unit variance, and that J is a sum of squares of these variables, it is possible to say that J(x̂) follows a chi-square distribution with m − n degrees of freedom, χ²_(m−n), with mean m − n and variance 2(m − n) [13].


2.3.3 Bad data detection tests

Once these two quantities are defined, the first step of bad data analysis is to detect the presence of bad data. This is achieved by the chi-square test on the J(x̂) index [13].

Consider a null hypothesis H₀ and an alternative hypothesis H₁:

1. H₀ : E{J(x̂)} = m − n;

2. H₁ : E{J(x̂)} > m − n.

Or alternatively,

1. if J(x̂) > C, reject H₀;

2. if J(x̂) ≤ C, accept H₀.

where C is a constant threshold depending on α, the significance level of the test, which represents the probability of false alarm. C is the critical value of J(x̂): if α increases, then the tolerance level C of the bad data detection decreases, according to Figure 2.5.

$$C = \chi^2_{m-n,\,1-\alpha} \qquad (2.23)$$

This means that if H₀ is true, that is, there is no bad data, the probability of J(x̂) > C is α. If bad data has been detected, it is necessary to identify where it is and to eliminate it.

The largest normalized residual test, as in the previous test, considers two hypotheses:

1. H₀ : E{r_j^n} = 0;

2. H₁ : E{r_j^n} > 0.

Another formulation is:

1. if |r_j^n| > C, reject H₀;

2. if |r_j^n| ≤ C, accept H₀.

where C is a constant to be determined. Given α, the significance level of the test,

$$C = K_{\alpha/2}$$

This means that α is the probability that |r_j^n| > C when there is no bad data. If bad data is identified, the corresponding measurement is removed and the state estimation is run again without the bad data.
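The two decision rules can be sketched with the Python standard library only. Two assumptions are made here and are not taken from the thesis: the chi-square quantile in (2.23) is approximated with the Wilson-Hilferty formula instead of an exact table, and the test statistic is taken as the sum of (r_i/σ_i)² without the factor 1/2 of (2.14), so that it can be compared with the chi-square threshold directly. K_{α/2} is interpreted as the two-sided standard normal quantile.

```python
import math
from statistics import NormalDist


def chi_square_threshold(dof, alpha):
    """Approximate chi-square quantile at level 1 - alpha (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def lnr_threshold(alpha):
    """Two-sided standard normal threshold for the largest normalized residual."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def chi_square_test(residuals, sigmas, n_states, alpha=0.05):
    """Return True if bad data is suspected, i.e. H0 is rejected."""
    statistic = sum((r / s) ** 2 for r, s in zip(residuals, sigmas))
    dof = len(residuals) - n_states
    return statistic > chi_square_threshold(dof, alpha)


def lnr_test(normalized_residuals, alpha=0.05):
    """Return the index of the suspected measurement, or None if H0 is accepted."""
    worst = max(range(len(normalized_residuals)),
                key=lambda j: abs(normalized_residuals[j]))
    if abs(normalized_residuals[worst]) > lnr_threshold(alpha):
        return worst
    return None


# Hypothetical residuals for m = 5 measurements and n = 2 states.
sigmas = [0.01] * 5
clean = [0.01, -0.02, 0.005, 0.0, -0.01]
corrupted = [0.01, -0.05, 0.005, 0.0, -0.01]
alarm_clean = chi_square_test(clean, sigmas, n_states=2)
alarm_corrupted = chi_square_test(corrupted, sigmas, n_states=2)
```

A larger α lowers both thresholds, which makes the detector more sensitive at the price of more false alarms.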


2.4 Example of anomaly

Figure 2.2: Simple network with 4 buses.

Considering the simple network proposed in Figure 2.2, it will be shown how state estimation and bad data detection work. Let P_ij be the active power flow from bus i to bus j, and P_i be either a power injection or a power load. Consider a set of measurements z affected by error e, for example:

$$\begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \\ z_5 \end{bmatrix} = \begin{bmatrix} P_1 \\ P_{12} \\ P_{21} \\ P_{24} \\ P_{13} \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \end{bmatrix}$$

Thanks to the state estimation system, it is possible to compute flows which are not directly metered.

Considering the numerical values of case4gs.m provided by MATPOWER [23], the vector of measurements z is

$$
\begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \\ z_5 \end{pmatrix}
=
\begin{pmatrix} 1.3253 \\ 0.3641 \\ -0.3608 \\ -1.3273 \\ 0.9314 \end{pmatrix}
$$


Then the estimated measurements, obtained by (2.19), are

$$
\begin{pmatrix} \hat{z}_1 \\ \hat{z}_2 \\ \hat{z}_3 \\ \hat{z}_4 \\ \hat{z}_5 \end{pmatrix}
=
\begin{pmatrix} 1.3127 \\ 0.3687 \\ -0.3687 \\ -1.3273 \\ 0.9440 \end{pmatrix}
\tag{2.24}
$$
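The estimated measurements can be reproduced with a small least-squares sketch. Several things here are assumptions and not taken from the text: the DC measurement matrix H below is built with bus 1 as reference, unit line susceptances, and $P_1 = P_{12} + P_{13}$; the estimate is an unweighted least-squares fit (R = I); and the helper names are hypothetical. Changing the (non-zero) susceptances would rescale the columns of H but not its column space, so the fitted measurements would be unchanged.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Assumed DC model, states (theta2, theta3, theta4), theta1 = 0, unit susceptances.
# Rows: P1, P12, P21, P24, P13.
H = [[-1, -1, 0],
     [-1, 0, 0],
     [1, 0, 0],
     [1, 0, -1],
     [0, -1, 0]]

def estimate(H, z):
    """Unweighted least-squares fit: returns the estimated measurements H x_hat."""
    m, n = len(H), len(H[0])
    HtH = [[sum(H[k][i] * H[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Htz = [sum(H[k][i] * z[k] for k in range(m)) for i in range(n)]
    x_hat = solve(HtH, Htz)
    return [sum(H[k][j] * x_hat[j] for j in range(n)) for k in range(m)]

z = [1.3253, 0.3641, -0.3608, -1.3273, 0.9314]
print([round(v, 4) for v in estimate(H, z)])
```

Under these assumptions the fit agrees with (2.24) to four decimals, which suggests the assumed H captures the structure of the example.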

The difference between real measurements and estimated measurements is the residual. This quantity is computed and analyzed in the bad data detection system. In this case, where there are no errors or anomalies in the network, the residual is given by

$$
r =
\begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \\ z_5 \end{pmatrix}
-
\begin{pmatrix} \hat{z}_1 \\ \hat{z}_2 \\ \hat{z}_3 \\ \hat{z}_4 \\ \hat{z}_5 \end{pmatrix}
=
\begin{pmatrix} 1.3253 \\ 0.3641 \\ -0.3608 \\ -1.3273 \\ 0.9314 \end{pmatrix}
-
\begin{pmatrix} 1.3127 \\ 0.3687 \\ -0.3687 \\ -1.3273 \\ 0.9440 \end{pmatrix}
=
\begin{pmatrix} 0.0126 \\ -0.0046 \\ 0.0079 \\ 0 \\ -0.0126 \end{pmatrix}
\tag{2.25}
$$

The residual is in general different from zero because of the measurement error e. There is a tolerance level that takes this expected inaccuracy into account. Bad data detection tests, such as the LNR and chi-square tests described in Section 2.3.3, are designed to avoid false alarms when this error is small.

Consider now the same network when an anomaly a occurs, as in Figure 2.3. The anomaly may represent damage to the communication channel, noise due to electromagnetic interference, or an overload: a disturbance in the network that occurs randomly.

The anomaly a can be represented as a vector:

$$
\begin{pmatrix} 0 \\ a \\ 0 \\ 0 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 2 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\tag{2.26}
$$


Figure 2.3: Example of an anomaly occurring on the line between buses 1 and 2.

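The vector of measurements becomes:

$$
\begin{pmatrix} z_{1,\mathrm{an}} \\ z_{2,\mathrm{an}} \\ z_{3,\mathrm{an}} \\ z_{4,\mathrm{an}} \\ z_{5,\mathrm{an}} \end{pmatrix}
=
\begin{pmatrix} P_1 \\ P_{12} \\ P_{21} \\ P_{24} \\ P_{13} \end{pmatrix}
+
\begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \end{pmatrix}
+
\begin{pmatrix} 0 \\ a \\ 0 \\ 0 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 1.3253 \\ 0.3641 \\ -0.3608 \\ -1.3273 \\ 0.9314 \end{pmatrix}
+
\begin{pmatrix} 0 \\ 2 \\ 0 \\ 0 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 1.3253 \\ 2.3641 \\ -0.3608 \\ -1.3273 \\ 0.9314 \end{pmatrix}
$$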

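The estimated measurements provided by the state estimation model (2.19) are:

$$
\begin{pmatrix} \hat{z}_{1,\mathrm{an}} \\ \hat{z}_{2,\mathrm{an}} \\ \hat{z}_{3,\mathrm{an}} \\ \hat{z}_{4,\mathrm{an}} \\ \hat{z}_{5,\mathrm{an}} \end{pmatrix}
=
\begin{pmatrix} 1.7127 \\ 1.1687 \\ -1.1687 \\ -1.3273 \\ 0.5440 \end{pmatrix}
$$

And the new residual is:

$$
r_{\mathrm{an}} =
\begin{pmatrix} z_{1,\mathrm{an}} \\ z_{2,\mathrm{an}} \\ z_{3,\mathrm{an}} \\ z_{4,\mathrm{an}} \\ z_{5,\mathrm{an}} \end{pmatrix}
-
\begin{pmatrix} \hat{z}_{1,\mathrm{an}} \\ \hat{z}_{2,\mathrm{an}} \\ \hat{z}_{3,\mathrm{an}} \\ \hat{z}_{4,\mathrm{an}} \\ \hat{z}_{5,\mathrm{an}} \end{pmatrix}
=
\begin{pmatrix} 1.3253 \\ 2.3641 \\ -0.3608 \\ -1.3273 \\ 0.9314 \end{pmatrix}
-
\begin{pmatrix} 1.7127 \\ 1.1687 \\ -1.1687 \\ -1.3273 \\ 0.5440 \end{pmatrix}
=
\begin{pmatrix} -0.3874 \\ 1.1954 \\ 0.8079 \\ 0 \\ 0.3874 \end{pmatrix}
$$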


Figure 2.4: Results of state estimation.

Comparing the residuals in these two situations, it is possible to see in Figure 2.4 that when an anomaly occurs the residual is larger than when no anomaly occurs.

It exceeds the tolerance level of the bad data detection system, so an alarm is triggered and the anomaly is detected, identified, and removed.
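The comparison can be repeated numerically from the vectors printed in this section. The sketch below only recomputes the two residuals and their largest entries; it does not apply a detection threshold, because the measurement variances needed to normalize the residuals are not given here. The helper name is hypothetical.

```python
# Measurements and estimates copied from the example (no anomaly / anomaly)
z        = [1.3253, 0.3641, -0.3608, -1.3273, 0.9314]
z_hat    = [1.3127, 0.3687, -0.3687, -1.3273, 0.9440]
z_an     = [1.3253, 2.3641, -0.3608, -1.3273, 0.9314]
z_hat_an = [1.7127, 1.1687, -1.1687, -1.3273, 0.5440]

def residual(measured, estimated):
    """Element-wise residual, rounded to the four decimals used in the text."""
    return [round(a - b, 4) for a, b in zip(measured, estimated)]

r = residual(z, z_hat)
r_an = residual(z_an, z_hat_an)

largest = max(abs(v) for v in r)
largest_an = max(abs(v) for v in r_an)
print(r, largest)
print(r_an, largest_an)
```

The largest residual entry grows by roughly two orders of magnitude when the anomaly is present, and it sits on the second measurement, where the anomaly was injected.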

The tolerance level is defined by the constant C, and hence by the significance level α, as in equation (2.23), which represents the probability of a false alarm. The greater α is, the greater the probability that even a simple noise error due to the instruments is flagged as bad data and the corresponding measurement rejected.

Conversely, the smaller α is, the smaller the probability that an anomaly like the one shown in the example is detected.

As shown in Figure 2.5, if α increases, the region in which the hypothesis H₀ (no bad data) is accepted shrinks. The threshold C, the tolerance level of the bad data detection, therefore decreases.


Figure 2.5: Probability distribution function and acceptance region $|r_j^n| \le C$. The total area under the function is 1. The gray area represents α, the false alarm probability.


Chapter 3 Cyber security

3.1 Cyber security on power networks

The new technologies in power networks have introduced an increasing dependency on cyber resources. This makes the system vulnerable to attacks.

The target of these cyber attacks may be to damage, steal, alter, or destroy specified information by hacking into a susceptible communication system.

Besides the physical impacts these malicious attacks may provoke, they can also cause economic losses.

Among the possible consequences of these cyber attacks, one is to cause significant errors in state estimation algorithms. This could mislead the control center into taking wrong decisions.

These concerns motivate the need for cyber security, based on the analysis and mitigation of risks [19].

After the definition of undetectable attack, it will be shown how risk analysis can be carried out, focusing on the state estimation algorithm.

The definition of undetectable attacks will be used to characterize the model information an adversary must have to bypass the conventional state estimator and bad data detector. Later, this characterization will be leveraged to propose novel state estimators that use additional information about the system. The risk analysis will be used to assess the vulnerability of each measurement to such an adversary. Later, this will be compared to the risk analysis of the novel state estimators.


3.2 False data injection attacks

The most analyzed type of attack in the literature is false data injection [9]: if an adversary gains access to measurements, he can modify this information.

From the attacker's point of view, achieving access involves having partial or complete knowledge of the system's configuration and of the state estimation and bad data detection methods. He also has to break into the support infrastructure, which can have vulnerabilities related to the managing software, the configuration of the authentication system, and the design of the network.

A first classification of attacks can be [7]:

• Strong attack regime: the attacker has access to all the measurements collected by the sensors. He can corrupt all, or some, of them. This means he has a complete knowledge of the network topology.

• Weak attack regime: the attacker has access to a limited number of measurements. This means he has an incomplete knowledge of the network topology.

The considered situation is that an attacker gains access to the measurements and is able to change some, or all, of the measurements from z to z_a = z + a.

The attack vector a is the corruption added to the real measurements. It allows the attacker to reach the goal of changing a chosen meter reading $z_k$ to $z_{k,a} = z_k + a_k$, for some k and a fixed scalar $a_k$, the quantity to be added to the measurement.

A stealthy attack does not trigger the bad data detection system. Under no attack, the bad data detector triggers alarms only when the measurements deviate too much from a possible physical state, at least as long as the linear model is valid. Intuitively, a power flow measurement that requires more and larger corruptions to be altered in stealth is considered more secure [16].

Some important definitions and theorems useful to characterize an attack are given here [6]. Recall that the network model is

z = Hx + e (3.1)

where $z \in \mathbb{R}^m$ is the vector of power flow measurements, $x \in \mathbb{R}^n$ is the state vector, H is the Jacobian matrix relating the measurements to the states, and $e \in \mathbb{R}^m$ is the measurement noise vector, which is assumed to have zero mean and covariance R. Thus there are n + 1 buses, m real power meters, and n phase angles to estimate. One bus is the reference bus and its phase angle is set equal to zero.

Definition 1. An attack A = (S, a) is a set of |S| compromised meters. The non-zero components of the attack vector a correspond to the compromised meters S. The sparsity of A is the number of compromised meters |S|. Under the attack A, the meter readings are changed by the attacker from their uncompromised values z to their post-attack values z + a.

Definition 2. An attack A = (S, a) is called unobservable with respect to the power network model if there exists some system state consistent with the compromised observations. It means that the attack A = (S, a) is unobservable if and only if a = Hx_a is solvable, where x_a is the perceived change of system state necessary to produce the compromised measurements.

If an attack A is unobservable, it corresponds to a perceived perturbation in power flow. This perturbation consists entirely of power flows on lines connecting two distinct observable parts of the network.
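Definitions 1 and 2 can be checked mechanically on a small model. The sketch below tests solvability of $a = Hx_a$ by comparing the rank of H with the rank of the augmented matrix [H | a], using exact rational arithmetic. The matrix H is an assumed DC model of the 4-bus example (bus 1 as reference, unit susceptances, rows ordered as $P_1, P_{12}, P_{21}, P_{24}, P_{13}$); it and the function names are illustrative and not taken from the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(M[0])):
        pivot = next((r for r in range(rk, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for r in range(rk + 1, len(M)):
            f = M[r][col] / M[rk][col]
            M[r] = [a - f * p for a, p in zip(M[r], M[rk])]
        rk += 1
    return rk

def support(a):
    """Set S of compromised meters (1-based), i.e. non-zero entries of a."""
    return {k + 1 for k, v in enumerate(a) if v != 0}

def is_unobservable(H, a):
    """True if a = H x_a is solvable, i.e. appending a does not raise the rank."""
    augmented = [list(row) + [ak] for row, ak in zip(H, a)]
    return rank(augmented) == rank(H)

# Assumed DC model of the 4-bus example (see lead-in)
H = [[-1, -1, 0], [-1, 0, 0], [1, 0, 0], [1, 0, -1], [0, -1, 0]]

coordinated = [0, 2, -2, 0, -2]   # changes P12, P21 and P13 consistently
single_meter = [0, 2, 0, 0, 0]    # changes P12 only

print(support(coordinated), is_unobservable(H, coordinated))
print(support(single_meter), is_unobservable(H, single_meter))
```

Under the assumed model, the single-meter corruption is observable because $P_{12}$ and $P_{21}$ would no longer be consistent, while the coordinated change of three meters is not.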

Definition 3. An attack A = (S, a) is called irreducible if there is no unobservable attack A′ = (S′, a′) with S′ ⊊ S.

The set of unobservable attacks, a = Hc, has been characterized above (Definition 2).

From now on, suppose that the adversary only knows H and aims at injecting false data into the system while remaining stealthy. Moreover, the adversary wants to minimize the number of corrupted measurements, since fewer resources are required in this way.

An attack that affects a given measurement k can be mathematically constructed by the analysis of the matrix H [16].

• A first formulation is given by the 2-norm of the attack vector.

The stealthy attack vector a of minimal 2-norm $\|a\|_2 = \sqrt{a^\top a}$ that achieves $z_{k,a} = z_k + a_k$ is given by:

$$a = \frac{a_k}{K_{k,k}}\, K_{\cdot,k} \tag{3.2}$$

where $K_{\cdot,k}$ is the k-th column of the hat matrix, using R = I.

This construction of the attack vector is not sparse, so it is not the optimal solution from the attacker's point of view.
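Formula (3.2) can be evaluated on a small model to see why the result is not sparse. The sketch below assumes the hat matrix has the form $K = H(H^\top H)^{-1}H^\top$ for R = I and uses an assumed DC model of the 4-bus example (bus 1 as reference, unit susceptances, rows ordered as $P_1, P_{12}, P_{21}, P_{24}, P_{13}$); neither the matrix nor the helper names come from the text.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def hat_column(H, k):
    """Column k (0-based) of K = H (H^T H)^{-1} H^T, i.e. K applied to e_k."""
    m, n = len(H), len(H[0])
    HtH = [[sum(H[r][i] * H[r][j] for r in range(m)) for j in range(n)]
           for i in range(n)]
    x = solve(HtH, [float(v) for v in H[k]])   # H^T e_k is row k of H
    return [sum(H[r][j] * x[j] for j in range(n)) for r in range(m)]

def min_norm_attack(H, k, a_k):
    """Attack of minimal 2-norm achieving a[k] = a_k, as in (3.2)."""
    col = hat_column(H, k)
    return [a_k / col[k] * v for v in col]

# Assumed DC model of the 4-bus example (see lead-in)
H = [[-1, -1, 0], [-1, 0, 0], [1, 0, 0], [1, 0, -1], [0, -1, 0]]

a = min_norm_attack(H, k=1, a_k=2.0)   # target the second meter, P12
print([round(v, 4) for v in a])
```

With this assumed H the minimum-norm attack on the second meter touches four of the five meters, one more than a coordinated attack on $P_{12}$, $P_{21}$ and $P_{13}$ would need.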

• A second solution is obtained by observations on the residual computed in the bad data detection system:

$$r = z - \hat{z} = z - Kz = (I - K)z$$


If the attacker manipulates the measurements from z to $z_a = z + a$, with attack $a = Hc \in \mathcal{R}(H)$, where c is an arbitrary vector, then the attack is by definition undetectable (since a is unobservable, Definition 2). In practice this means that the residual is not affected, because $z_a$ is consistent with a physical state of the power network. It also means that a lies in the null space of (I − K). Consider the following problem:

$$
\min_{c} \; \|Hc\|_0 \quad \text{s.t.} \quad \sum_i H_{ki}\, c_i = 1 \tag{3.3}
$$

where $\|Hc\|_0$ denotes the number of non-zero elements in the vector a = Hc, and $H_{ki}$ is the element (k, i) of H.

The constraint $\sum_i H_{ki}\, c_i = 1$ means that the k-th element of the attack vector a = Hc must be equal to 1, because it is the one that has to be corrupted.

Solving this problem means optimizing over all corruptions $a = Hc \in \mathcal{R}(H)$, which do not trigger bad data alarms. A solution $c^*$ can be rescaled to obtain $a^* = a_k H c^*$, such that the measurement attack $z_a = z + a^*$ achieves the attacker's goal $z_{k,a} = z_k + a_k$ while corrupting as few measurements as possible. The cardinality of the set of compromised meters S is $|S| = \|a^*\|_0$. This problem is non-convex and hard to solve for large models; this issue is addressed in [18].

The solution of problem (3.3) is an unobservable and irreducible attack vector. Solving this problem means minimizing the number of non-zero elements of the attack vector, so the attack corrupts as few measurements as possible, in accordance with Definition 3. Another desirable feature is a small magnitude: together with good sparsity, this gives an optimal coordinated attack on a set of meters.

The magnitude of the elements of a generic sparse attack vector obtained by solving problem (3.3) can, however, be large. A large attack can push the estimator into the nonlinear regime, which may lead to bad data alarms or non-convergence of the Gauss-Newton method [16]. The adversary can nevertheless always scale the attack to avoid a large magnitude of the attack vector.
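For a network as small as the 4-bus example, problem (3.3) can be explored by exhaustive search. The sketch below is a toy illustration only and is not the method of [18]: it enumerates c on a small integer grid, which is enough here only because the assumed matrix H (DC model, bus 1 as reference, unit susceptances, rows ordered as $P_1, P_{12}, P_{21}, P_{24}, P_{13}$) has integer entries. The matrix, the grid and the function names are all assumptions.

```python
from itertools import product

# Assumed DC model of the 4-bus example (see lead-in)
H = [[-1, -1, 0], [-1, 0, 0], [1, 0, 0], [1, 0, -1], [0, -1, 0]]

def times(H, c):
    """Matrix-vector product a = H c."""
    return [sum(h * ci for h, ci in zip(row, c)) for row in H]

def sparsest_attacks(H, k, grid=(-2, -1, 0, 1, 2)):
    """Brute-force search for a = Hc with a[k] == 1 and fewest non-zero entries.

    Returns (minimum cardinality, list of distinct minimizers found on the grid).
    """
    best, best_card = [], None
    for c in product(grid, repeat=len(H[0])):
        a = times(H, c)
        if a[k] != 1:
            continue
        card = sum(1 for v in a if v != 0)
        if best_card is None or card < best_card:
            best, best_card = [a], card
        elif card == best_card and a not in best:
            best.append(a)
    return best_card, best

card, attacks = sparsest_attacks(H, k=1)      # target the second meter, P12
a_k = 2
scaled = [[a_k * v for v in a] for a in attacks]
print(card, scaled)
```

Under the assumed model the minimum is three compromised meters: changing $P_{12}$ forces a change in $P_{21}$, and $P_1 = P_{12} + P_{13}$ implies that $P_1$ and $P_{13}$ cannot both stay untouched. The search finds two attacks of that size, which shows that the minimizer of (3.3) need not be unique.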


3.3 Example of attack

The example shown in Section 2.4 is considered again, with the same vector of measurements and the same values.

Now, instead of a random anomaly, suppose that the network is attacked by an adversary, as shown in Figure 3.1. He constructs the attack vector $a^* = a_k H c^*$, with $c^*$ obtained from the minimization problem (3.3).

Figure 3.1: Attack on the power flow between buses 1 and 2.

His goal is to increase the measurement of the power flow between buses 1 and 2. The value of $P_{12}$ corresponds to the second element of the vector of measurements z. The attacker wants to add a quantity of, for example, $a_2 = 2$ pu.

The obtained attack vector is:

$$
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 2 \\ -2 \\ 0 \\ -2 \end{pmatrix}
\tag{3.4}
$$


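Then $z_{\mathrm{attack}} = z + a$ is:

$$
\begin{pmatrix} z_{1,\mathrm{attack}} \\ z_{2,\mathrm{attack}} \\ z_{3,\mathrm{attack}} \\ z_{4,\mathrm{attack}} \\ z_{5,\mathrm{attack}} \end{pmatrix}
=
\begin{pmatrix} P_1 \\ P_{12} \\ P_{21} \\ P_{24} \\ P_{13} \end{pmatrix}
+
\begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \end{pmatrix}
+
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{pmatrix}
=
\begin{pmatrix} 1.3253 \\ 0.3641 \\ -0.3608 \\ -1.3273 \\ 0.9314 \end{pmatrix}
+
\begin{pmatrix} 0 \\ 2 \\ -2 \\ 0 \\ -2 \end{pmatrix}
=
\begin{pmatrix} 1.3253 \\ 2.3641 \\ -2.3608 \\ -1.3273 \\ -1.0686 \end{pmatrix}
$$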

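The estimated measurements provided by the state estimation model are:

$$
\begin{pmatrix} \hat{z}_{1,\mathrm{attack}} \\ \hat{z}_{2,\mathrm{attack}} \\ \hat{z}_{3,\mathrm{attack}} \\ \hat{z}_{4,\mathrm{attack}} \\ \hat{z}_{5,\mathrm{attack}} \end{pmatrix}
=
\begin{pmatrix} 1.3127 \\ 2.3687 \\ -2.3687 \\ -1.3273 \\ -1.0560 \end{pmatrix}
$$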

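The residual between the received and the estimated measurements, in case of attack, is:

$$
r_{\mathrm{attack}} =
\begin{pmatrix} z_{1,\mathrm{attack}} \\ z_{2,\mathrm{attack}} \\ z_{3,\mathrm{attack}} \\ z_{4,\mathrm{attack}} \\ z_{5,\mathrm{attack}} \end{pmatrix}
-
\begin{pmatrix} \hat{z}_{1,\mathrm{attack}} \\ \hat{z}_{2,\mathrm{attack}} \\ \hat{z}_{3,\mathrm{attack}} \\ \hat{z}_{4,\mathrm{attack}} \\ \hat{z}_{5,\mathrm{attack}} \end{pmatrix}
=
\begin{pmatrix} 1.3253 \\ 2.3641 \\ -2.3608 \\ -1.3273 \\ -1.0686 \end{pmatrix}
-
\begin{pmatrix} 1.3127 \\ 2.3687 \\ -2.3687 \\ -1.3273 \\ -1.0560 \end{pmatrix}
=
\begin{pmatrix} 0.0126 \\ -0.0046 \\ 0.0079 \\ 0 \\ -0.0126 \end{pmatrix}
$$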


Comparing this residual with the one obtained from state estimation on the unattacked measurements (2.25), it is possible to see that their values are equal:

$$
\begin{pmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \end{pmatrix}
-
\begin{pmatrix} r_{1,\mathrm{attack}} \\ r_{2,\mathrm{attack}} \\ r_{3,\mathrm{attack}} \\ r_{4,\mathrm{attack}} \\ r_{5,\mathrm{attack}} \end{pmatrix}
=
\begin{pmatrix} 0.0126 \\ -0.0046 \\ 0.0079 \\ 0 \\ -0.0126 \end{pmatrix}
-
\begin{pmatrix} 0.0126 \\ -0.0046 \\ 0.0079 \\ 0 \\ -0.0126 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}
$$

This means that a stealthy attack does not affect the residual, and is therefore practically invisible to the state estimator and to the bad data detector.

The corrupted measurements are then processed in the control center, where decisions are taken and control laws are computed according to this false information.
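The equality of the two residuals can also be verified algebraically. The short derivation below assumes that the hat matrix has the usual weighted least-squares form $K = H(H^\top R^{-1}H)^{-1}H^\top R^{-1}$, which is not restated in this chapter, and that the attack has the form $a = Hc$:

$$
\begin{aligned}
KH &= H(H^\top R^{-1}H)^{-1}H^\top R^{-1}H = H, \\
r_{\mathrm{attack}} &= (I - K)(z + a) = (I - K)z + (H - KH)\,c = (I - K)z = r .
\end{aligned}
$$

The result does not depend on the size of c, which is why the attack of 2 pu in this example leaves the residual unchanged, whereas the single-meter anomaly of Section 2.4, which is not of the form $Hc$, does not.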


3.4 Risk framework analysis

As shown in Section 3.3 an attack can bypass the bad data detection in the control center. This is possible if the attacker knows the model and if he has access to the required measurements.

In particular, knowledge of the DC model of the network, represented by the matrix H, is required to construct a stealthy attack.

This means that the attacker must be able to gain access to important information about the topology of the network. For this reason it is important to perform accurate risk assessments.

Risk is defined as the impact times the likelihood of an event [19]. It thus refers to the ability of the system to limit the attacker's access to the network or, at least, to the control center. Risk analysis involves testing the security of the network.

• The first step is the infrastructure vulnerability analysis: authentication, access control, and message integrity are some of the tested security methods. The reliance on communication networks and standard communication protocols to transmit measurements and control packets increases the weakness of the network. However, testing the infrastructure does not exploit the compatibility of the measurements with the physical process, so this first analysis is ineffective against attacks targeting the physical dynamics [14].

• The following step is the power application impact analysis, which determines possible impacts on the applications in the control center related to the power system. An example of such an application is state estimation, which provides a good estimate of the measurements even when the field devices provide imperfect data or the communication channel malfunctions. Together with bad data detection, it can represent a security tool in the control center [19].

• Finally, the physical impact analysis is performed to quantify the damage to the power system. With system simulation methods it is possible to test the performance of the network in terms of stability, power flows, and variations. It is then also possible to estimate potential economic losses [19].
