Identifying Power Quality Issues in LV Distribution Grid by Using Data from Smart Meters: Exploring possibilities of machine learning algorithms


STOCKHOLM, SWEDEN 2020


SAMANTHA CHEN

KTH ROYAL INSTITUTE OF TECHNOLOGY


A thesis presented for the degree of Master of Science

School of Electrical Engineering and Computer Science KTH Royal Institute of Technology

Stockholm, Sweden

September 3, 2020


I would first like to express my deepest gratitude to my thesis supervisor Edel Wallin of the R&D department at Vattenfall AB. Despite having a busy schedule, he always made sure to be available whenever there were any questions or challenges regarding the thesis project. With his constructive feedback and expertise in the power grid, I was steered in the right direction whenever I needed it. His guidance and input were and are still much appreciated.

Furthermore, I would like to extend my sincerest thanks to my thesis supervisor Oscar Utterbäck of the School of Electrical Engineering and Computer Science at KTH Royal Institute of Technology. Without his assistance and expert knowledge in machine learning and Python, the study could not have been successfully conducted.

I also wish to pay my special regards to my examiner and professor Lars Nordström of the School of Electrical Engineering and Computer Science at KTH Royal Institute of Technology. With his guidance and academic perspective, I was able to form a flexible framework for my project.

Last but not least, I wish to thank all the people from Vattenfall R&D and Eldistribution who have provided invaluable assistance during my study. It has been a milestone in the completion of this project and a pleasure to work with everyone.


Since there is a significant potential to supervise the low voltage network with the assistance of the end-customer smart meters, Vattenfall Eldistribution AB wants to take advantage of such data. Therefore, this project’s overall goal is to investigate how some specific grid disturbances could be detected in certain meter data types. There is a plethora of event data from several different grid areas, each with its own unique set of customers and power flow. Furthermore, the project aims to propose a detection method within the smart meter’s capability and explore the possibility of using smart meter data to identify the grid’s state. The literature study indicated that the machine learning approaches k-means and SVM were suitable for use within this study’s scope. Several supervised and unsupervised machine learning algorithms have been identified and applied to power quality issues in various ways. Furthermore, each approach was applied to four cases to broaden the analysis.

After conducting the study, the project results show that smart meter data indeed has the potential to be used in machine learning methods to identify weak grids. However, the study shows that the information gained from smart meter data in its current state alone is not enough to distinguish weak grids from strong grids. For instance, the current data could complement grid data, such as loop impedance and topology data.

Future work could include using the same machine learning methods on higher-dimensional input data to separate the data points. One way to diversify the data could be to include data describing grid topology and data from PQ-meters. Furthermore, it will be possible to continuously monitor the low voltage grid conditions with future smart meters. In turn, this may give a better insight into how the voltage levels behave for weak and strong grids, respectively.


Vattenfall Eldistribution AB has a large share of smart meters that register power quality problems on the low voltage network. This means that there is great potential to supervise the low voltage network with the help of data from these meters. Therefore, the overall goal of this project is to investigate how certain specific grid disturbances appear in certain types of meter data. In addition, there is a plethora of data from many different grid areas, each with its own unique set of customers and energy flows. Furthermore, the project aims to develop a detection method within the smart meter’s capability and to explore the possibility of using smart meter data to identify the state of the grid. From the literature study, it was concluded that k-means and SVM were the most suitable methods for this study. Several machine learning methods have been identified and applied to power quality problems. Furthermore, data was analyzed for four different cases per method.

After the study was conducted, the results showed that data from smart meters indeed has the potential to be analyzed with machine learning methods to identify weak grids. However, the study indicates that the information that can be extracted from smart meters in their current state is not sufficient to distinguish weak grids from strong grids. For example, data from smart meters can be complemented with grid data, such as impedance, and information about the grid topology.

Future projects could therefore apply the same machine learning methods to input data of higher dimensions to enable a separation of the data on several planes. One way to diversify the data would be to include, for example, information describing grid topology and data from PQ-meters. Furthermore, future meters will be able to monitor the grid’s voltage and current continuously.


1 Introduction
1.1 Background
1.2 Aim and goals
1.3 Delimitations and scope
1.4 Expected outcomes and results

2 Literature review
2.1 Definition of weak grids
2.2 Power quality in the grid
2.2.1 Definitions and existing regulation
2.2.2 Typical power quality issues
2.2.2.1 Transients
2.2.2.2 Short-duration RMS variations
2.2.2.3 Long-duration RMS variations
2.2.2.4 Voltage imbalance (unbalance or asymmetry)
2.2.2.5 Voltage fluctuations
2.3 Machine learning to identify power quality issues
2.3.1 Background of machine learning
2.3.2 Some known algorithm applications on power quality
2.3.2.1 SVM and k-fold for source location of voltage sags in an HV grid
2.3.2.2 SVM to classify weak grids on LV grids by hosting capacity
2.3.2.3 K-means, FCM, and DT to locate voltage sag source
2.3.2.4 DT and FCM for recognition of power quality issues
2.3.2.5 Applicability of related methods to the study at hand

3 Methodology
3.1 Choice of algorithms for the project
3.2 Data extraction
3.3 Algorithm development
3.3.1 Initial statistical data analysis
3.3.2 Pre-processing data with sliding window algorithm
3.3.3 Unsupervised machine learning
3.3.3.1 K-means for clustering
3.3.4 Supervised machine learning
3.3.4.1 Support vector machine

4 Results
4.1 Unsupervised machine learning
4.1.1 First case using k-means to cluster raw data from the smart meters of Grid 1
4.1.2 Second case of identifying normal operation from the smart meters of Grid 1 using k-means
4.1.3 Third k-means case in regards to the number of incidents during normal operation
4.1.4 Fourth case using loop impedance and event data from sliding window
4.2 Supervised machine learning
4.2.1 First case of support vector machine on data from Grid 1
4.2.2 Second case of support vector machine on data per customer from Grid 1 and Grid 2
4.2.3 Third case of support vector machine on data and loop impedance from Grid 1 to Grid 9
4.2.4 Fourth case of support vector machine with increased bins from event data of Grid 1 to Grid 9

5 Discussion
5.1 Analysis and interpretation of the results
5.2 Influence of the limitations on the results
5.3 Contributions and new insights

6 Conclusion
6.1 Conclusive summary
6.2 Future work

References

A.1 Initial statistical data analysis of Grid 1
A.2 Result of the first case of k-means
A.3 Results of the second case of k-means
A.3.1 Finding the duration limits for normal condition
A.3.2 The k-means result of the second iteration
A.4 Summary of every result using SVM


List of Figures

1 A lightning stroke current that results in impulsive transients [1].
2 An example of back-to-back switching of a capacitor which in turn causes an oscillatory transient [1].
3 Temporary voltage sag caused by motor starting operation [1].
4 Instantaneous voltage surge caused by SLG fault [1].
5 Momentary interruption due to fault and subsequent recloser operation [1].
6 Unbalanced three-phase voltage represented by phasors [2].
7 Example of voltage fluctuations caused by arc furnace operation. The y-axis shows the percentual change from the voltage’s nominal value [1].
8 Some different algorithms for supervised and unsupervised machine learning [3].
9 Example of 2D data (blue dots and orange crosses) that are separated by possible hyperplanes using SVM [4].
10 Example of k-means data clustering, where k = 2 [5].
11 A simple decision tree with binary leaf nodes [6].
12 The flow chart shows the ruled decision tree algorithm, where Gx represents a group, Fx represents a feature, and Cx represents a PQ-issue [7].
13 The total statistical distribution of unfiltered smart meter data.
14 The k-means analysis of the data sets containing the total event peak voltage level and duration over the year 2012 to 2019.
15 The k-means analysis of the data sets containing event peak voltage level and duration during 2012.
16 The k-means analysis of the data sets containing event peak voltage level and duration during 2016.
17 The total statistical distribution of time duration for voltage sags and surges limited by time, and at least 1 s.
18 The statistical distribution of smart meter data over the years 2012-2019, filtered by an event duration between 0-100 s and between 170-270 V.
19 The k-means analysis of the data sets containing event peak voltage level and duration over the years 2012-2019.
20 The k-means analysis of the data sets containing voltage sag duration and the number of incidents of such durations during year 2012-2017.
21 The k-means analysis of the data sets containing voltage surge duration and the number of incidents of such durations during year 2012-2017.
22 The k-means analysis of the data sets containing event duration and the number of incidents of such durations during year 2018-2019.
23 The k-means analysis of peak voltage and duration of events during normal operation in relation to year.
24 The k-means analysis of peak voltage and duration of events during normal operation in relation to year, where the number of incidents for each peak voltage or duration occurs at least 5 times.
25 The k-means analysis of peak voltage and duration of events during normal operation in relation to year, where the number of incidents for each peak voltage or duration occurs at least 10 times.
26 The k-means analysis of the peak voltage level of voltage sags in relation to year, where the number of incidents for each peak voltage level occurs at least X times.
27 The k-means analysis of the peak voltage level of voltage surges in relation to year, where the number of incidents for each peak voltage level occurs at least X times.
28 The k-means analysis of the duration of voltage sags in relation to year, where the number of incidents for each event duration occurs at least X times.
29 The k-means analysis of the duration of voltage surges in relation to year, where the number of incidents for each event duration occurs at least X times.
30 Test 1: The k-means analysis of the duration of the event data of voltage level and event duration in relation to loop impedance.
31 Test 2: The k-means analysis of the duration of the event data of voltage level and event duration in relation to loop impedance.
32 3-Class classification on the event data from Grid 1 during years 2012-2019 using SVM. The classification is color plotted as following: "Strong (1)" = blue, "Becoming weak (2)" = yellow, "Weak (3)" = brown.
33 3-Class classification on the event data per customer from Grid 1 during years 2012-2019 using SVM. The classification is color plotted as following: "Strong (1)" = blue, "Becoming weak (2)" = yellow, "Weak (3)" = brown.
34 3-Class classification on the event data per customer from Grid 2 during years 2010-2019 using SVM. The classification is color plotted as following: "Strong (1)" = blue, "Becoming weak (2)" = yellow, "Weak (3)" = brown.
35 3-Class classification on the event data per customer from Grid 1 and Grid 2 using SVM. The classification is color plotted as following: "Strong (1)" = blue, "Becoming weak (2)" = yellow, "Weak (3)" = brown.
36 3-Class classification on the event data per customer from Grid 1-9 during 2019, using SVM. The classification is color plotted as following: "Strong (1)" = blue, "Becoming weak (2)" = yellow, "Weak (3)" = brown.
37 The k-means analysis of the duration of voltage surges in relation to year, where the number of incidents for each event duration occurs at least X times.
38 3-Class classification of y_a on the event data per customer from Grid 1-9 during 2019, using SVM. The classification is color plotted as following: "Strong (1)" = blue, "Becoming weak (2)" = yellow, "Weak (3)" = brown.
39 3-Class classification of y_b on the event data per customer from Grid 1-9 during 2019, using SVM. The classification is color plotted as following: "Strong (1)" = blue, "Becoming weak (2)" = yellow, "Weak (3)" = brown.
A.1 The statistical distribution of unfiltered smart meter data during 2012.
A.2 The statistical distribution of unfiltered smart meter data during 2013.
A.3 The statistical distribution of unfiltered smart meter data during 2014.
A.4 The statistical distribution of unfiltered smart meter data during 2015.
A.5 The statistical distribution of unfiltered smart meter data during 2016.
A.6 The statistical distribution of unfiltered smart meter data during 2017.
A.7 The statistical distribution of unfiltered smart meter data during 2018.
A.8 The statistical distribution of unfiltered smart meter data during 2019.
A.9 The k-means analysis of the data sets containing event peak voltage level and duration during 2012.
A.10 The k-means analysis of the data sets containing event peak voltage level and duration during 2013.
A.11 The k-means analysis of the data sets containing event peak voltage level and duration during 2014.
A.12 The k-means analysis of the data sets containing event peak voltage level and duration during 2015.
A.13 The k-means analysis of the data sets containing event peak voltage level and duration during 2016.
A.14 The k-means analysis of the data sets containing event peak voltage level and duration during 2017.
A.15 The k-means analysis of the data sets containing event peak voltage level and duration during 2018.
A.16 The k-means analysis of the data sets containing event peak voltage level and duration during 2019.
A.17 The total statistical distribution of voltage sag duration by time.
A.18 The total statistical distribution of voltage surge duration limited by time.
A.19 The total statistical distribution of time duration for voltage sags and surges limited by time, and at least 1 s.
A.20 The k-means analysis of the data sets containing event peak voltage level and duration during 2012.
A.21 The k-means analysis of the data sets containing event peak voltage level and duration during 2013.
A.22 The k-means analysis of the data sets containing event peak voltage level and duration during 2014.
A.23 The k-means analysis of the data sets containing event peak voltage level and duration during 2015.
A.24 The k-means analysis of the data sets containing event peak voltage level and duration during 2016.
A.25 The k-means analysis of the data sets containing event peak voltage level and duration during 2017.
A.26 The k-means analysis of the data sets containing event peak voltage level and duration during 2018.
A.27 The k-means analysis of the data sets containing event peak voltage level and duration during 2019.


List of Tables

1 The challenges in machine learning according to [8].
2 General comparison of the customers on the most critical branches of Grid 1-9 during year 2019. The most critical customer on the branch is bolded as well.
3 The accuracy of SVM for the first case.
4 The accuracy of SVM on Grid 1 for the second case.
5 The accuracy of SVM on Grid 2 for the second case.
6 The accuracy of SVM on Grid 1 and Grid 2 for the second case.
7 The accuracy of SVM on Grid 1 for the third case using output y_a.
8 The accuracy of SVM on Grid 1 & 2 for the third case using output y_b.
9 The accuracy of SVM on Grid 1 to Grid 9 for the fourth case using output y_a.
10 The accuracy of SVM on Grid 1 to Grid 9 for the fourth case using output y_b.
11 The accuracy of SVM for Grid 1 to Grid 9 with v_bins = 3, d_bins = 3, and output y_b.
A.1 The accuracy of SVM for all four cases.


List of Abbreviations

DER Distributed Energy Resource
DSO Distribution System Operator
AMI Advanced Metering Infrastructure
LV Low Voltage
MV Medium Voltage
PQ Power Quality
RMS Root Mean Square
EPS Electrical Power System
SAIFI System Average Interruption Frequency Index
SAIDI System Average Interruption Duration Index
DC Direct Current
AC Alternating Current
ML Machine Learning
VMD Variational Mode Decomposition
DT Decision Tree
ANN Artificial Neural Network
SVM Support Vector Machine
FCM Fuzzy C-Means
LVM Low Voltage Monitoring
SAG Voltage Sag
SUR Voltage Surge
WCSS Within-Cluster Sum of Squares
NIS Network Information System


1 Introduction

1.1 Background

Due to the increasing integration of distributed energy resources (DER), Distribution System Operators (DSOs) are dealing with increasing disturbances such as harmonics, voltage sags and surges, current imbalances, and a low power factor in the power systems. Such disturbances could damage the customers’ electrical equipment, increase the grid operational cost, and hasten the aging of network assets. To address these challenges, better monitoring of the LV grid is necessary. This will be a core requirement for improving the power quality and identifying future flexibility needs.

Smart meters are one of the fundamental components of the power grid for addressing such challenges. Vattenfall AB’s advanced metering infrastructure (AMI) system, including advanced smart meters, enables a new monitoring level of the LV network and improved MV network supervision. The meters are already providing streams of data, information, and alarms that are recorded. However, end-customer smart meters are not being used to their full potential, as many of the possible measurements are not being recorded and processed for further analysis. Therefore, DSOs must establish effective power quality management policies based on increased smartness to run an overall efficient distribution power grid.

Since there is a significant potential to supervise the LV network with the assistance of the end-customer smart meters, Vattenfall Eldistribution AB, a Swedish distribution company, wants to take advantage of such data. The value lies in bringing in event information from the smart meters to contribute to better and more efficient monitoring of the LV and MV network, thereby improving power quality, fault detection, and outage management functionalities. Therefore, this project will explore how smart meter data can be used, in order to understand its potential for grid optimization, maintenance, and planning and to indicate network flexibility needs.

1.2 Aim and goals

The overall goal of this project is to investigate how some specific grid disturbances would manifest in certain types of meter data and to propose a detection method within the smart meter’s capability. This specifically refers to the ability to identify weak grids in the system during normal operation. Power quality disturbances (voltage sags and voltage surges) will be analyzed to achieve this goal, with the help of recorded smart meter data. The aim of the project is to develop an algorithm that allows Vattenfall Eldistribution to monitor critical network power quality parameters and gain insight into the state of its LV distribution grid. The following research question will serve as a framework for the project:

• To what extent can event data from smart meters determine when an LV grid is weak by applying machine learning methods?


1.3 Delimitations and scope

The project’s core areas are the existing AMI, grid conditions, DER integration, and big data analytics. Furthermore, the study focuses on some particular PQ-issues, for which several disturbance locations in the distribution grid are analyzed. Harmonics, supra-harmonics, frequency, flicker, and transients are outside this study’s scope. For instance, since the end-customer smart meters of Vattenfall Eldistribution AB cannot register any events with a duration of less than one period, or 120 ms, transients are considered outside the scope of analysis for this project.

Furthermore, the grid data from the smart meters are provided by Vattenfall Eldistribution. Data and analyses are based on the smart meters’ capability and current configuration. The study focuses mainly on machine learning methods applied to a subset of the smart meter events. Since there are many machine learning methods, this project focuses on the support vector machine and k-means.

Suitable algorithms developed in Python would be needed at intermediate points so that, if power quality issues or other disturbances have arisen, clear information can be sent to the operator about the type of the disturbance, its possible causes, and where the event is located. Consequently, appropriate measures can be taken to eliminate the problem.

Lastly, the study focuses on the existing smart meter configuration. Therefore, the limitations of the present AMI system are considered. An approach is presented in which the LV distribution grid data is combined with the customer smart meter data to pinpoint customers who might introduce disturbances. However, due to the project’s time limitations, grid data is only included briefly in the latter part of the project. Furthermore, grid topology is not taken into account in this project.

1.4 Expected outcomes and results

The study’s expected outcome is to explore the possibilities of using machine learning algorithms to gain a detailed understanding of the LV grid status. A proof of concept (PoC), based on supervised or unsupervised machine learning methods, would be delivered to address the detection problem using large data sets from smart meters. Such advanced analytics could allow Vattenfall Eldistribution AB to better understand the grid condition, predict future trends, and support its decisions. This will be achieved by:

• Conducting a literature study on weak and strong grids, the specific events and power quality issues that appear in these systems, and machine learning algorithms used worldwide to identify weak grids.

• Proposing a PoC that uses a machine learning model to detect specific power quality events in the grid and thereby identify weak grids in the system from historical smart meter data sets, as a step toward a full model trained on the whole grid.

• Formulating potential applications that add value to the smart meters’ data and describing how the proposed method may be implemented into the current AMI to improve its reliability.

• Identifying existing strengths in the present smart meter system and, thus, possibilities for the future smart metering system.

2 Literature review

2.1 Definition of weak grids

While there is no clear definition of weak grids, the term typically refers to grid structures that are sensitive to changes in active and reactive power production and consumption [9] [10]. In turn, this sensitivity contributes to more frequent unsafe situations in weak grids, where the supply interruption time is longer as well [11]. In other words, certain power quality events are connected explicitly to weak grids [12]. Specifically, the following main issues are pointed out: voltage regulation, system stability, and integration of renewable power generation in weak power grids [11].

In this study, the smart meter data from weak LV distribution grids will be analyzed. Therefore, the term "weak grids" will exclusively refer to LV distribution grids for the remainder of the report. Since the short-circuit power ratio at the fault point is lower in weak grids, it may be of interest to study it. Furthermore, if the grid is isolated and small, voltage dips may propagate through the whole system. Additionally, generators connected to weak grids can give rise to persistent oscillations proportional to how weak a system is. Nevertheless, these conditions are rare on LV grids and more frequent on transmission grids. More common issues on LV grids are problems with voltage levels, as well as overloading [11].

LV distribution grids in Sweden are rarely isolated. Furthermore, for LV grids, the active power flows also impact the voltage regulation since the system is more resistive than transmission grids. The higher resistivity also corresponds to a higher loop impedance [11].
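The effect of line resistance on voltage can be illustrated with the widely used first-order approximation ΔU ≈ (R·P + X·Q)/U for the voltage drop along a line. The sketch below is not taken from the study; the function names, the impedance values, and the load are hypothetical and chosen only to contrast a mostly resistive, LV-like line with a mostly reactive, transmission-like line.

```python
def approx_voltage_drop(p_w, q_var, r_ohm, x_ohm, u_nominal_v):
    """First-order voltage drop estimate in volts: (R*P + X*Q) / U.

    p_w, q_var  : three-phase active and reactive power drawn by the load
    r_ohm, x_ohm: per-phase series resistance and reactance of the line
    u_nominal_v : nominal line-to-line voltage
    The quadrature component of the drop is neglected.
    """
    return (r_ohm * p_w + x_ohm * q_var) / u_nominal_v


def active_power_share(p_w, q_var, r_ohm, x_ohm):
    """Fraction of the estimated drop that is caused by active power."""
    return (r_ohm * p_w) / (r_ohm * p_w + x_ohm * q_var)


# Hypothetical load of 10 kW and 2 kvar on a 400 V feeder.
P_W, Q_VAR, U_V = 10_000.0, 2_000.0, 400.0

# Assumed illustrative impedances: mostly resistive vs. mostly reactive.
drop_resistive = approx_voltage_drop(P_W, Q_VAR, r_ohm=0.5, x_ohm=0.1, u_nominal_v=U_V)
drop_reactive = approx_voltage_drop(P_W, Q_VAR, r_ohm=0.1, x_ohm=0.5, u_nominal_v=U_V)

share_resistive = active_power_share(P_W, Q_VAR, r_ohm=0.5, x_ohm=0.1)
share_reactive = active_power_share(P_W, Q_VAR, r_ohm=0.1, x_ohm=0.5)

print(f"resistive line: {drop_resistive:.1f} V drop, {share_resistive:.0%} from active power")
print(f"reactive line:  {drop_reactive:.1f} V drop, {share_reactive:.0%} from active power")
```

With these assumed values, active power accounts for most of the estimated drop on the resistive line, which is consistent with the statement that active power flows influence voltage regulation in LV grids.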

Integration of more intermittent power generation, such as wind power and PV, may contribute to more instability due to their low inertia. When the system inertia is low, the system becomes unstable sooner. Another example is an increased integration of EVs in the LV grid. For instance, if many EVs are charging simultaneously in a weak grid, it may cause a momentary interruption in the system [12].

Renewable power generation in weak LV distribution grids may not only cause flicker due to its variability but can also give rise to voltage fluctuations that cross acceptable voltage limits, and more frequently than in a strong grid. Furthermore, voltage imbalance is also more common in weak grids if the grid is weakly interconnected. Additionally, renewable power generation has negligible inertia because these sources are non-synchronous. Consequently, they may cause additional complications with frequency regulation [11]. Other aspects considered when classifying the grid were the following:

• Loop impedance in the grid area.

• Voltage variation and PQ-issues that affected the grid area across the years, such as undervoltages or overvoltages.

• Information about historical grid reinforcement in the grid area.

• Information about other measures that were taken in the specific grid area, such as grid topology.

2.2 Power quality in the grid

2.2.1 Definitions and existing regulation

Power quality issues typically refer to "any power problem manifested in voltage, current, or frequency deviations that result in failure or misoperation of customer equipment" [13]. It is important to consider power quality issues not only because of the everyday inconvenience they may cause to private customers but also because of the economic impact. Since many types of equipment are directly dependent on a reliable electricity supply, even a small disturbance may cause a company to lose a considerable amount of money [13] [14].

The primary responsibility to maintain acceptable power quality in the grid falls on power companies, such as Vattenfall Eldistribution AB [15]. In actuality, power quality refers to the quality of the voltage in most cases. This is because the power supply system can only control the voltage quality, whereas the current drawn by particular loads cannot be controlled to the same extent. Therefore, the power quality standards are shaped around maintaining supply voltage within specified limits [13]. In Sweden, Energimarknadsinspektionen (Swedish Energy Markets Inspectorate) regulates the energy market and the electricity networks’ monopoly operations. This includes the responsibility to communicate the requirements that must be fulfilled for the power to be of good quality. The regulations may be found in the standard EIFS 2013:1 [16]. EIFS 2013:1 is, in turn, based on the European standard EN 50160:2010 [17]. For instance, the standard states that, during every one-week period, 95% of the 10-minute averages of the RMS voltage must be within ±10% of the nominal RMS value. For LV grids, the nominal phase-to-phase voltage is 400 V and the phase-to-neutral voltage is 230 V, whereas the maximum RMS voltage is 1 kV. For MV grids, the RMS voltage is between 1 kV and 36 kV [9] [17]. Furthermore, when European standards are decided, it is common for them to gather recommendations from the international standard IEEE 1159-2019 [1].
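The weekly ±10% requirement described above can be expressed as a simple check. The following is a minimal sketch, not the method used in the study: the function name, its parameters, and the synthetic measurement series are hypothetical, and only the 95% share, the ±10% band, and the 230 V nominal phase-to-neutral voltage are taken from the text.

```python
def weekly_voltage_compliance(rms_10min_v, nominal_v=230.0,
                              tolerance=0.10, required_share=0.95):
    """Return (share_within_band, is_compliant) for one week of data.

    rms_10min_v    : iterable of 10-minute average RMS voltages in volts
    nominal_v      : nominal RMS voltage (230 V phase-to-neutral for LV)
    tolerance      : allowed relative deviation from nominal (0.10 = 10%)
    required_share : fraction of averages that must lie within the band
    """
    values = list(rms_10min_v)
    if not values:
        raise ValueError("no measurements supplied")
    lower = nominal_v * (1.0 - tolerance)
    upper = nominal_v * (1.0 + tolerance)
    n_within = sum(1 for v in values if lower <= v <= upper)
    share = n_within / len(values)
    return share, share >= required_share


# One week contains 7 * 24 * 6 = 1008 ten-minute intervals.
INTERVALS_PER_WEEK = 7 * 24 * 6

# Synthetic example weeks (assumed data): nominal voltage with some
# intervals at 200 V, which is below the lower limit of the band.
week_ok = [230.0] * (INTERVALS_PER_WEEK - 40) + [200.0] * 40
week_bad = [230.0] * (INTERVALS_PER_WEEK - 60) + [200.0] * 60

share_ok, compliant_ok = weekly_voltage_compliance(week_ok)
share_bad, compliant_bad = weekly_voltage_compliance(week_bad)

print(f"week_ok:  {share_ok:.3f} within band, compliant={compliant_ok}")
print(f"week_bad: {share_bad:.3f} within band, compliant={compliant_bad}")
```

In this synthetic example, 40 out-of-band intervals still leave more than 95% of the averages inside the band, whereas 60 out-of-band intervals do not.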

However, since each region might have its preferences for power quality, there are slight differences between the three mentioned standards. As mentioned in the previous section, there is no universal standard that defines a weak grid. Therefore, each DSO must define its own limits for what constitutes a weak grid with respect to power quality issues. In turn, Vattenfall Eldistribution AB has specified its definitions based on recommendations from the standards EN 50160:2010 and EIFS 2013:1, as well as customer case experience [9]. Therefore, this project will refer to Vattenfall Eldistribution AB’s guidelines and the European standard when defining power quality issues. If not enough information is found in the aforementioned documents, additional information will be gathered from IEEE 1159-2019.

2.2.2 Typical power quality issues

There are many types of power quality issues occurring in the power system. According to [18], the most common ones are transients, voltage sags and surges, interruptions, and flickers. Since differ- ent types of issues require different solutions, they have been grouped according to the European standard EN 50160:2010, as well as the international standard IEEE 1159-2019 [1] [17].

Various methods have been developed to protect the power system from power quality issues.

For instance, a DSTATCOM can be integrated at the point of common coupling to reduce current and voltage related power quality issues. It regulates the fault voltage at a reference value during voltage control mode, which protects loads from voltage sags, surges, and imbalances [19]. Cus- tomers can also install mitigation devices if their source originated, power quality issues are preva- lent. These include interruptions, voltage sags, current harmonics, and unbalanced currents. For instance, using custom power devices makes it possible to compensate with reactive power or have harmonics cancellation devices connected in a load bus’s shunt. Furthermore, there is improved power quality converter technology to reduce the harmonic currents during AC-DC conversion. It is also possible to use thyristor switched capacitors to eliminate dominant harmonics produced by electronic switching [19].

Furthermore, power quality disturbances differ in how far they travel. While most PQ-issues are mitigated by power system protection equipment, short-term RMS variations, like voltage sags, can travel from transmission level to the LV grid, i.e., where the sensitive loads are located [20]. Harmonics can also travel through the power system to the sensitive loads on the LV grid. Even if the customer equipment does not cause any harmonics, these waveform distortions can still reach it from a different source location [21]. Transients usually do not travel far from the source. However, in some cases, they may appear at open circuits in low voltage systems if the power system protection fails to mitigate them [1]. This is why it is of particular interest to study these power quality issues’ patterns during the project. However, since transients and harmonics are considered outside the analysis scope, the project will start by analyzing voltage sags and surges. The definitions for power quality issues on low voltage and medium voltage grids will be presented in the following subsections.

2.2.2.1 Transients

Transients are fast voltage spikes that last for less than 10 ms, with minimum and maximum values that deviate from the normal sine wave [1] [9]. A transient may be either positive or negative. Generally speaking, there are two types of transients: impulsive and oscillatory.

Lightning strikes mainly cause impulsive transients. An impulsive transient is defined as a "sudden, non-power

frequency change in the steady-state condition of voltage, current or both that is unidirectional in

polarity (primarily either positive or negative)" [1] [9].


When identifying impulsive transients, it is useful to study the rise and decay times and the peak magnitude [1]. An example can be observed in figure 1. In some cases, impulsive transients may also give rise to oscillatory transients because of the high frequencies involved; such a waveform is illustrated in figure 2. The oscillation arises because capacitors in the system may be energized by the lightning strike [1] [9].

Figure 1: A lightning stroke current that results in impulsive transients [1].

Figure 2: An example of back-to-back switching of a capacitor which in turn causes an oscillatory transient [1].

In theory, oscillatory transients have the following definition: "a sudden, non-power frequency

change in the steady-state condition of voltage, current, or both, that includes both positive and

negative polarity values." The main difference is the rapid change of polarity for either the voltage


or current’s instantaneous value. If the power system source is inductive, the capacitor voltage will overshoot and ring at the system’s natural frequency [1] [22]. In some instances, the oscillatory transient may reach twice the normal system peak voltage [23]. The transient voltages may damage equipment in various ways. For instance, equipment might degrade over time or even suffer immediate dielectric breakdown. Furthermore, they may cause nuisance tripping of adjustable speed drives and insulation failure in the EPS apparatus [24]. However, as mentioned in the introduction, transients are considered outside the analysis scope for this project.

2.2.2.2 Short-duration RMS variations

For a variation to be considered "short-duration," the RMS value must deviate from the nominal value for durations between 10 ms and 1 minute at the power frequency [9]. This applies to either the voltage or the current. For a voltage variation, the duration may be categorized into three types: instantaneous, momentary, or temporary. Furthermore, based on its magnitude, a short-duration variation can be classified as a sag, surge, or interruption [1].

Voltage sags are temporary drops in the RMS voltage to between 10% and 90% of its nominal value for durations of 10 ms to 1 minute at the power frequency [9]. They are usually caused by EPS faults [25]. Additional causes are switching large loads to isolate faulted sections and starting large induction motors [24] [26]. An example is shown in figure 3.

Figure 3: Temporary voltage sag caused by motor starting operation [1].


Figure 4: Instantaneous voltage surge caused by SLG fault [1].

Figure 5: Momentary interruption due to fault and subsequent recloser operation [1].

The voltage sags mostly cause a slight decrease in output from a capacitor bank or a relatively

small speed variation of induction machines [24]. However, the sensitivity to voltage sags depends

on what type of end-user facility equipment is concerned [9]. For instance, it is crucial to consider

voltage sag ride-through capability for sensitive processes, that is, if the equipment is required to

operate correctly or not at the voltage drop level [1] [26]. In [27], it is mentioned that voltage sags

may cause a loss of dc-link voltage and result in adjustable speed drives tripping.


A voltage surge is a temporary increase in the RMS voltage to more than 110% of the nominal voltage for durations between 10 ms and 1 minute at the power frequency [9]. An example is shown in figure 4. Like voltage sags, voltage surges are caused by EPS faults, as well as by switching off large loads or connecting large generation. Nevertheless, voltage surges occur less frequently than voltage sags [24]. During a fault, the severity of a voltage surge depends on the grounding, system impedance, and fault location.

2.2.2.3 Long-duration RMS variations

When the RMS deviation lasts longer than 1 minute, it is considered a long-duration variation. These are typically sustained interruptions, undervoltages, or overvoltages. Sustained interruptions occur when the RMS voltage decreases below 10% of its nominal value for more than 1 minute [1]. They are usually caused by permanent faults due to storms or equipment failure in the power system, which is why restoration requires manual intervention [1] [25]. However, sustained interruptions should not be confused with outages, as outages refer to "the state of a component in a system that has failed to function as expected" [1]. Two main indices measure sustained interruptions:

• SAIFI (the System Average Interruption Frequency Index) refers to the average number of sustained interruptions per customer served during a given period [28].

• SAIDI (the System Average Interruption Duration Index) refers to the average total interruption duration per customer served during a given period [28].
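As a minimal sketch of how the two indices are commonly computed, the example below sums customer interruptions and customer-minutes over a set of events and divides by the number of customers served. The event values, the `Interruption` class, and the feeder size are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass


@dataclass
class Interruption:
    customers_affected: int   # customers who lost supply in this event
    duration_minutes: float   # time until supply was restored


def saifi(events, customers_served):
    """Average number of sustained interruptions per customer served."""
    return sum(e.customers_affected for e in events) / customers_served


def saidi(events, customers_served):
    """Average interruption duration (minutes) per customer served."""
    total = sum(e.customers_affected * e.duration_minutes for e in events)
    return total / customers_served


# Hypothetical year of events on a feeder serving 1000 customers.
events = [
    Interruption(customers_affected=200, duration_minutes=90.0),
    Interruption(customers_affected=50, duration_minutes=30.0),
]
feeder_saifi = saifi(events, 1000)  # interruptions per customer
feeder_saidi = saidi(events, 1000)  # minutes per customer
```

Both indices are normalized by all customers served, not only by those affected, so a long interruption that hits few customers contributes little to SAIDI.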

The RMS voltage is classified as an undervoltage when it decreases to between 10% and 90% of its nominal value for longer than 1 minute. Undervoltages are usually caused when a capacitor bank switches off or when a large load switches on. They may also occur on overloaded circuits. In Sweden, they may also be caused by single-phase EV charging stations, mainly in rural areas [29]. Although these disturbances are more common in rural areas, they can also arise in weak grids in urban and semi-urban areas.

Undervoltages may cause adjustable speed drives to trip when the dc-link voltage level drops too low [27]. However, voltage regulation equipment in the system can adjust the voltage back to its nominal value. A sustained undervoltage is sometimes referred to as a "brownout." However, there is no clear definition of this term, which is why it should be avoided in a scientific context [1] [25].

Overvoltages are when the RMS voltage increases over 110% of its nominal voltage for longer

than 1 minute. These typically appear in the grid because of inadequate voltage control in the

system or a too weak system for the desired voltage regulation. In contrast to undervoltages,

overvoltages are instead caused by a capacitor bank switching on. Overvoltages can also occur

when a large load switches off [1] [25]. They may also be caused by single-phase PV installations

in mainly rural areas, which has occurred in Sweden [29] [30]. In turn, this may damage EPS

equipment that is not designed to operate at the overvoltage [31].
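The magnitude and duration limits given in sections 2.2.2.2 and 2.2.2.3 can be combined into a simple classification rule. The sketch below is only an illustration of those limits: the function name is hypothetical, the handling of values exactly on a limit is an assumption, and the 10% limit for short interruptions is assumed to be the same as for sustained interruptions.

```python
def classify_rms_event(voltage_pu, duration_s):
    """Classify an RMS voltage deviation by magnitude and duration.

    voltage_pu : RMS voltage during the event, as a fraction of nominal.
    duration_s : event duration in seconds.
    """
    if duration_s < 0.010:
        return "too short for RMS classification"
    long_duration = duration_s > 60.0
    if voltage_pu < 0.10:
        # Assumption: short interruptions use the same 10% limit as sustained ones.
        return "sustained interruption" if long_duration else "short interruption"
    if voltage_pu <= 0.90:
        return "undervoltage" if long_duration else "voltage sag"
    if voltage_pu > 1.10:
        return "overvoltage" if long_duration else "voltage surge"
    return "normal"
```

For example, `classify_rms_event(0.5, 0.2)` falls in the sag category, while the same magnitude lasting two minutes is classified as an undervoltage.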


2.2.2.4 Voltage imbalance (unbalance or asymmetry)

When the three-phase voltages are displaced from their normal 120-degree phase relationship or when the amplitudes differ, it is called voltage imbalance [24] [25]. This is illustrated in figure 6.

The imbalance can be expressed as a percentage and should be equal to or less than 2% at the customer’s connection point to be acceptable in the LV and MV grids in Sweden [9]. Furthermore, the voltage imbalance is defined as the ratio of the negative sequence component’s magnitude to that of the positive sequence component [1]. This is also described in equation 1.

Figure 6: Unbalanced three-phase voltage represented by phasors [2].

While it is common for the voltage imbalance to be 5% or less, the current imbalance can reach higher percentages [1]. The main sources of voltage imbalance are unbalanced loads and capacitor bank abnormalities [9] [24]. The latter include a blown fuse on one phase of a three-phase bank [24]. Furthermore, single-phase PV installations on a three-phase circuit may also cause voltage imbalance [11]. When power inverters are fed with unsymmetrical voltage, it may give rise to harmonics [1]. In turn, this may cause overheating in synchronous and induction motors, and thus higher losses [32].

$$\text{Voltage imbalance } [\%] = \frac{V_{neg}}{V_{pos}} \times 100\% \qquad (1)$$
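Equation 1 can be evaluated from three phase-voltage phasors by first computing the symmetrical components. The sketch below assumes an a-b-c phase sequence and uses hypothetical phasor values around 230 V; the function names are illustrative only.

```python
import cmath
import math


def voltage_imbalance_percent(va, vb, vc):
    """Return |V_neg| / |V_pos| * 100 for three complex phase voltages."""
    a = cmath.exp(2j * math.pi / 3)          # 120-degree rotation operator
    v_pos = (va + a * vb + a * a * vc) / 3   # positive-sequence component
    v_neg = (va + a * a * vb + a * vc) / 3   # negative-sequence component
    return abs(v_neg) / abs(v_pos) * 100.0


def phasor(magnitude, angle_deg):
    """Build a complex phasor from magnitude and angle in degrees."""
    return cmath.rect(magnitude, math.radians(angle_deg))


# Balanced set: equal magnitudes, 120 degrees apart.
balanced = voltage_imbalance_percent(
    phasor(230, 0), phasor(230, -120), phasor(230, 120))

# Hypothetical case where phase b is 10 V lower than the other two phases.
unbalanced = voltage_imbalance_percent(
    phasor(230, 0), phasor(220, -120), phasor(230, 120))
```

The balanced set gives an imbalance of essentially zero, while the 10 V drop on one phase gives roughly 1.5%, which is still below the 2% limit mentioned above.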

2.2.2.5 Voltage fluctuations

When the voltage has a series of random voltage changes or systematic variations that stay within

95% to 105% of its nominal value, it is called voltage fluctuation, or variations [9]. This is also

shown in figure 7. Several components can cause fluctuations. For instance, voltage fluctuations


on the distribution and transmission system are commonly caused by arc furnaces. Other causes include continuous and rapid variation in the reactive component of the load. In Sweden specifically, voltage fluctuations are becoming increasingly common due to the integration of PV in the LV grid [33]. The impact of voltage fluctuations on lighting is referred to as flicker. Humans usually observe it as changes in lamp illumination intensity [1] [25].

Figure 7: Example of voltage fluctuations caused by arc furnace operation. The y-axis shows the percentual change from the voltage’s nominal value [1].

Voltage fluctuations do not cross the lower or upper limits for acceptable RMS voltage, as defined in section 2.2.1. Therefore, the smart meters will not register these fluctuations as events. Furthermore, the current customer smart meters cannot monitor how the voltage varies within the limits, so these fluctuations will not be analyzed. They could, however, become relevant if the smart meters are upgraded to monitor them.

2.3 Machine learning to identify power quality issues

2.3.1 Background of machine learning

Machine learning (ML) is a widely used approach to automatically derive valuable information

from input data, such as finding patterns without using domain-specific expertise. Furthermore, an

algorithm is designed from an automatic process that turns data sets into meaningful information

[34]. This means that the ML algorithm will aim to find a relationship between some arbitrary

input variable x and some arbitrary output variable y. Two main prerequisites are required to do

this: that the input variable x and output variable y are indeed related, and that the information

from the available input variable x can be used to deduce the output variable y [35]. As mentioned

in section 1.3, this project will focus on supervised and unsupervised machine learning. Figure

8 gives an overview of what algorithms can be used for supervised and unsupervised machine


learning. However, despite the benefits of machine learning, such as potentially reducing analysis time and manual errors, each step of the machine learning workflow comes with a few challenges [8]. These are presented in table 1.

Figure 8: Some different algorithms for supervised and unsupervised machine learning [3].

In other words, ML is about "learning the relationship" between x and y using training data.

More specifically, the training data consists of N samples $(x_n, y_n)$, which can also be written as $\{(x_1, y_1), \ldots, (x_n, y_n), \ldots, (x_N, y_N)\}$ [35] [36]. The inputs $\{x_1, \ldots, x_n, \ldots, x_N\}$ can also be written as $X \in \mathbb{R}^{N \times D}$, where each row $x_n$ is called a data point or example. Furthermore, each column $d = 1, \ldots, D$ represents a feature of interest in the data set. Consequently, each example $x_n$ is stored as a D-dimensional vector. This means it is possible to apply linear algebra in machine learning [36].

When the training data contains labeled output, supervised machine learning approaches can be utilized. This means the output value $y_n$ is known for each corresponding input value $x_n$. The task is then to find a conditional distribution $p(y_n \mid x_n)$ [37].

Unsupervised machine learning is used when there are no labeled outputs to "supervise" the learning. In other words, the data $\{x_n\}_{n=1}^{N}$ is unlabeled. Since there is no output to provide insight, the algorithm must find logic and structure in the input data itself. In turn, an unconditional distribution $p(x_n)$ will be found. Because unsupervised learning does not require a domain expert to label the data manually, it is more widely applicable than supervised learning. The most relevant functionality, in this case, is clustering, which groups related examples together using the input data [37].

Table 1: The challenges in machine learning according to [8].

Workflow step | Challenge
Access, explore and analyze data | Data diversity - Real-world datasets are not always tabular.
Preprocess data | Lack of domain tools - Often requires tools from multiple domains, e.g. filtering and feature extraction.
Train models | Time-consuming - Searching for the right model takes time and is partly trial and error, partly dependent on the size of the data.
Assess model performance | Avoid pitfalls - Highly flexible models may be accurate but can also overfit the data, while simpler models may assume too much about the data.
Iterate | Nonlinear workflow - Must always implement different ideas before converging onto a solution.

2.3.2 Some known algorithm applications on power quality

As mentioned in the previous section, it is increasingly common to use historical data to evaluate

the grid in a more automated way, such as using machine learning [38]. Examples of practical

applications can be event detection, fault classification, and fault location. This includes detecting

the time instant where the customer experienced the power quality issues, identifying the type of


power quality issue that occurred, and determining the distance from the customer or the location on the distribution line [39]. This way, power companies, like Vattenfall Eldistribution AB, may create more value from the big data of their AMI and increase reliability in the grid [40]. Some related examples are presented in the following sections.

2.3.2.1 SVM and k-fold for source location of voltage sags in an HV grid

There have been various studies on power quality applications. For instance, the source location for voltage sags using a support vector machine (SVM) is studied in [41]. This method has mainly been applied to a high voltage (138 and 230 kV) Brazilian regional network for all fault types. In total, 40 features were extracted and combined into a feature vector for analysis. These include the final value of the energy, the final value of reactive power, phase angle during voltage sag, the first peak of phase currents, and line impedance.

Furthermore, a test power network of the Brazilian regional grid was simulated in MATLAB.

In the simulation, monitors were installed at six points: five on the boundary between the 138 and 230 kV networks and one between the 13.8 kV and 138 kV networks. Moreover, 14 fault points that mainly caused voltage sags were defined in the simulation: two on the 13.8 kV MV grid, eight on the 138 kV network, and four on the 230 kV system [41].

The simulated faults had a duration of 100 ms and a 0° incidence angle and comprised symmetrical faults (LLL), earth faults (LG in phase a), and asymmetrical faults (LLG and LL in phases b and c). Two fault resistance values were also used for all fault cases: 0.001 Ω and 80 Ω. Each of the six monitors registered 112 events or fault cases, giving 672 cases in total. These cases were then used as input data for the support vector machine (SVM) algorithm [41]. In general, SVM is about finding hyperplanes that separate the different classes of the training data. By dividing the data according to the hyperplanes, one can classify the data into certain categories [4]. This is shown in figure 9. The next step was to use the measured data to identify the data sets sensitive to the source location. These could then be classified as either upstream or downstream. This was followed by feature extraction using various methods, and the resulting features were then classified with the SVM. SVM separates the classes with the maximum margin and can make use of the kernel trick, which implicitly maps the inputs into a higher-dimensional feature space where a separating hyperplane can be found. Furthermore, the dataset is divided into k folds, where each fold in turn serves as the validation set while the remaining k − 1 folds form the training set. This is mainly done to prevent overfitting [41].

In [41], the dataset was divided into ten folds, where nine folds form the training data and the remaining one is the validation data. By combining the 40 extracted features and 15 randomly chosen test samples from the 672 cases, it was possible to receive an output that showed whether the voltage sag happened upstream or downstream in the power system [41]. Compared with other classification studies, the SVM method proposed in [41] gave the highest accuracy, over 90%, for identifying the different faults.
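The ten-fold division can be sketched as follows. This is a generic k-fold index split written for illustration, not the implementation used in [41]; the function name is hypothetical, and in practice the cases would typically be shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each fold is the validation set once."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# 672 fault cases divided into ten folds, as in the study described above.
folds = list(k_fold_indices(672, 10))
```

Each of the 672 cases is used for validation exactly once and for training in the other nine rounds, so the averaged validation score reflects performance on data the model was not fitted to.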

However, a disadvantage of the proposed SVM method is that it works like a black box, so its decisions are difficult to interpret. Furthermore, it requires extensive extraction of all features to be analyzed. Since the LV grids are quite diverse, the future requirements for grid development may sometimes be challenging to predict [41] [42]. Consequently, it is recommended to adopt an active learning technique to reduce the labeling effort [42]. Alternatively, this method could be used if the feature extraction was previously completed.

Figure 9: Example of 2D data (blue dots and orange crosses) that are separated by possible hyperplanes using SVM [4].
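The kernel trick mentioned in this section can be illustrated with a small worked example. The sketch below uses a degree-2 polynomial kernel, which is an assumption chosen for illustration (the kernel used in [41] is not stated here), and shows that evaluating the kernel in the 2-D input space gives the same value as an inner product in an explicit 3-D feature space.

```python
import math


def poly_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel: k(x, z) = (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def feature_map(x):
    """Explicit 3-D feature map whose inner product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    """Plain inner product of two equally long vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


x, z = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, z)                     # computed in 2-D input space
explicit = dot(feature_map(x), feature_map(z))   # computed in 3-D feature space
```

Both values agree, which is why an SVM can work with the kernel alone and never has to construct the higher-dimensional vectors.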

2.3.2.2 SVM to classify weak grids on LV grids by hosting capacity

Another example of applied SVM classification is given in [42]. This study assesses German LV distribution grids based on their hosting capacity to develop an algorithm that may identify whether a grid is weak or not. In the study, the collaborating DSO, acting as the domain expert, identified ten grid features. The features included the number of transformer stations, the sum of rated transformer power, average straight-line distance to the transformer, and house connections.

In total, 300 LV grid areas were included in two sets, each with their unique setup of features to be analyzed in the project [42].

The next step included a simulation of the grid features, assuming that the position and power of the distributed generation, PV systems in this case, substantially impacted the LV grid’s capacity. Furthermore, the simulated grid was dimensioned according to existing European standards. In turn, the simulation gave rise to six more features of each grid, which were used to train the SVM algorithm. These new criteria were mainly related to simulation methods. This was followed by evaluating

what features were relevant for the classification. Four different feature selection methods were

used and averaged to give a final choice of features. As a result, it was preliminarily concluded


that relatively few features were needed to assess the grid structure. The result from the feature selection also corresponded to what domain experts recommended [42].

The labeling of the 300 LV grids was done by the domain experts, with the distributed generation capacity as the foremost consideration. This resulted in five labels, or classes, of the grid structure: "very weak" (1), "weak" (2), "average" (3), "strong" (4), and "very strong" (5). The labeled grids were then used as training data for the machine learning model. Each class had its own defined limits for the average distributed generated power per transformer generation, where a higher class corresponded to higher limit values [42]. Consequently, it was possible to train the SVM algorithm, which yielded an accuracy of over 80% [42].

2.3.2.3 K-means, FCM, and DT to locate voltage sag source

In [43], the clustering algorithms k-means and fuzzy c-means (FCM) are used to identify the voltage sag source location. In particular, the study aims to identify the area or distance from the substation connected to the voltage sag source. The extracted features were the RMS value, peak value, peak factor, the mean value of the voltage, and the distance to the substation. The first four features were chosen because the amplitude characteristics in the time domain are essential. The last feature was included to study what impact the power system topology had on voltage sag propagation.

(a) Randomly generated data. (b) Clustered data.

Figure 10: Example of k-means data clustering, where k = 2 [5].

The next step was to normalize the features. This is because the similarity or dissimilarity

is calculated through the clustering algorithm, meaning that each variable must exert the same

influence on the model to give a fair comparison [43]. When this was done, a cluster analysis

could be performed on the data. The first algorithm to be utilized was k-means. This algorithm aims to group the data in a set around k centroids so that similar data points end up in the same cluster, as shown in figure 10. Since only the distances between data points and centroids are calculated, the algorithm is comparatively fast. However, for each run, the algorithm selects random initial cluster centers, so the results may not be consistent between runs. The algorithm is also sensitive to outliers, which are extreme values that diverge heavily from the other data [43] [44]. Furthermore, the k-value is manually defined, which may add to the inconsistency [43].
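The normalization and clustering steps described above can be sketched in plain Python. The example assumes min-max scaling (the normalization method of [43] is not specified here) and uses hypothetical events with two features; the function names and data values are illustrative only.

```python
import random


def min_max_normalize(rows):
    """Scale every feature (column) to [0, 1] so all carry equal weight."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - lo) / s for v, lo, s in zip(r, lows, spans)] for r in rows]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm with randomly chosen data points as initial centroids."""
    centroids = random.Random(seed).sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Hypothetical events: (RMS voltage in V, distance to substation in km).
events = [[180, 0.5], [185, 0.6], [182, 0.4], [215, 3.0], [218, 3.2], [216, 2.9]]
labels, centroids = k_means(min_max_normalize(events), k=2)
```

Without the scaling step, the voltage column (tens of volts) would dominate the distance calculation over the distance column (a few kilometres). Changing `seed` changes the initial centroids, which is the source of the run-to-run inconsistency noted above.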

The second algorithm to be applied to the data was FCM. What makes this method unique is that a data point can belong to more than one cluster. As a result, the FCM method is more effective in clustering data [7]. FCM is based on fuzzy logic and is an extension of k-means [43]. Because of this, the weaknesses observed in k-means are present in FCM too [44]. When the clustering was done, each cluster’s characteristics were noted for cluster validation and posterior analysis. Furthermore, the similarity patterns were highlighted from the input data in each cluster. By doing so, it was possible to see which cluster the fault source belonged to [43].
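The idea that a data point can belong to several clusters with different degrees of membership can be sketched with a small fuzzy c-means routine for scalar data. This is a simplified illustration, not the implementation of [43]: the initial centers are assumed to be given (here the minimum and maximum of the data) rather than drawn at random, the fuzzifier m = 2 is a common but assumed choice, and the data values are hypothetical.

```python
def fuzzy_c_means_1d(xs, centers, m=2.0, n_iter=50):
    """Fuzzy c-means on scalar data; returns (memberships, centers).

    memberships[j][i] is the degree to which xs[j] belongs to cluster i.
    """
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for x in xs:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # A point sitting on a center belongs fully to that center.
                hit = dists.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(len(centers))]
            else:
                row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                       for d_i in dists]
            memberships.append(row)
        # Update each center as the membership-weighted mean of the data.
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            centers[i] = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    return memberships, centers


# Hypothetical residual voltages (per unit) during sag events.
sags = [0.40, 0.42, 0.45, 0.80, 0.82, 0.85]
u, c = fuzzy_c_means_1d(sags, centers=[min(sags), max(sags)])
```

Each row of `u` sums to one, so a point between the two groups would receive a split membership instead of being forced into a single cluster as in k-means.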

Additionally, a partial decision tree algorithm was applied to the data sets. In each iteration, a rule is induced from a partial decision tree (DT). The data not covered by the rule is then used to build a new tree, and this is repeated until all data has been separated. In this case, each rule is based on the characteristics of each group. A simple decision tree is presented in figure 11. In the study, DT was applied to clusters so that each branch could represent a characteristic of a cluster. By doing so, it was possible to see which cluster aggregated the source location of the voltage sags [43].

Figure 11: A simple decision tree with binary leaf nodes [6].
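How a single rule of such a tree can be induced from clustered data is sketched below. This is a simplified illustration that searches for the best threshold on one feature using Gini impurity; it is not the partial decision tree algorithm used in [43], and the feature values and cluster names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best rule 'value <= threshold'."""
    best = (None, float("inf"))
    candidates = sorted(set(values))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2.0  # midpoint between neighbouring values
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: distance to substation (km) and the cluster of each event.
distance = [0.4, 0.5, 0.6, 2.9, 3.0, 3.2]
cluster = ["A", "A", "A", "B", "B", "B"]
threshold, impurity = best_split(distance, cluster)
```

The resulting rule, a distance threshold between the two groups, separates the clusters with zero impurity. A full tree would apply the same search recursively to each branch.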

The result showed that efficiency depended on the number of clusters and the clustering algorithm that was used. That is why the study varied the k-value between 2 and 6. Furthermore, some of the results reached an accuracy greater than 90%, which was deemed satisfactory. When comparing the results, it was noted that the k-means algorithm had higher accuracy than FCM. Furthermore, the algorithm was deemed optimal when the k-value was 4 or 5. This is because the algorithm had close hit rates and a small location region, which facilitated the source location [43].

Additional examples where k-means is used to identify location and source patterns for voltage

sags are presented in [45] and [46]. The three studies use similar k-means clustering approaches,

whereas additional insights are mentioned in the latter. For instance, in [46], it is recommended to


only use this method if there are plenty of smart meters measuring the grid’s values. If there are too few data samples, the accuracy is heavily reduced [45].

In [45], the grouping goal is to cluster voltage sags so that each cluster represents location and sag type according to their impact. If the number of clusters is too small, it corresponds to a few large zones where it is impossible to identify the voltage sag source location. However, if the number of clusters is too large, the zones become very small, and thus the classification is inefficient [45]. Therefore, it is essential to find an appropriate number of clusters. In [45], the number of clusters selected was k = 50. In [46], the algorithm was applied to a 20 kV substation in Indonesia, while [45] applied the algorithm to a high voltage system in Colombia.

2.3.2.4 DT and FCM for recognition of power quality issues

In [7], decision trees (DT) were used to identify power quality issues in a computational lab environment that mimics a real grid using a so-called Real-Time Digital Simulator. This algorithm is combined with Stockwell’s transform, or S-transform, and fuzzy c-means (FCM) to detect, classify, and localize the PQ-issues. With the S-transform, it is possible to extract time-frequency features. The features are then applied to a rule-based DT to classify the power quality issues further. This is done by starting a tree graph from a root node, while the final decision is contained in the leaf node. An example of this is demonstrated in figure 11. As a result, it is possible to find new relationships between input and output parameters [7].

In the study, ten kinds of signals were simulated in MATLAB, each representing a power quality issue. These included a standard sine wave voltage for reference, voltage sags, and interruptions. Furthermore, the signals were sampled 64 times per cycle for a total of 10 cycles. The S-transform was then applied to the sampled signals to create an S-matrix. This was followed by applying a time-frequency representation to the matrix, from which 14 selected features were extracted. For instance, the features included the amplitude factor of the amplitude plot, the mean derived from the S-matrix, and the number of peaks in the S-transform based frequency-amplitude plot. The first eight extracted features were used to design the decision tree classification, while the following five features were used for the FCM algorithm. The last feature was used to localize voltage sags. Moreover, 20 dB noise was added to the signals to create a noisy environment [7].
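The sampling setup described above (64 samples per cycle over 10 cycles) can be reproduced with a synthetic sag signal. The sketch below only generates the signal and computes the RMS value of each cycle; it does not implement the S-transform or the feature extraction of [7], and the sag position and residual voltage are arbitrary choices for illustration.

```python
import math

SAMPLES_PER_CYCLE = 64
N_CYCLES = 10


def sag_signal(sag_cycles, residual_pu):
    """Unit-RMS sine wave whose amplitude drops during the given cycles."""
    samples = []
    for n in range(SAMPLES_PER_CYCLE * N_CYCLES):
        cycle = n // SAMPLES_PER_CYCLE
        scale = residual_pu if cycle in sag_cycles else 1.0
        phase = 2.0 * math.pi * n / SAMPLES_PER_CYCLE
        samples.append(scale * math.sqrt(2.0) * math.sin(phase))
    return samples


def rms_per_cycle(samples):
    """RMS value of each complete cycle of the sampled signal."""
    out = []
    for start in range(0, len(samples), SAMPLES_PER_CYCLE):
        window = samples[start:start + SAMPLES_PER_CYCLE]
        out.append(math.sqrt(sum(v * v for v in window) / len(window)))
    return out


# Sag to 0.5 p.u. during cycles 3 to 6 (an arbitrary choice for illustration).
signal = sag_signal(sag_cycles=range(3, 7), residual_pu=0.5)
rms = rms_per_cycle(signal)
```

The per-cycle RMS stays at 1 p.u. outside the sag and drops to 0.5 p.u. during it, which is the kind of amplitude information a time-frequency method such as the S-transform would capture in more detail.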

When the extracted features were applied to the rule-based decision tree, the detected disturbances were divided into two major groups, depending on the number of peaks detected in the frequency-amplitude plot. In turn, these were grouped into sub-branches that spread out into various power quality disturbances using different logic. For instance, if a feature was equal to a specific value, the branch was divided, and this continued until all data had been sorted. A flowchart of the rule-based decision tree is presented in figure 12. Another method in the study was to apply the S-transform extracted features to a decision-tree-initialized FCM to cluster the data instead. Here, the data were first grouped into the same two major groups as in the rule-based DT. These were then clustered based on their selected features, with six scatter plot combinations based on, e.g., F1-F2 or F2-F3. In turn, each plot was given distinctive regions where the PQ-issues were separated and distinguished with high accuracy.

Figure 12: The flow chart shows the rule-based decision tree algorithm, where Gx represents a group, Fx represents a feature, and Cx represents a PQ-issue [7].

Compared with rule-based decision trees, the FCM combination gave a more accurate result. However, this particular algorithm has not been evaluated on hybrid power systems that include renewable energy sources. Therefore, it is uncertain whether the algorithm will perform as well in those scenarios [7]. At the same time, there are examples where the decision tree performs within accepted limits by using the S-transform, as shown in [47]. However, the S-transform is challenging to apply to real-time offline monitoring because of the long computational time [47]. Furthermore, the algorithms of [7] and [47] have yet to be applied to real data.

2.3.2.5 Applicability of related methods to the study at hand

A few machine learning applications in the power system have been reviewed in previous sections.

These will be summarized and commented on in the following paragraphs.

In section 2.3.2.1, SVM and k-fold were applied for the source location of voltage sags in an

HV grid. In general, SVM appeared reasonably straightforward to apply on data, while k-fold

was useful to prevent overfitting the model. However, the study applied a binary classification on

whether the network’s fault happened upstream or downstream to a monitor. When identifying

weak grids, there are typically no clear-cut distinctions to judge. Therefore, if this method should

be applied to the problem proposed in this project, it is unclear whether it is possible to get as

distinct answers.


Section 2.3.2.2 reviews how SVM is used to classify weak grids on LV grids by hosting capacity.

This example is relatively similar to what this project proposes to study, possibly implying that SVM might be a suitable method for the study as well. However, the main difference is that this project aims to identify weak grids, whereas the example analyzes the hosting capacity instead.

In section 2.3.2.3, k-means, FCM, and DT are utilized to identify the area or distance from the substation connected to the voltage sag source. The data were first preprocessed using the clustering algorithms k-means and FCM, while the identification was made with DT. In the example, the data processed with k-means was demonstrated to return higher accuracies in the DT. For the methods to be implemented in this project, the smart meter events would need to be divided into different groups, where each rule in the DT is based on each group’s characteristics.

The last example in section 2.3.2.4 used DT and FCM for the recognition of power quality issues. This method mostly focused on classifying the disturbances into different events, which was done by studying the network frequency and voltage levels. However, this particular work is already done automatically by Vattenfall Eldistribution AB’s smart meters. Since the example’s aim differs significantly from the one in this study, it might prove challenging to implement the same method.

3 Methodology

3.1 Choice of algorithms for the project

No existing algorithms for detecting weak grids from smart meter data were found in the literature study. However, as presented in the previous section, several supervised and unsupervised machine learning algorithms have been identified and applied to power quality issues in various ways. The relationship between weak grids and PQ-events, i.e., between the input and the output data, is unclear. It is therefore not possible to label the output data accordingly, meaning that an unsupervised machine learning algorithm is preferred. The literature study showed that k-means and FCM have been demonstrated to work for identifying voltage sag source locations. As mentioned in section 2.2.2, voltage sags are a common power quality issue in LV grids that may additionally travel from the transmission grid and affect sensitive loads on the low-voltage side. It is therefore of interest to start by applying k-means or FCM to the smart meter data to study whether a pattern can be found between voltage sags and weak grids.
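The clustering idea can be sketched with a small, self-contained k-means routine. This is a simplified illustration and not the algorithm developed later in this project: the two features per event (duration in seconds and voltage in volts), the event values, and the choice of the first k points as initial centroids are all assumptions made here to keep the example deterministic.

```python
import math


def k_means(points, k, max_iter=100):
    """Cluster points with plain k-means; returns (labels, centroids)."""
    # Assumption: the first k points serve as initial centroids (deterministic).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical voltage sag events: (duration in s, voltage in V).
events = [(0.1, 200.0), (5.0, 180.0), (0.2, 202.0),
          (4.8, 182.0), (0.15, 198.0), (5.2, 178.0)]
labels, centroids = k_means(events, k=2)
print(labels)
```

In this toy data, short and shallow sags end up in one cluster and long and deep sags in the other. With real data, the features would normally be scaled first, since k-means relies on Euclidean distance and is dominated by the feature with the largest numeric range.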

However, it is also of interest to use supervised machine learning to evaluate how accurate existing methods are. With a classification algorithm such as SVM, it could be possible to label data as, for instance, either "weak grid" or "not weak grid." Moreover, the different algorithms can be compared with respect to their accuracy in identifying weak grids.
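As a minimal sketch of how such a comparison could be made, accuracy can be computed as the share of predicted labels that match the known labels. The classifier names and all labels below are invented placeholders for illustration only; they are not results from this study.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Placeholder labels, made up for this example (not measured data).
true_labels = ["weak grid", "not weak grid", "weak grid", "not weak grid"]
predictions = {
    "classifier A": ["weak grid", "not weak grid", "not weak grid", "not weak grid"],
    "classifier B": ["weak grid", "weak grid", "not weak grid", "not weak grid"],
}

scores = {name: accuracy(true_labels, pred) for name, pred in predictions.items()}
print(scores)
```

Accuracy alone can be misleading when one class is much more common than the other, so in practice it would be reasonable to inspect the per-class errors as well.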

As mentioned in section 2.1, weak grids typically refer to grid areas that are sensitive to changes in power production and consumption. This, in turn, may cause problems with voltage levels and overload in the LV grid; voltage problems and overloading are the most common issues in weak LV grids. The overall challenges in a weak grid concern voltage regulation, system stability, and the integration of renewable power generation. The next phase of this study will shed more light on how these events appear in real data from weak grids and how frequently they occur.

3.2 Data extraction

The smart meter data to be analyzed was extracted from Vattenfall Eldistribution AB’s low-voltage monitoring system (LVM), where power quality events from smart meters are registered and stored. Before extracting data from the LVM, however, it was necessary to identify the relevant electric substations and their underlying smart meters to be analyzed in this project.

Furthermore, when selecting different grid areas, grid topology and characteristics were taken into consideration. For instance, a variety of different LV grids were included to diversify the input data. These included the following:

• Grids with installed PV and those without PV.

• Grids in rural areas as well as in semi-urban areas.

• Grid areas with few customers and those with many customers.

• LV grids with few branches and those with many branches.

• Grid areas with different loop impedances.

For each selected grid area, the smart meter data from the years 2012 to 2019 was analyzed.

The data was exported from the LVM in xls format and converted to xlsx format to facilitate its import into Spyder, a development environment for the programming language Python [48]. Python is also the language in which the algorithm to identify weak grids from smart meter data will be developed. By importing the smart meter data into Spyder, it was possible to create a data frame. Since all data from the LVM is exported in the same format, it was possible to develop a standardized algorithm that specifically analyzes the smart meter data. Here, smart meter data refers explicitly to the power quality events registered in each smart meter and the characteristics of those events.

While the format of the extracted data is always the same, the exported content can be customized. This mainly refers to the choice of event type to be analyzed. As a start, the chosen events were voltage sags (SAG), i.e., when the voltage level drops below 207 V, and voltage surges (SUR), i.e., when the voltage level rises above 253 V. These were chosen due to their predominance among the power quality events in Sweden. To begin with, a real substation and its underlying smart meter data will be used for the algorithm development. The process is explained in detail in the next sections.
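The two event definitions can be expressed as a simple threshold check. The sketch below only illustrates the limits of 207 V and 253 V stated above; in practice, the events are already registered by the smart meters themselves. The function name `classify_voltage` and the sample readings are assumptions made for this example.

```python
SAG_LIMIT_V = 207.0    # below this level the event counts as a voltage sag
SURGE_LIMIT_V = 253.0  # above this level the event counts as a voltage surge


def classify_voltage(voltage_v):
    """Return "SAG", "SUR", or None for a voltage within the normal band."""
    if voltage_v < SAG_LIMIT_V:
        return "SAG"
    if voltage_v > SURGE_LIMIT_V:
        return "SUR"
    return None


# Hypothetical voltage readings in volts, including both boundary values.
readings = [230.0, 206.9, 207.0, 253.0, 253.1]
event_types = [classify_voltage(v) for v in readings]
print(event_types)
```

Readings of exactly 207 V or 253 V are treated as normal here, since the definitions refer to levels strictly below and above the limits.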

Since there is a significant potential to supervise the low voltage network with the assistance of the end-customer smart meters, Vattenfall Eldistribution AB wants to take advantage of such data.

Future work could include applying the same machine learning methods to higher-dimensional input data to separate the data points. One way to diversify the data could be to include data describing grid topology and data from PQ-meters. Furthermore, future smart meters will make it possible to monitor the low-voltage grid conditions continuously. In turn, this may give better insight into how the voltage levels behave for weak and strong grids, respectively.

Furthermore, the data was analyzed for four different cases per method.

Furthermore, future meters will be able to monitor the grid’s voltage and current continuously.

Contents

1 Introduction
1.1 Background
1.2 Aim and goals
1.3 Delimitations and scope
1.4 Expected outcomes and results

2 Literature review
2.1 Definition of weak grids
2.2 Power quality in the grid
2.2.1 Definitions and existing regulation
2.2.2 Typical power quality issues
2.2.2.1 Transients
2.2.2.2 Short-duration RMS variations
2.2.2.3 Long-duration RMS variations
2.2.2.4 Voltage imbalance (unbalance or asymmetry)
2.2.2.5 Voltage fluctuations
2.3 Machine learning to identify power quality issues
2.3.1 Background of machine learning
2.3.2 Some known algorithm applications on power quality
2.3.2.1 SVM and k-fold for source location of voltage sags in an HV grid
2.3.2.2 SVM to classify weak grids on LV grids by hosting capacity
2.3.2.3 K-means, FCM, and DT to locate voltage sag source
2.3.2.4 DT and FCM for recognition of power quality issues
2.3.2.5 Applicability of related methods to the study at hand

3 Methodology
3.1 Choice of algorithms for the project
3.2 Data extraction
3.3 Algorithm development
3.3.1 Initial statistical data analysis
3.3.2 Pre-processing data with sliding window algorithm
3.3.3 Unsupervised machine learning
3.3.3.1 K-means for clustering
3.3.4 Supervised machine learning
3.3.4.1 Support vector machine

4 Results
4.1 Unsupervised machine learning
4.1.1 First case using k-means to cluster raw data from the smart meters of Grid 1
4.1.2 Second case of identifying normal operation from the smart meters of Grid 1 using k-means
4.1.3 Third k-means case in regards to the number of incidents during normal operation
4.1.4 Fourth case using loop impedance and event data from sliding window
4.2 Supervised machine learning
4.2.1 First case of support vector machine on data from Grid 1
4.2.2 Second case of support vector machine on data per customer from Grid 1 and Grid 2
4.2.3 Third case of support vector machine on data and loop impedance from Grid 1 to Grid 9
4.2.4 Fourth case of support vector machine with increased bins from event data of Grid 1 to Grid 9

5 Discussion
5.1 Analysis and interpretation of the results
5.2 Influence of the limitations on the results
5.3 Contributions and new insights

6 Conclusion
6.1 Conclusive summary
6.2 Future work

References

A.1 Initial statistical data analysis of Grid 1
A.2 Result of the first case of k-means
A.3 Results of the second case of k-means
A.3.1 Finding the duration limits for normal condition
A.3.2 The k-means result of the second iteration
A.4 Summary of every result using SVM

List of Figures

1 A lightning stroke current that results in impulsive transients [1].

2 An example of back-to-back switching of a capacitor which in turn causes an oscillatory transient [1].

3 Temporary voltage sag caused by motor starting operation [1].

4 Instantaneous voltage surge caused by SLG fault [1].

5 Momentary interruption due to fault and subsequent recloser operation [1].

the number of incidents for each event duration occurs at least X times.

38 3-Class classification of y_a on the event data per customer from Grid 1-9 during

7 The accuracy of SVM on Grid 1 for the third case using output y_a.