Graph theory applications in the energy sector: From the perspective of electric utility companies


DEGREE PROJECT IN MECHANICAL ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2020

Graph theory applications in the energy sector

From the perspective of electric utility companies

KRISTOFER ESPINOSA

TAM VU


Abstract

Graph theory is the mathematical study of objects and their pairwise relations, also known as nodes and edges. The birth of graph theory is often considered to have taken place in 1736, when Leonhard Euler tried to solve a problem involving seven bridges of Königsberg in Prussia. In more recent times, graphs have caught the attention of companies from many industries due to their power to model and analyse large networks.

This thesis investigates the use of graph theory in the energy sector for a utility company, in particular Fortum, whose activities include, but are not limited to, the production and distribution of electricity and heat. The output of the thesis is a broad overview of graph-theoretic concepts and their applications, as well as an evaluation of energy-related use-cases in which some concepts are analysed in greater depth. The use-case chosen within the scope of this thesis is feature selection for electricity price forecasting. Feature selection is a process for reducing the number of features, also known as input variables, typically carried out before a regression model is built in order to avoid overfitting and to increase model interpretability.

Five graph-based feature selection methods with different points of view are studied. Experiments are conducted on realistic data sets with many features to verify the validity of the methods. One of the data sets is owned by Fortum and is used for forecasting the electricity price, among other important quantities.

The results obtained look promising according to several evaluation metrics and can be used by Fortum as a support tool for developing prediction models. In general, a utility company can likely take advantage of graph theory in many ways and add value to its business with enriched mathematical knowledge.

Keywords: graph theory, feature selection, energy industry


Summary

Graph theory is a mathematical field in which objects and their pairwise relations, also known as nodes and edges respectively, are studied. The birth of graph theory is often considered to have taken place in 1736, when Leonhard Euler tried to solve a problem involving seven bridges in Königsberg in Prussia. In recent times, graphs have attracted attention from companies in several industries because of their power to model and analyse large networks.

This work investigates the use of graph theory in the energy sector for a utility company, more specifically Fortum, whose operations consist of, but are not limited to, the production and distribution of electricity and heat. The work results in a broad review of graph-theoretic concepts and their applications, both in general technical contexts and in the energy sector in particular, as well as a case study in which some concepts are subjected to deeper analysis. The case study chosen within the scope of the work is feature selection for electricity price forecasting. Feature selection is a process for reducing the number of input variables, typically carried out before a regression model is built in order to avoid overfitting and increase the interpretability of the model.

Five graph-based feature selection methods with different points of view are studied. Experiments are conducted on realistic data sets with many input variables to verify the validity of the methods. One of the data sets is owned by Fortum and is used to forecast the electricity price, among other important quantities. The results obtained look promising according to several evaluation metrics and can be used by Fortum as a support tool for developing prediction models. In general, an energy company can likely benefit from graph theory in many ways and create value in its business with the help of enriched mathematical knowledge.

Keywords: graph theory, feature selection, energy industry


Declaration

This study was conducted by the students Kristofer Espinosa and Tam Vu and commissioned by Fortum Sverige AB.


Acknowledgements

We want to thank our supervisors from Fortum, Alexandra Bådenlid and Linda Marklund Ramstedt, as well as our manager and mentor, Hans Bjerhag, for their constant support and engagement. We are also grateful for the help we have received from our respective supervisors at KTH, Elena Malakhatka and Xiaoming Hu.

1 Introduction

1.1 Purpose and research question

1.2 Research contribution

1.3 Limitations

1.4 Delimitations

1.5 Disposition

2 Market trends for utility companies

2.1 Uncertain growth in electricity demand

2.2 A more complex portfolio

2.3 Evolving technology

2.4 Evolving conditions in the energy markets

2.5 Customer trends

2.6 Digitalisation of utility companies

2.7 The emergence of graph analytics

2.7.1 Relational databases

2.7.2 Graph databases

2.7.3 The relevance of graph analytics for utility companies

3 Methodology

3.1 Research design

3.2 Research approach

3.3 Research layout

3.3.1 The preparation phase

3.3.2 The idea generation and suggestion phases

3.3.3 The evaluation phase

3.3.4 The implementation phase

3.4 Literature review

3.5 Data collection

3.6 Semi-structured interviews

I Graph theory and applications

4 Elementary terminology

4.1 Graph and subgraph

4.2 Graph traversal

4.3 Trees and connectivity

4.4 Matching

4.5 Colouring

4.6 Directed graph

4.7 Weighted graph

5 Selected applications of graphs

5.1 University timetabling

5.2 Staff assignment

5.3 Cost-effective railway building

5.4 Logistic network and optimal routing

5.5 Planar embedding of graphs

5.6 Winter road maintenance

5.7 Social network

5.8 Tournament ranking system

5.9 Image segmentation

Summary

II Selection of use-case

6 Idea generation

6.1 Inventory

6.1.1 Clustering

6.1.2 Heat map

7 Assessment of use-case clusters

7.1 Preliminaries

7.1.1 Presentation of the use-case clusters

7.1.2 Scoring system

7.1.3 Assessment results per criterion

7.2 Optimisation of hydropower operations

7.3 Hydropower operation and maintenance

7.4 Operation and maintenance for nuclear power

7.5 Design and maintenance for wind power

7.6 Electric vehicle applications

7.7 Market intelligence for energy trading

7.8 Storage solutions for the distribution grid

7.9 Master data management

7.10 Knowledge graphs

7.11 Results of use-case cluster assessment

7.11.1 Assessment results

7.11.2 A note on Energy trading

7.12 Discussion on the selection of use-case cluster

8 Assessment of use-cases in Energy trading

8.1 Natural gas market analysis with visibility graphs

8.2 Smart meter clustering for short-term load forecasting

8.3 Feature selection for electricity price forecasting

8.4 Results of the use-case assessment

8.4.1 A note on Electricity price forecasting

8.5 Discussion on the selection of use-case

9 Conclusion and discussion on PART II

9.1 Conclusion

9.2 Discussion

III Graphs, feature selection and electricity price forecasting

10 Graph theory in electricity price forecasting

10.1 Background on the electricity market

10.1.1 Roles in the market

10.1.2 The day-ahead spot-market

10.1.3 The balancing market

10.2 Electricity price forecasting

10.2.1 Forecasting methods

10.2.2 Feature selection in electricity price forecasting

10.2.3 The case of intra-day

10.2.4 Summary

10.3 Motivations on the proposed methodology

11 Introduction to feature selection

11.1 Background

11.2 Purpose and outline

12 Preliminaries

12.1 Laplacian matrix

12.2 Nearest neighbour graph

12.3 Graph clustering

12.4 Comparison of two clusterings

12.4.1 Clustering accuracy

12.4.2 Normalised mutual information

12.4.3 Adjusted mutual information

12.5 Similarity comparison of two sets

13 Feature selection methods

13.1 Laplacian score (LS)

13.1.1 Preparation

13.1.2 Optimisation problem

13.1.3 Feature selection algorithm

13.2 Multi-cluster feature selection (MCFS)

13.2.1 Preparation

13.2.2 Optimisation problem

13.2.3 Feature selection algorithm

13.3 Non-negative discriminative feature selection (NDFS)

13.3.1 Preparation

13.3.2 Optimisation problem

13.3.3 Feature selection algorithm

13.4 Feature selection via non-negative spectral analysis and redundancy control (NSCR)

13.4.1 Optimisation problem

13.4.2 Feature selection algorithm

13.5 Feature selection via adaptive similarity learning and subspace clustering (SCFS)

13.5.1 Optimisation problem

13.5.2 Feature selection algorithm

14 Experiments and results

14.1 Line of action

14.2 Data sets

14.3 Parameter setting

14.4 Experiment: Eight public data sets

14.4.1 Clustering accuracy

14.4.2 Normalised and adjusted mutual information

14.4.3 Stability

14.5 Experiment: CELEBI of Fortum

15 Discussion

15.1 Convergence speed

15.2 Parameter sensitivity

15.3 Jaccard index

15.4 Conceivable challenges

15.5 Implications for forecasting activities

15.6 Scalability

16 Research conclusion

Bibliography

Appendices

A Part I & II: Figures and tables

A.1 List of interviews

A.2 Inventory of use-cases

A.3 Evaluation tool

A.3.1 Graph applicability

A.3.2 Technical feasibility

A.3.3 Economic potential

A.3.4 Workability

B Part III: Figures and tables

B.1 Experiments and results

B.2 Discussion

C Glossary

List of Figures

2.1 European Net electricity generation, EU-28, 1990-2017. Source: Eurostat 2019

2.2 Evolution of the estimated impact of technology on utility companies between 2015 and 2018. Source: Monitor Deloitte (2018)

2.3 The economic impacts of digitalisation on utility earnings. Source: McKinsey (2016) [4]

2.4 A friendship directed social graph. Source: AWS (2020)

3.1 The idea generation phase

3.2 The idea selection phase

4.1 Examples of graphs.

4.2 Various kinds of walks in a graph.

4.3 Examples of graphs which are not trees.

4.4 Differently configured trees with 5 vertices and 4 edges each.

4.5 Examples of spanning trees in a connected graph.

4.6 Examples of matchings.

4.7 Examples of vertex colourings using as few colours as possible.

4.8 Weighted directed graph and a small fictitious kingdom far, far away.

5.1 The complete bipartite graph $K_{3,3}$ is not planar.

5.2 Graph vertices can be most central in different aspects.

6.1 Heat map of past use-cases per cluster and per graph-related concept.

7.1 Hydropower optimisation

7.2 Assessment of Hydropower: Operation and maintenance

7.3 Assessment of Nuclear power operation and maintenance

7.4 Assessment of Wind power operation and maintenance

7.5 Assessment of EV Applications

7.6 Assessment of Energy trading

7.7 Assessment of Energy storage solutions

7.8 Assessment of Master Data Management

7.9 Assessment of Knowledge graphs

7.10 Graph-score vs strategy-score of each use-case cluster

8.1 Assessment of Natural gas market analysis with visibility graphs

8.2 Assessment of Short-term load forecasting - clustering

8.3 Assessment Electricity price forecasting - feature selection

8.4 Comparison of the use-case assessments

10.1 Formation of electricity prices

14.1 Average Jaccard index.

B.1 Jaccard index for eight different data sets.

B.2 Relative change of the objective value in iterative methods.

B.3 Clustering quality for different α and β, obtained with the features selected using SCFS.

List of Tables

3.1 Evaluation dimensions and criteria.

5.1 The centrality measures of the vertices in Figure 5.2.

6.1 Number of past use-cases per cluster

7.1 Scoring of use-case clusters

8.1 Scoring of use-cases in Energy trading

13.1 Notations associated with a given data set.

14.1 Data sets with their numbers of samples (m) and features (n).

A.1 List of interviewees.

A.2 Use-cases in Hydropower optimisation

A.3 Use-cases in Hydropower operation and maintenance

A.4 Use-cases in Nuclear power operation and maintenance

A.5 Use-cases in Wind power design, operation and maintenance.

A.6 Use-cases for Electric vehicle applications

A.7 Use-cases in Energy trading.

A.8 Use-case in Energy storage

B.1 Clustering accuracy (ACC) [%] corresponding to different data sets and feature selection methods.

B.2 Normalised mutual information (NMI) [%] corresponding to different data sets and feature selection methods.

B.3 Adjusted mutual information (AMI) [%] corresponding to different data sets and feature selection methods.

B.4 Clustering quality measures for feature selection of CELEBI. The first column shows the indices of the subsets.

CHAPTER 1 Introduction

Graph theory is the mathematical study of objects and their pairwise relations, also known as nodes and edges, respectively. The birth of graph theory is often considered to have taken place in 1736, when the Swiss mathematician Leonhard Euler tried to solve a routing problem about the seven bridges of Königsberg in Prussia. In more recent times, the growth in data and computing power has given rise to computational intelligence modelling. Perhaps less visibly, graph theory has also been applied to several services at the foundation of a digital society.
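Euler's negative answer rests on a parity argument that is easy to check by computer. A connected multigraph has a walk that crosses every edge exactly once if and only if zero or two of its vertices have odd degree. The minimal Python sketch below treats the four land masses of Königsberg as vertices and the seven bridges as edges. The vertex labels A to D and the helper names are hypothetical and chosen only for this illustration.

```python
from collections import Counter

# Hypothetical labels for the four land masses:
# A = Kneiphof island, B = north bank, C = south bank, D = eastern land mass.
# Each tuple is one bridge, i.e. one edge of a multigraph.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def odd_degree_vertices(edges):
    """Return the sorted vertices that have an odd number of incident edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def has_euler_walk(edges):
    """Euler's criterion, valid for a *connected* multigraph: a walk using
    every edge exactly once exists iff 0 or 2 vertices have odd degree."""
    return len(odd_degree_vertices(edges)) in (0, 2)


print(odd_degree_vertices(BRIDGES))  # ['A', 'B', 'C', 'D']
print(has_euler_walk(BRIDGES))       # False
```

All four land masses have odd degree (5, 3, 3 and 3), so no such walk exists. The sketch does not test connectivity, because the Königsberg graph is known to be connected.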

Google’s PageRank search algorithm and its mapping service are based on graphs, and so are the social networks of Facebook and Twitter. Whether it is a link from one website to another or someone being added as a friend, the relations (edges) carry the fundamental information. These applications of graph theory in the digital sphere, generally involving large amounts of data, are referred to as graph analytics.

Graph analytics offers fast analysis tools that scale to exceptionally large networks. It has therefore caught the attention of companies across all types of industries and is commonly used as a modelling tool for networks.

Societies are currently undergoing a transition toward a low-carbon energy system. Electric utility companies, the main drivers of this change, are digitalising and developing new products in order to stay or become more competitive.

The liberalisation of the power markets and the increasing share of renewable electricity fed into the grid have raised the prospect of falling electricity prices in many countries, complicating the situation for electricity generators. On the other hand, technological advances in the energy and information technology sectors give utilities new opportunities to optimise their operations or find new revenue streams. In this regard, graph theory appears to be a potentially helpful technology for supporting such endeavours. It has been used in the context of the Internet of Things (IoT) and for routing, fraud detection, customer analysis, advanced search, scheduling and much more. Less publicised is the set of useful applications specific to the power market. As with machine learning or blockchain, utilities therefore have an interest in gaining deeper insight into the technology to assess where and how it can benefit them.

1.1 Purpose and research question

This paper aims to give the interested reader guidelines as to where graph theory is applicable in the energy sector and more particularly in the power market. A thorough background of graph theory and its theoretical applications is followed by a comprehensive presentation of previous case studies revolving around the operational markets of a typical utility company, giving an indication of potential applications. These applications, also called use-cases, are evaluated to indicate where and how they could be beneficial. Finally, a use-case is conceptualised and a proof-of-concept realised.

More specifically, we will treat the following questions:

◦ What are the main benefits of graph theory in engineering and science applications?

◦ How can graph theory be useful in the energy sector and more particularly within the context of an electric utility company?

◦ Given a use-case relevant for the business of Fortum, what can a graph-based model look like and what algorithms can be used to solve problems?

Relevant theoretical background, both graph-theoretical and energy-specific, is covered for a deeper understanding of the topics in question.


1.2 Research contribution

This research aims to bring contributions in the following ways:

◦ An overview of applications of graph theory within the energy sector with a detailed theoretical background for pedagogical purposes, which to our knowledge does not exist to date.

◦ A generic evaluation tool with which practitioners in any industry can make a face-value assessment of the potential of graph theory for commercial applications.

◦ A review of graph-based feature selection methods and their application to a commercial use-case.

1.3 Limitations

In this study, a trade-off is necessary between the number of applications evaluated and the amount of detail provided for each. Each application covered could in fact be the subject of a study in itself. Consequently, the assessments should be taken at face value, as a guideline for practitioners on where to direct their attention.

1.4 Delimitations

This study was commissioned by Fortum. Hence, the verticals and geographies of the energy sector considered for the applications are aligned with the operations of Fortum, with the exception of some of the more corporate activities, including strategic and financial decision-making as well as regulatory and compliance activities. The purpose is to confine the research mainly to challenges specific to the energy industry.

1.5 Disposition

This paper comprises three parts.

Part I contains a broad overview of elementary concepts of graph theory and selected applications, supported by recent research. This part aims to provide an inspiring mathematical background for practitioners who need mathematical foundations for implementing graph analytics applications.

Part II details the assessment of the identified use-cases of graph theory in the energy sector. The evaluation tool is explained in further detail and the use-cases are compared, in order to find a relevant application for a proof-of-concept and to guide decisions on which areas to focus on in future proofs-of-concept.

Part III focuses on a specific area relevant to Fortum in order to subject the general concepts to deeper analysis. Within the scope of this thesis, the case study concerns feature selection for electricity price forecasting using graph-based methods. Five methods with different points of view are presented, each with its main idea, derivation, algorithm and a validation of its quality through experiments on real-world data.


CHAPTER 2 Market trends for utility companies

Utility companies have a central role in facilitating the coming transition to a sustainable energy system. They are active across the value chain, from power generation to its delivery to the end-user, and face increasing political, economic and social pressure to decarbonise the power sector and scale up the integration of renewable energy sources. However, given the scale of the economic, technological and organisational challenges involved, some argue that this evolution is better likened to a transformation. Below are some fundamental trends affecting the power sector that utilities need to account for when developing their strategies.

2.1 Uncertain growth in electricity demand

Historically, utility companies have been able to invest in new power generating assets thanks to an ever-growing demand for electricity. However, utilities in most developed economies have seen stagnating or even declining electricity demand, mainly because the electrification rate is already so high.

Other reasons for declining electricity demand are the deindustrialisation of developed economies and energy efficiency improvements, notably in the residential sector. For instance, electricity demand in New York is projected to grow at an average annual rate of only 0.16 % through 2024. For Europe, Figure 2.1 summarises the evolution of electricity consumption since 1990 [1].

Figure 2.1: European Net electricity generation, EU-28, 1990-2017.

Source: Eurostat 2019

Despite this recent trend, the electrification of the residential and transportation sectors adds uncertainty to future electricity demand projections. This applies particularly to peak demand, and may drive up the required total power system capacity without necessarily increasing total demand [13].

2.2 A more complex portfolio

Renewable energy generation differs from conventional power sources in some fundamental ways, which alter the companies’ operations all along the asset life cycle, from investment through operation and maintenance to decommissioning. As solar and wind are the growth drivers among distributed energy resources (DERs), the focus here is on them.

Different asset types

Unlike conventional power sources, DERs have on average significantly lower capacities, are generally connected to the distribution grid (low to medium voltage) and are dispersed across geographical regions. As such, their design, placement and sizing depend not only on the electricity grid and pricing signals, but also vary widely with topography, irradiation, wind maps and land ownership. To be competitive, utilities now need to reach the same level of expertise on these matters as they have accumulated over time for conventional power sources [6].

Evolving needs in operation and maintenance


The intermittent nature of DERs such as wind and solar makes their operation different from that of conventional power sources. Uncertain weather conditions add complexity to the day-ahead bidding strategy of utilities, notably with respect to the residual demand of the energy system. Not only is their own power output less predictable, so is the output of their competitors, which leads to larger errors induced by inaccurate weather modelling [14].

As wind power plants and solar farms are more distributed, complexity is also added on the maintenance side. Utility portfolios now cover a wider spread of asset locations, ages and life cycles. Even though original equipment manufacturers (OEMs) often provide warranties for wind and solar power plants, utilities are incentivised to build their own expertise on these matters to gain a competitive edge.

2.3 Evolving technology

Whereas conventional sources such as hydropower and nuclear are more mature technologies, the technology in wind and solar, as well as in energy storage, is evolving rapidly. This complicates investment decisions, not only regarding the right time to commission a new renewable power plant, but also regarding its design (selection of equipment, addition of energy storage).

2.4 Evolving conditions in the energy markets

Statkraft projects solar power to become the largest source of power generation on a global basis and cover almost 30 % of all electricity generation in 2040, with wind power covering 20 % [7]. This has two main effects on the economics of the market: declining electricity prices during peak production hours and increasing need for grid stability services [2].

Because renewables of the same type generate power simultaneously within the same geographical area, a higher share of renewables in the system means that the increased supply of generation with near-zero marginal cost drives down prices, sometimes even to negative levels. This can notably affect investment decisions in renewable generation, since adding renewable capacity erodes the profitability of the renewable capacity already installed [6].
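The price effect described above is commonly explained with the merit-order model of a uniform-price auction. In that model, units are dispatched in order of marginal cost, and the last unit needed to cover demand sets the price for everyone. The Python sketch below is deliberately simplified. It ignores bidding strategy, network constraints and demand elasticity, and the function name `clearing_price` and all capacities and costs are hypothetical. It only shows how adding wind capacity at near-zero marginal cost can push the most expensive unit out of the dispatch.

```python
def clearing_price(offers, demand_mw):
    """Uniform-price clearing on a simplified merit order.

    offers: list of (capacity_mw, marginal_cost_eur_per_mwh) tuples.
    Offers are dispatched from cheapest to most expensive; the price is the
    marginal cost of the last offer needed to cover demand.
    """
    supplied = 0.0
    for capacity, cost in sorted(offers, key=lambda offer: offer[1]):
        supplied += capacity
        if supplied >= demand_mw:
            return cost
    raise ValueError("demand exceeds total offered capacity")


# Hypothetical numbers for illustration only.
conventional = [(400, 10.0), (300, 35.0), (300, 60.0)]
wind = [(250, 0.0)]  # near-zero marginal cost
demand = 800

print(clearing_price(conventional, demand))         # 60.0
print(clearing_price(conventional + wind, demand))  # 35.0
```

With the same demand, the 250 MW of wind displaces the 60 EUR/MWh unit and the clearing price falls to 35 EUR/MWh. All generators, including the wind farm itself, are then paid the lower price.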

Grid stability is affected by the increasing share of intermittent power generation, combined with the reduction in the system inertia provided by heavy rotating equipment. Increasing attention is therefore directed towards providing ancillary services to the grid operator on the intra-day market as a new revenue source for asset-owning utilities [2].

On top of this, the energy commodities market, particularly for coal, oil and gas, is becoming increasingly volatile and financialised, and is subject to recurring global political and economic crises. In particular, natural gas prices are increasingly decoupled from oil prices and subject to more fundamental forces. This increases the need for forecasting, whether or not a utility uses coal or gas as fuel, since it affects both input prices for electricity generation and electricity price projections [8].

2.5 Customer trends

Customers are increasingly engaged in participating in the power sector in various ways. First, lower barriers to investment in solar power have attracted many private households across the globe to invest in their own power production, and net metering schemes have been widely deployed [12].

Second, growing attention is being paid to demand response programmes, which allow customers to adapt their consumption in response to electricity price signals. This is largely enabled by the roll-out of smart metering systems, which allow timely, two-way communication between the end-user and the central system [11].

Utilities are thus facing growing customer interest in greater energy independence and in more elaborate and efficient energy solutions. This forces them to innovate in their product offering in order to reduce churn or grid defection (i.e. “going off the grid”). There is a risk that the situation evolves into what has been called the “death spiral”. In this scenario, customers going off the grid cause further grid defection, because the network costs are borne by fewer customers.

However, this trend should be taken with a grain of salt, as it has not yet been observed on a large scale and could remain confined to specific geographical regions where the conditions are met (e.g. high solar irradiation) [5].

Despite all this, end-users still see electricity as a commodity, and price elasticity is low [9]. Most customers' interest in and understanding of the energy system remain limited. This makes a utility's competitiveness depend on product pricing rather than on product differentiation, hampering utilities' efforts to engage customers and innovate rapidly [6].

The aforementioned conditions put pressure on utilities to adapt their operations and competences and to reimagine their role in the system through technological and business model innovation, while also needing to optimise their asset fleet in various ways. As industry incumbents, utilities have strong inertia. This makes the change a technical, economic and organisational challenge at once, requiring timely decisions and a balanced explore-exploit trade-off [10].

As Figure 2.2 shows, the timing and impact of underlying energy industry trends on utilities are difficult to assess [3].

Figure 2.2: Evolution of the estimated impact of technology on utility companies between 2015 and 2018.

Source: Monitor Deloitte (2018)

2.6 Digitalisation of utility companies

New digital tools can help utility companies tackle some of the challenges mentioned above. Managing and utilising the data generated by the different components of the electrical system represents a great opportunity for utility companies to respond to the evolving environment they operate in. As Figure 2.3 shows, digitalisation can have a positive effect all along the electricity value chain [4].

Figure 2.3: The economic impacts of digitalisation on utility earnings.

Source: McKinsey (2016) [4]

Utility companies have already started to invest heavily in digitalisation, enabled in particular by the emergence of the Internet of Things and the possibility of processing big data with increased computing power and artificial intelligence (AI).

Since 2014, investments in digital electricity infrastructure and software have grown at an annual rate above 20 %, reaching US$ 47 billion in 2016. By increasing connectivity as well as modelling and monitoring capabilities, digitalisation is expected to decrease costs by 10 % to 20 % in the oil and gas industry and by 5 % in the power sector, and to help integrate more intermittent renewable generation. On the consumption side, digitalisation could cut energy use in buildings by about 10 %, help reshape the mobility sector, and facilitate "smart demand response" programmes as well as smart charging technologies for electric vehicles [15].

2.7 The emergence of graph analytics

According to Gartner, digitalisation "is the use of digital technologies to change a business model and provide new revenue and value-producing opportunities".

Digitalisation is enabled by digitisation, which is merely the process of "changing from analog to digital form" [16]. The traditional way of digitising real-world processes is through tabular databases, where elements from the real world are stored in tables. Increasingly, however, graph databases are used to better capture the relationships between the digitised elements in a network. This latter type of database is a stepping stone towards harnessing the power of graph theory. A quick look at the different forms of databases helps explain how graph theory can become a useful analysis tool for organisations undergoing digitalisation.
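To make the contrast concrete, the minimal Python sketch below stores a few elements as nodes and their relationships as labelled edges, in the spirit of a graph database. It then finds an indirect connection by breadth-first search instead of by joining tables. This is an in-memory illustration only and not the API of any particular graph database product. All node names, edge labels and the helper `reaches` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical relationships stored as labelled edges (source, label, target).
EDGES = [
    ("alice", "STUDIES_AT", "kth"),
    ("bob", "STUDIES_AT", "unam"),
    ("carol", "STUDIES_AT", "sorbonne"),
    ("kth", "LOCATED_IN", "stockholm"),
    ("unam", "LOCATED_IN", "mexico_city"),
    ("sorbonne", "LOCATED_IN", "paris"),
    ("stockholm", "IN_COUNTRY", "sweden"),
    ("mexico_city", "IN_COUNTRY", "mexico"),
    ("paris", "IN_COUNTRY", "france"),
    ("sweden", "ON_CONTINENT", "europe"),
    ("mexico", "ON_CONTINENT", "north_america"),
    ("france", "ON_CONTINENT", "europe"),
]

# Adjacency list: node -> list of (edge label, neighbour) pairs.
adjacency = defaultdict(list)
for source, label, target in EDGES:
    adjacency[source].append((label, target))


def reaches(start, goal):
    """Breadth-first search: True if `goal` can be reached from `start` along edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for _label, neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


students = sorted(s for s, label, _ in EDGES if label == "STUDIES_AT")
european_students = [s for s in students if reaches(s, "europe")]
print(european_students)  # ['alice', 'carol']
```

In this representation, an extra hop in the chain from student to university, city, country and continent is simply one more edge to follow. In a tabular database it typically means one more join.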

2.7.1 Relational databases

Tabular, or relational, databases store information in tables. Each table represents a type of data element, typically called a model. The columns list the attributes of the model, and each row is an instance of the model, identified by a unique ID.

Example: A person model

There are three types of relationships between models: 1-to-1, 1-to-many and many-to-many.

1-to-1 relationships describe a relationship between A and B in which one element of A may only be linked to one element of B and vice versa. A country and its capital city have a 1-to-1 relationship.

1-to-many relationships are parent-child relationships: A can have several instances of B, but each B belongs to only one A. We say that B belongs to A. An example is an organisation with several employees. In this case, common practice is to store the organisation's ID as a foreign key in each row of the employee table.

In many-to-many relationships, A can be connected to several instances of B and vice versa. The relationship between doctors and patients is an example: a doctor can have several patients, and a patient can have several doctors. One way to store this information is to connect the two through a common table, for example booking, in which each row contains both a doctor foreign key and a patient foreign key.
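The foreign-key and join-table patterns can be made concrete with a minimal sketch in Python's built-in sqlite3 module, using an in-memory database. The table names, column names and rows (organisations, employees, doctors, patients, bookings) are hypothetical. They were chosen only to mirror the examples above and do not describe any real system.

```python
import sqlite3

# In-memory SQLite database; all names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisations (id INTEGER PRIMARY KEY, name TEXT);
-- 1-to-many: each employee row stores one organisation foreign key.
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    organisation_id INTEGER REFERENCES organisations(id)
);
CREATE TABLE doctors  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT);
-- Many-to-many: the booking table holds one doctor and one patient key per row.
CREATE TABLE bookings (
    id INTEGER PRIMARY KEY,
    doctor_id  INTEGER REFERENCES doctors(id),
    patient_id INTEGER REFERENCES patients(id)
);
INSERT INTO organisations VALUES (1, 'Org A');
INSERT INTO employees VALUES (1, 'Alice', 1), (2, 'Bob', 1);
INSERT INTO doctors  VALUES (1, 'Dr X'), (2, 'Dr Y');
INSERT INTO patients VALUES (1, 'Pat 1'), (2, 'Pat 2');
INSERT INTO bookings VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")

def patients_of(doctor_name):
    """Resolve the many-to-many relationship through the booking table."""
    rows = conn.execute("""
        SELECT patients.name FROM patients
        JOIN bookings ON bookings.patient_id = patients.id
        JOIN doctors  ON doctors.id = bookings.doctor_id
        WHERE doctors.name = ?
        ORDER BY patients.name
    """, (doctor_name,))
    return [name for (name,) in rows]

def doctors_of(patient_name):
    """Same join table, traversed in the opposite direction."""
    rows = conn.execute("""
        SELECT doctors.name FROM doctors
        JOIN bookings ON bookings.doctor_id = doctors.id
        JOIN patients ON patients.id = bookings.patient_id
        WHERE patients.name = ?
        ORDER BY doctors.name
    """, (patient_name,))
    return [name for (name,) in rows]

print(patients_of("Dr X"))  # ['Pat 1', 'Pat 2']
print(doctors_of("Pat 1"))  # ['Dr X', 'Dr Y']
```

Even this small case needs two joins to answer a single many-to-many question. This is the cost the following paragraphs discuss.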

Query language


SQL (Structured Query Language) has remained a consistently popular choice among database administrators over the years. This is primarily due to its ease of use and the efficiency with which it queries, manipulates and aggregates data, among a wide range of other functions, turning massive collections of structured data into usable information.

Benefits and drawbacks

Tabular databases, however, have two inherent drawbacks. The first is that relationships can only be expressed through foreign keys and therefore carry no description of their own: the nature of a relationship cannot be described in full.

Another drawback of tabular databases is the complexity of traversing the database. If a connection needs to be found between tables that are only indirectly linked, the user first needs to join all intermediate tables on their foreign keys.

A basic example illustrates this. To find the students on a given continent, the database manager would have to join the tables Student, University, City, Country and Continent on their respective foreign keys. The query would look as follows:

```sql
SELECT students.* FROM students
JOIN universities ON universities.id = students.university_id
JOIN cities ON cities.id = universities.city_id
JOIN countries ON countries.id = cities.country_id
JOIN continents ON continents.id = countries.continent_id
WHERE continents.name = 'Europe';
```

This query has two implications:

◦ The queries quickly become more complex as the distance between A and B (the number of intermediate tables) grows.

◦ The computational load on the server can also increase dramatically as the size of the database or the distance between A and B grows, slowing down response times and hence the productivity of the organisation.
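The four-join query can be run end to end on a toy schema. The sketch below is minimal and assumes SQLite through Python's sqlite3 module. All rows (students, universities, cities) are hypothetical, and the continent name is passed as a bound parameter instead of a literal.

```python
import sqlite3

# Toy schema: each table points to its "parent" through a foreign-key column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE continents   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE countries    (id INTEGER PRIMARY KEY, name TEXT, continent_id INTEGER);
CREATE TABLE cities       (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
CREATE TABLE universities (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
CREATE TABLE students     (id INTEGER PRIMARY KEY, name TEXT, university_id INTEGER);
INSERT INTO continents   VALUES (1, 'Europe'), (2, 'Asia');
INSERT INTO countries    VALUES (1, 'Sweden', 1), (2, 'Vietnam', 2);
INSERT INTO cities       VALUES (1, 'Stockholm', 1), (2, 'Hanoi', 2);
INSERT INTO universities VALUES (1, 'Uni 1', 1), (2, 'Uni 2', 2);
INSERT INTO students     VALUES (1, 'Anna', 1), (2, 'Binh', 2), (3, 'Carl', 1);
""")

# Four joins are needed to walk from a student up to its continent.
QUERY = """
SELECT students.name FROM students
JOIN universities ON universities.id = students.university_id
JOIN cities       ON cities.id       = universities.city_id
JOIN countries    ON countries.id    = cities.country_id
JOIN continents   ON continents.id   = countries.continent_id
WHERE continents.name = ?
ORDER BY students.name
"""

def students_on(continent):
    """Return the names of students whose university lies on the given continent."""
    return [name for (name,) in conn.execute(QUERY, (continent,))]

print(students_on("Europe"))  # ['Anna', 'Carl']
```

Each additional level in the hierarchy (for example, campus or region) would add another table and another JOIN clause.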

SQL is not the best choice for all database applications. For one thing, while SQL databases were effective at the data scales of the 1990s, they started to falter at hyperscale levels around the turn of the century. Some users also point to sharding limitations that hamper the ability to split large databases into smaller, more manageable ones.


These drawbacks led to the creation of NoSQL databases and, more recently, NewSQL databases. The latter attempt to improve on the scalability of traditional SQL databases without sacrificing their atomicity, consistency, isolation and durability (ACID), which are critical properties of stable databases.

2.7.2 Graph databases

Graph databases differ from relational databases in that they give as much importance to the relationships between elements (edges) as to the elements themselves (nodes). They are particularly useful for modelling networks of highly connected components. As with relational databases, the entities can be described by a set of attributes. These are most often held in a dictionary data structure (key-value pairs) and frequently stored as JSON objects. The edges of the graph are stored in the same way as the nodes, with their own key-value attributes.
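For contrast with the join-based query, the sketch below stores the same student-to-continent data as a property graph, with key-value attributes on both nodes and edges. Plain Python dictionaries stand in for a graph database here, as an assumption for illustration. The node IDs, edge types (STUDIES_AT, LOCATED_IN) and attribute values are hypothetical. The lookup becomes a walk along edges instead of a chain of joins.

```python
import json
from collections import defaultdict

# Hypothetical property graph: nodes and edges both carry key-value attributes.
nodes = {
    "anna":  {"label": "Student",    "name": "Anna"},
    "carl":  {"label": "Student",    "name": "Carl"},
    "binh":  {"label": "Student",    "name": "Binh"},
    "uni1":  {"label": "University", "name": "Uni 1"},
    "uni2":  {"label": "University", "name": "Uni 2"},
    "sthlm": {"label": "City",       "name": "Stockholm"},
    "hanoi": {"label": "City",       "name": "Hanoi"},
    "se":    {"label": "Country",    "name": "Sweden"},
    "vn":    {"label": "Country",    "name": "Vietnam"},
    "eu":    {"label": "Continent",  "name": "Europe"},
    "as":    {"label": "Continent",  "name": "Asia"},
}
edges = [
    # (source, target, attributes): the relationship itself can be described.
    ("anna", "uni1", {"type": "STUDIES_AT", "since": 2018}),
    ("carl", "uni1", {"type": "STUDIES_AT", "since": 2019}),
    ("binh", "uni2", {"type": "STUDIES_AT", "since": 2017}),
    ("uni1", "sthlm", {"type": "LOCATED_IN"}),
    ("uni2", "hanoi", {"type": "LOCATED_IN"}),
    ("sthlm", "se", {"type": "LOCATED_IN"}),
    ("hanoi", "vn", {"type": "LOCATED_IN"}),
    ("se", "eu", {"type": "LOCATED_IN"}),
    ("vn", "as", {"type": "LOCATED_IN"}),
]

# Reverse adjacency list: for each node, the nodes with an edge pointing to it.
incoming = defaultdict(list)
for src, dst, _ in edges:
    incoming[dst].append(src)

def students_on(continent_id):
    """Depth-first walk backwards along edges, collecting Student nodes."""
    found, stack, seen = [], [continent_id], {continent_id}
    while stack:
        node = stack.pop()
        if nodes[node]["label"] == "Student":
            found.append(nodes[node]["name"])
        for prev in incoming[node]:
            if prev not in seen:
                seen.add(prev)
                stack.append(prev)
    return sorted(found)

print(students_on("eu"))        # ['Anna', 'Carl']
print(json.dumps(edges[0][2]))  # {"type": "STUDIES_AT", "since": 2018}
```

The traversal code does not change if more levels are added between students and continents. Only the data grows.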

A graph database can be illustrated by the classic example of social graphs. The nodes in this example are people, who might have a set of attributes (e.g. first name, last name). The edges represent friendships between people and can also contain more information (e.g. the date the friendship was established, the strength of the bond). The ability to attach information to relationships is a cornerstone of modern social media applications such as Facebook, Twitter and Instagram. It also underpins the algorithms that personalise content for end-users (e.g. promoted posts).

An interesting nuance between Facebook and Instagram lies in whether relationships are directed: Instagram followers versus Facebook friends. This can easily be modelled by choosing either directed or undirected edges between nodes.

Figure 2.4 is an example of a directed social graph.
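The followers-versus-friends distinction comes down to how an edge is stored. A minimal sketch, assuming simple adjacency sets and hypothetical user names: a directed edge is stored once, while an undirected edge is stored in both directions.

```python
from collections import defaultdict

def build(edges, directed):
    """Adjacency sets; an undirected edge is stored in both directions."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        if not directed:
            adj[b].add(a)
    return adj

# Hypothetical users. "Follows" is one-way (directed); "friends" is mutual (undirected).
follows = build([("ann", "bo"), ("cy", "bo"), ("bo", "ann")], directed=True)
friends = build([("ann", "bo"), ("bo", "cy")], directed=False)

# Followers of bo: users with an outgoing edge to bo.
followers_of_bo = sorted(u for u, out in follows.items() if "bo" in out)
print(followers_of_bo)        # ['ann', 'cy']
print(sorted(follows["bo"]))  # ['ann']  (bo follows only ann)
print(sorted(friends["bo"]))  # ['ann', 'cy']  (friendship is symmetric)
```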

Choosing a graph database for modelling more or less complex networks can improve query response times and make the model more intuitive. On top of this, it makes a large set of algorithms and analysis tools from graph theory available for analytics. Determining the density of the network, identifying influential nodes, assessing the similarity or interdependence of nodes, and visualising the propagation of a phenomenon through a network are all facilitated by graph-related concepts such as clustering, centrality, connectivity, label propagation and link prediction.
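Two of the measures mentioned above can be computed in a few lines. The network density of an undirected graph with $n$ nodes and $m$ edges is $2m / (n(n-1))$, and the degree centrality of a node is its degree divided by $n - 1$. The sketch below uses a small hypothetical network (for instance, substations linked by lines). Libraries such as NetworkX provide the same quantities as `nx.density` and `nx.degree_centrality`.

```python
from collections import defaultdict

# Hypothetical undirected network, e.g. substations linked by lines.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

n, m = len(adj), len(edges)
# Density: share of all possible undirected edges that are present.
density = 2 * m / (n * (n - 1))
# Degree centrality: number of neighbours, normalised by n - 1.
centrality = {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}
most_central = max(centrality, key=centrality.get)

print(density)       # 0.5
print(most_central)  # A
```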


Figure 2.4: A directed social graph of friendships.

Source: AWS (2020)

2.7.3 The relevance of graph analytics for utility companies

As companies digitalise their assets and operations, the question of which modelling approach to use arises. As briefly discussed in the previous section, there are alternatives to traditional tabular databases, among them graph databases.

Society is trending fundamentally toward higher levels of connectivity: between physical objects via the roll-out of sensors, between people and companies, and between energy commodity prices. Given this trend, a graph-based approach can prove useful.

This observation gives electric utility companies an incentive to gain a deeper understanding of the foundations of graph theory, in order to better harness the potential of graph analytics.


CHAPTER 3 Methodology

3.1 Research design

The literature on management research methods generally classifies research purposes into three types: exploratory, descriptive and explanatory. Depending on its purpose, a study can belong to one or more of these types and can even combine them sequentially as the research evolves [17]. The objectives of the respective approaches are:

◦ Exploratory: the problem is loosely defined, the area is potentially unexplored, and the researcher needs to gain a clearer understanding of the topic.

◦ Descriptive: an observed phenomenon, person or situation needs to be analysed and described objectively, generally without drawing conclusions.

◦ Explanatory: a situation or a problem is observed, and a causal relationship between the variables needs to be established.

An exploratory research design was deemed fit for the purpose of this study, given the novelty of the work and the need to understand in depth both graph theory and the operations of the energy sector. This research was carried out in accordance with exploratory research practice, both in its research approach and in its information gathering. Indeed, as Saunders et al. point out, exploratory research needs to be dynamic and flexible to change, as new data and new insights can alter the direction of the study [17]. In general, the research starts with a broad, transversal perspective and narrows down as insights are gained. Regarding information gathering, common approaches include literature reviews, interviews with subject experts and focus groups.

3.2 Research approach

Research can be carried out inductively, deductively or abductively [17].

◦ Inductive: the research starts with observations, and theories are proposed from those observations toward the end of the research process [18].

◦ Deductive: this approach is concerned with "developing a hypothesis (or hypotheses) based on existing theory and then designing a research strategy to test the hypothesis" [19].

◦ Abductive: the research process is devoted to finding, from a range of possible theories, the best possible explanation for "surprising" observations, reaching a plausible but not necessarily universally true conclusion [20].

This research adopts a hybrid inductive-deductive approach. First, an inductive approach is taken, as there is initially no hypothesis to verify. The aim is to discover the potential benefits of implementing graph-based solutions through a deeper understanding of graphs and of the problems perceived in the industry. Hypotheses are formed as use-cases are elaborated. The retained hypothesis is that of the selected use-case, namely that it can create value for practitioners. This hypothesis is tested qualitatively and quantitatively in the subsequent part of the research, which therefore takes a deductive approach.

3.3 Research layout

Finding applications of graph theory in one’s sector is essentially an innovation management endeavour. Innovation management is the set of processes that companies implement to continuously introduce new technological solutions, products and services to their markets. These processes typically revolve around generating new ideas and solutions, which are then evaluated, prioritised, tested, and finally implemented and rolled out [21].


This research focuses on the first steps of this process, that is, from the generation of new ideas to the development of a proof-of-concept for a selected idea. The research layout therefore draws inspiration from the narrower field of idea management. Idea Management (IM) is a sub-process of innovation management that aims to structure and streamline the idea generation, evaluation and selection processes [32]. It is also referred to as the “front-end” of innovation, as ideas are generally put forward by the employees themselves to address their specific problems and needs. Many large firms implement the systematic approach of IM to cope with the otherwise “fuzzy” nature of the front-end of innovation management, which is characterised by a high level of informality and uncertainty [23, 24].

The research layout is explained in more detail below. It follows the main steps proposed by Gerlach and Brem in their guidelines for IM practitioners [25]: the preparation phase, the idea generation and suggestion phases, the evaluation phase and the implementation phase.

3.3.1 The preparation phase

This phase defines the overall objective and scope of the idea management programme. An idea manager formulates the ideation strategy and plans how to generate, improve and evaluate ideas. The rules defined in the preparation phase can be seen as the first of several filters on the way to potential commercialisation [27].

Among the problem types defined by Gerlach and Brem for the IM process, this research addresses “a new technology looking for a new application”, in this case graph theory [25]. The preparation phase therefore entailed extensive background research on graph-theoretic concepts and applications. It also entailed designing an approach, adapted from traditional IM practice, for the idea generation and selection phases.

3.3.2 The idea generation and suggestion phases

This phase is usually distributed in nature, insofar as people from across the organisation submit and present their ideas. A critical factor to consider is the reward system that encourages employees to submit ideas. Because of the novelty of the technology constraint (i.e. the use of graph theory) and the lack of reward possibilities, this research proposes a data-driven approach to the idea generation phase. The interviews conducted helped identify critical problems in each of the firm’s sectors of activity. They seeded the search for past use-cases of graph-based solutions and helped assess the relevance of these use-cases to the organisation.

A vast inventory of use-cases was compiled across the utility company’s core activities, covering as many types of graph-theoretic solutions as possible from both academia and industry. In parallel, interviews were conducted to understand the challenges and opportunities in the various sectors Fortum operates in, providing deeper insight into the relevance of the inventoried use-cases and their implications. The use-cases were clustered into areas of application and filtered by relevance to the company’s value chain. This formed part of the “idea improvement” process suggested by several authors in the IM literature [23, 28, 29].

The approach to generating application ideas is inspired by the concepts of “market pull” and “technology push” [38]. In “market pull”, the source of the innovation is an unmet customer need (here, the customer can be both the end-customer and the end-user of the solution). This results in new demands for problem-solving (‘invent-to-order’ a product for a certain need). The impulse comes from individuals or groups who (are willing to) articulate their subjective demands.

In “technology push”, the stimulus for new products and processes comes from (internal or external) research, and the goal is to make commercial use of new know-how. The impulse comes from the push to apply a technical capability, so it does not matter whether a corresponding demand already exists.

In this research, the pull approach consists of identifying the needs in the respective sectors. The push approach, in contrast, considers the benefits and drawbacks of the technology itself and the solutions it has enabled in academia and various industries.

Technology push

The push approach consists of identifying classic, verified use-cases of the technology in order to understand its inherent benefits in practice. This inventory of use-case domains provides an important benchmark for linking the utility company’s inventory of areas for improvement to graph analytics. The two sources of benchmark use-cases are academia and industry.

Academia

Graph theory is a mature and extensively studied area of mathematics. A Scopus search for “graph theory” yields about 100,000 results; by comparison, a search for “blockchain” returns 15,000 hits. In addition, graph-based methods are highly adaptable across engineering tasks, and thus potentially across a utility company’s operations.

Therefore, a systematic, programmatic search for research articles relating graph theory to energy activities was performed, in order to scan as completely as possible for potential applications. The first phase consisted of searching for keywords representing graph theory (“graph theory”, “graph-based”, “network theory”) in combination with each energy sector the company operates in (e.g. hydropower, nuclear power, wind power).

The second phase consisted of listing about 70 graph-related concepts and algorithms and identifying their presence in the academic literature across the energy sectors. Because of the higher dimensionality of this search (70 × 15 combinations), it was performed programmatically with the Scopus API (Application Programming Interface), searching the title, abstract and keyword fields (TITLE-ABS-KEY). In total, about 3,000 articles were found and stored. They were then curated, first automatically according to various filters and then by reviewing their abstracts.
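The concept-by-sector grid can be generated as query strings before they are submitted. The sketch below is minimal and uses short hypothetical lists in place of the roughly 70 concepts and 15 sectors. It assumes Scopus advanced-search syntax with the TITLE-ABS-KEY field code. Submitting the strings to the Scopus Search API requires an Elsevier API key and is not shown. With the full lists, the same code would produce 70 × 15 = 1,050 queries.

```python
from itertools import product

# Hypothetical subsets; the search described above used about 70 concepts and 15 sectors.
concepts = ["minimum spanning tree", "community detection", "centrality"]
sectors = ["hydropower", "wind power", "nuclear power"]

def scopus_query(concept, sector):
    """Build a Scopus advanced-search string over title, abstract and keywords."""
    return f'TITLE-ABS-KEY("{concept}" AND "{sector}")'

# One query per (concept, sector) pair: the Cartesian product of the two lists.
queries = [scopus_query(c, s) for c, s in product(concepts, sectors)]
print(len(queries))  # 9
print(queries[0])    # TITLE-ABS-KEY("minimum spanning tree" AND "hydropower")
```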

Some interesting articles could be found thanks to the higher specificity of the graph-theoretic concepts used in the search. Graph theory is a rather wide area of mathematics, and the term itself is sometimes too general to appear in an article’s abstract or keywords on Scopus.

Industry

As described in the background, graph-based solutions have attracted increasing interest from industry, particularly from highly digitalised companies.

Benchmark use-cases from industry were mainly extracted from classic use-cases published online and from case studies publicised by graph-database providers (Neo4j, Expero). The industry benchmark use-cases have a lower a priori relevance to the energy industry specifically. They are also less detailed in their implementation than the examples from academia. Nevertheless, they give a complementary view of what is being worked on in practice.

Demand pull


The pull approach enables a technology-agnostic way of addressing the company’s needs. Taking a user-centric rather than a technology-centric approach helps avoid shoehorning a technology into a problem. In this process, the company’s activities are mapped and an inventory of the pain points it faces is compiled.

The pull approach consisted of interviews with experts in various domains of activity at Fortum:

◦ Hydropower (asset management, plant optimisation, scheduling)

◦ Data science (worked on various projects)

◦ Project management

◦ Digitalisation

◦ Energy trading

◦ Wind power

◦ New ventures

◦ Charge and Drive

In this phase, the use-cases were clustered according to their domain of application and their graph-related concepts and algorithms. This gives a detailed picture of which problems have previously been solved and with which approach. At this stage of the research, the question of where graph theory can be used is considered more important than which graph-theoretic concepts are most useful. It is therefore the domains of application that are evaluated in the next step; in other words, a problem-centric approach was taken. A graph-centric evaluation would also have been possible, given the transferability of graph concepts to multiple domains of application. However, we believe that a problem-centric evaluation gives more value to practitioners. There is more organisational friction in applying a graph-based solution across various sectors than in applying different graph-based algorithms within the same domain of application.

Figure 3.1 summarises the idea generation phase process.


Figure 3.1: The idea generation phase

3.3.3 The evaluation phase

In this research, the selection of ideas consisted of two successive steps: first, an application domain was prioritised; second, a specific use-case within that domain was selected.

One of the key issues of an idea management programme is selecting, from a large pool, the ideas that offer the biggest potential for the organisation’s future success [32]. Suitable selection criteria are required to structure this high information load. However, there is no single dominant set of criteria, as every organisation has its own goals, needs and culture as well as its own budgets and timetables [33]. The evaluation criteria are therefore generally chosen by the organisation itself [30, 31]. The total score of an idea on these criteria indicates whether it should be accepted, deferred or rejected.

In their guidelines, Gerlach and Brem have assembled the evaluation criteria most commonly proposed in the IM literature [25]. These span a wide array of dimensions, such as technology, organisational culture, strategy and business. The evaluation tool in this research consists of four dimensions: graph applicability, technical feasibility, economic potential and workability. Each dimension comprises four criteria, as listed in Table 3.1. The clusters were scored from 0 to 2 on each criterion, and the score of a dimension is the aggregate of the scores of its underlying criteria.

| Graph applicability | Technical feasibility | Economic potential | Workability |
|---|---|---|---|
| Underlying graph structure | Simplicity of model | Sector size | Relevance |
| Richness of relationships | Homogeneity of tools | Sector growth | Data alignment |
| Identified concepts and algorithms | Computational constraint | Substitute tools | Human alignment |
| Availability of supporting use-cases | Risk | Scalability | Integrability and maintainability |

Table 3.1: Evaluation dimensions and criteria.
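The scoring scheme can be sketched as follows. Each criterion in Table 3.1 receives 0, 1 or 2, and each dimension aggregates its four criteria. Here the aggregate is assumed to be a plain sum, which gives a maximum of 8 per dimension. The example cluster scores are hypothetical and are not scores from this study.

```python
# Criteria per dimension, as listed in Table 3.1.
DIMENSIONS = {
    "Graph applicability": ["Underlying graph structure", "Richness of relationships",
                            "Identified concepts and algorithms",
                            "Availability of supporting use-cases"],
    "Technical feasibility": ["Simplicity of model", "Homogeneity of tools",
                              "Computational constraint", "Risk"],
    "Economic potential": ["Sector size", "Sector growth", "Substitute tools", "Scalability"],
    "Workability": ["Relevance", "Data alignment", "Human alignment",
                    "Integrability and maintainability"],
}

def dimension_scores(criterion_scores):
    """Sum the 0-2 criterion scores within each dimension (assumed aggregate; max 8)."""
    totals = {}
    for dim, criteria in DIMENSIONS.items():
        for c in criteria:
            if criterion_scores[c] not in (0, 1, 2):
                raise ValueError(f"{c}: score must be 0, 1 or 2")
        totals[dim] = sum(criterion_scores[c] for c in criteria)
    return totals

# Hypothetical cluster: every criterion scored 1, except two adjustments.
example = {c: 1 for criteria in DIMENSIONS.values() for c in criteria}
example["Underlying graph structure"] = 2
example["Risk"] = 0
print(dimension_scores(example))
# {'Graph applicability': 5, 'Technical feasibility': 3, 'Economic potential': 4, 'Workability': 4}
```

Keeping four criteria per dimension, as the objectives below require, gives every dimension the same maximum. This keeps the dimension totals directly comparable.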

The evaluation tool and set of criteria were chosen under the guidance of our supervisors at KTH and experts from Fortum, with the following objectives in mind:

◦ It needs to span a wide space of decision-making dimensions, otherwise the evaluation would risk being misleading.

◦ Redundancy is minimised: factors on which all clusters would score equally are omitted (for example, the presence of a database or the potential presence of big data).

◦ The size of each evaluation block should be balanced, in order to avoid a bias arising from a dimension including more criteria than others.

◦ A trade-off is needed between the simplicity and comprehensiveness of the evaluation, to make it both user-friendly and complete.

◦ The evaluation needs to comprise both industry-specific and company-specific criteria, to be both reproducible and relevant for the company.

Part of the difficulty of evaluating an idea lies in the ideator’s or examiner’s lack of background information when describing or assessing its potential or limitations [34]. To master an information-intensive evaluation process, suitable evaluation criteria are crucial for reliability. The criteria act as guiding factors that enable transparency, comparability and repeatability. To this end, formal definitions of each grade in the scoring system were written and are attached in the appendix.

As proposed by Edeland et al. in their assessment of the potential of blockchain technology in the energy sector, each cluster was assessed for its alignment with the company’s strategic objectives. This allowed a more contextualised prioritisation in line with the corporate objectives. Indeed, strategic alignment is often cited as one of the most important dimensions in the IM literature.

Strategic alignment

How well does the use-case cluster align with the strategic objectives expressed by the company?

All four strategy points share the same score definition:

◦ 0: No impact on strategy point.

◦ 1: Indirect impact for the time being; potential impact in the long run, subject to changes endogenous or exogenous to the model.

◦ 2: Direct and immediate impact.

The four strategy points in Fortum’s CEO’s Business Review are outlined below [40]:

◦ Pursue operational excellence and increased flexibility

Fortum attaches high priority to extracting value from its current business portfolio. Increased flexibility refers to flexible generation assets and to demand response from large customers, both of which balance the volatility caused by renewable generation. Operational excellence refers to minimising operational costs, either through productivity gains or through excellence in asset management.

◦ Ensure value creation from investments and portfolio optimisation

Fortum aims to consolidate the sizeable investments of recent years to further improve its financial performance. This includes the investment in Uniper, a German power generation company whose portfolio mainly comprises flexible power plants (gas, coal, hydro, oil, nuclear). In addition, Fortum’s portfolio of power plants is continuously reviewed, with an emphasis on CO₂-free assets, flexibility and low operating costs.

◦ Drive focused growth in the power value chain

Fortum aims to grow its CO₂-free power generation portfolio in an asset-light manner, for example through partnerships or co-ownership. This medium-term strategy aims to extract value from the company’s long-standing expertise, which it could capitalise upon through increasingly service-oriented value capture. Digitalisation is an enabler for delivering such services to households, the grid operator, industrial customers and cities.

◦ Build options for significant new businesses

The long-run uncertainty of the energy sector will create new business opportunities that Fortum wants to seize, aiming at greater independence from power prices and high profit contributions. This strategy point includes areas such as the circular economy, waste and recycling, and the bio-economy. It also includes collaborations with startups and new ventures.

Mapping the operational score against the strategic score made it possible to divide the use-case clusters into three groups, as suggested by Gerlach and Brem in their conceptual model [25]:

◦ Accepted use-case clusters.

◦ Deferred use-case clusters.

◦ Rejected use-case clusters.
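The maximum scores follow from the scheme above. The operational score can reach 4 dimensions × 4 criteria × 2 = 32, and the strategic score 4 strategy points × 2 = 8. How the two scores were turned into the three groups is not specified here. The sketch below therefore uses a hypothetical quadrant rule: normalise both scores and compare each with a threshold of 0.5. A cluster that is high on both is accepted, one that is low on both is rejected, and a mixed case is deferred. It illustrates the idea only and is not the decision rule of this study.

```python
def classify(operational, strategic, op_max=32, st_max=8, threshold=0.5):
    """Hypothetical quadrant rule: normalise both scores to [0, 1] and compare
    each with the same threshold. Not the decision rule used in this study."""
    op_high = operational / op_max >= threshold
    st_high = strategic / st_max >= threshold
    if op_high and st_high:
        return "accepted"
    if not op_high and not st_high:
        return "rejected"
    return "deferred"

print(classify(20, 6))  # accepted
print(classify(20, 2))  # deferred
print(classify(10, 2))  # rejected
```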

The second phase of the selection process is concerned with selecting a specific use-case. Three use-cases were proposed, which were deemed to best represent the accepted use-case cluster and to be in line with Fortum’s operations. They were scored on the same evaluation criteria, except strategic alignment: they all belong to the same cluster and thus share the same strategic score. Once a use-case had been selected, the modelling and implementation phase began.

Figure 3.2 summarises the idea selection phase.

3.3.4 The implementation phase

The actual implementation of an idea is important for demonstrating the feasibility of an idea management programme [35]. It is also a crucial motivational factor for ideators. Carrying out the implementation process successfully requires clear responsibilities and teamwork [36]. Such an implementation team can consist of project managers, developers and subcontractors [37].

Figure 3.2: The idea selection phase

A full-scale implementation is outside the scope of this research; however, a feasibility study was performed with a more detailed conceptual model and a proof-of-concept. It consisted of the following steps:

◦ A more detailed background study of the selected use-case and of the chosen graph technology, needed to find the best possible fit.

◦ Data gathering and pre-processing. A data set was provided by Fortum for analysis.

◦ Implementation of the algorithms on open-source benchmark data sets as well as on the studied data set.

◦ Implementation of evaluation criteria for the performance of the chosen algorithms.

◦ Proposal of model improvements and extensions, and of key implications for practitioners.

3.4 Literature review

Literature reviews were conducted to gain a deeper understanding of the various dimensions of this study:

◦ The mathematical fundamentals of graph theory: the concepts, common graph-based problems and applications both in academia and industry.

◦ The techno-economic challenges and opportunities of the sectors studied.

◦ The research advancements within the selected use-case cluster.

In the first phase of the research, the literature covered came both from academia (research articles and books) and from industry “grey literature” (white papers, reports, blog posts), as the two sources complement each other.

The grey literature gives business insights and inspiration regarding potential application areas more or less connected to a utility company’s operations. It also provides a certain degree of confidence in the possibility and benefits of commercialisation. However, in contrast with academic papers, it generally lacks technical detail, giving little information on how graphs were employed.

The grey literature had two further limitations. It contained little information on commercial implementations of graphs within the energy sector. The solutions and benefits it reports may also lack objectivity and verifiability.

On the other hand, academic research on energy-related applications is vast and rich in technical detail. It gives the researcher a clear understanding of the implementation process, as well as a more objective evaluation of the graph-based solution. The drawback of these resources is that little can be learned about the commercialisation possibilities of their solutions, which implies a higher implementation risk.

The grey literature was sourced from graph-database providers and graph analytics companies (AWS, Neo4j, Google, Expero). The academic literature search was systematic: a grid search on Scopus combined all listed graph-related concepts with all potential sectors of application.


In the second phase of the research, another literature review was conducted, with its scope narrowed to academic papers related to the selected use-case. Its aim was to gain insight into the challenges and opportunities of possible methods and to find an interesting angle from which to tackle the problem.

3.5 Data collection

As suggested by Saunders et al., data for exploratory research should in practice be collected from both primary and secondary sources [17]. Primary data were collected through exploratory discussions, semi-structured interviews and group discussions. Both qualitative and quantitative secondary data were collected. The qualitative data came from external sources (papers, reports, presentations, videos) and from internal sources (internal documents and presentations). The quantitative data were used to implement the proof-of-concept of the selected use-case.

3.6 Semi-structured interviews

Semi-structured interviews are well suited to exploratory research. They are non-standardised and adapted to each participant’s background, but the researcher predetermines a common list of themes and questions to be covered. The objective is to gather respondents’ opinions on complex and sometimes even sensitive issues.

The semi-structured interviews were conducted with experts from the fields covered by the research. The overall objective was to understand the particular challenges and methods in their activities, and to discuss the potential benefits that graph theory could bring to their field. The interviews were, however, adapted to each interviewee’s domain of expertise (data scientist, asset manager, innovation manager, etc.). Some specific questions could therefore be omitted, and flexibility was allowed for exploratory discussions.


Part I

Graph theory and applications


CHAPTER 4 Elementary terminology

This chapter presents a selection of graph-theoretic concepts that are typically covered in introductory literature on graph theory.

The definitions and properties are intentionally written as running text to create coherent paragraphs and an easy read. Where no specific reference is given, the definition, theorem, property, example or statement in question is standard and can be found in textbooks such as [41], [42] and [43].

4.1 Graph and subgraph

Simply put, a graph in graph theory is a collection of vertices (synonym: nodes) and edges, where an edge is drawn between two vertices to show that they are related in some way. In engineering problem solving, vertices can represent elements of a system, such as people, animals, buildings, electric components or financial assets.

Consider a graph of ten vertices representing ten people in a neighbourhood. One can choose to draw an edge between two vertices if the corresponding people are neighbours, or if they own the same number of pets. There is a lot of freedom in defining what vertices and edges represent, depending on what is interesting to study. Some real-life applications of graphs are presented in the next chapter.
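The "same number of pets" rule can be turned directly into a vertex set and an edge set. The sketch below is minimal; the pet counts for the ten people are made-up illustrative data. Each unordered pair of people is stored as a `frozenset`, so the edges have no direction.

```python
from itertools import combinations

# Hypothetical data: number of pets owned by each of ten people.
pets = {"P1": 0, "P2": 1, "P3": 2, "P4": 1, "P5": 0,
        "P6": 3, "P7": 2, "P8": 1, "P9": 0, "P10": 4}

# Vertices are the people; an edge joins two people who own the same number of pets.
vertices = set(pets)
edges = {frozenset((a, b)) for a, b in combinations(sorted(pets), 2)
         if pets[a] == pets[b]}

print(len(vertices), len(edges))  # 10 7
```

Choosing the "neighbours" rule instead would keep the same vertex set and change only the condition that decides which pairs become edges.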

Mathematically speaking, a graph G is an ordered pair (V, E), where V is a


Abstract

Keywords: graph theory, feature selection, energy industry

Sammanfattning (Summary)

Graph theory is a mathematical field in which objects and their pairwise relations, also known as nodes and edges respectively, are studied. The birth of graph theory is often considered to have taken place in 1736, when Leonhard Euler tried to solve a problem involving seven bridges in Königsberg in Prussia. In recent times, graphs have attracted attention from companies in several industries because of their power to model and analyse large networks.

electricity price forecasting. Feature selection is a process for reducing the number of input variables. It is typically carried out before a regression model is built, in order to avoid overfitting and increase the model's interpretability.

Five graph-based feature selection methods with different standpoints are studied. Experiments are conducted on realistic data sets with many input variables to verify the validity of the methods. One of the data sets is owned by Fortum and is used to forecast the electricity price, among other important quantities. The obtained results look promising according to several evaluation metrics and can be used by Fortum as a support tool for developing prediction models. In general, an energy company can likely benefit from graph theory in many ways and create value in its business with the help of enriched mathematical knowledge.

Keywords: graph theory, feature selection, energy industry

Declaration

This study was conducted by the students Kristofer Espinosa and Tam Vu and was commissioned by Fortum Sverige AB.

Acknowledgements

Xiaoming Hu.

Contents

1 Introduction
1.1 Purpose and research question
1.2 Research contribution
1.3 Limitations
1.4 Delimitations
1.5 Disposition
2 Market trends for utility companies
2.1 Uncertain growth in electricity demand
2.2 A more complex portfolio
2.3 Evolving technology
2.4 Evolving conditions in the energy markets
2.5 Customer trends
2.6 Digitalisation of utility companies
2.7 The emergence of graph analytics
2.7.1 Relational databases
2.7.2 Graph databases
2.7.3 The relevance of graph analytics for utility companies
3 Methodology
3.1 Research design
3.2 Research approach
3.3 Research layout
3.3.1 The preparation phase
3.3.2 The idea generation and suggestion phases
3.3.3 The evaluation phase
3.3.4 The implementation phase
3.4 Literature review
3.5 Data collection
3.6 Semi-structured interviews
Part I Graph theory and applications
4 Elementary terminology
4.1 Graph and subgraph
4.2 Graph traversal
4.3 Trees and connectivity
4.4 Matching
4.5 Colouring
4.6 Directed graph
4.7 Weighted graph
5 Selected applications of graphs
5.1 University timetabling
5.2 Staff assignment
5.3 Cost-effective railway building
5.4 Logistic network and optimal routing
5.5 Planar embedding of graphs
5.6 Winter road maintenance
5.7 Social network
5.8 Tournament ranking system
5.9 Image segmentation
Summary
Part II Selection of use-case
6 Idea generation
6.1 Inventory
6.1.1 Clustering
6.1.2 Heat map
7 Assessment of use-case clusters
7.1 Preliminaries
7.1.1 Presentation of the use-case clusters
7.1.2 Scoring system
7.1.3 Assessment results per criterion
7.2 Optimisation of hydropower operations
7.3 Hydropower operation and maintenance
7.4 Operation and maintenance for nuclear power
7.5 Design and maintenance for wind power
7.6 Electric vehicle applications
7.7 Market intelligence for energy trading
7.8 Storage solutions for the distribution grid
7.9 Master data management