DEGREE PROJECT IN MECHANICAL ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020
Graph theory applications in the energy sector
From the perspective of electric utility companies
KRISTOFER ESPINOSA
TAM VU
Abstract
Graph theory is the mathematical study of objects and their pairwise relations, also known as nodes and edges. The birth of graph theory is often considered to have taken place in 1736, when Leonhard Euler tried to solve a problem involving the seven bridges of Königsberg in Prussia. In more recent times, graphs have caught the attention of companies from many industries due to their power for modelling and analysing large networks.
This thesis investigates the usage of graph theory in the energy sector for a utility company, in particular Fortum, whose activities include, but are not limited to, production and distribution of electricity and heat. The output of the thesis is a broad overview of graph-theoretic concepts and their applications, as well as an evaluation of energy-related use-cases in which some concepts are analysed in more depth. The chosen use-case within the scope of this thesis is feature selection for electricity price forecasting. Feature selection is a process for reducing the number of features, also known as input variables, typically before a regression model is built, in order to avoid overfitting and to increase model interpretability.
Five graph-based feature selection methods with different points of view are studied. Experiments are conducted on realistic data sets with many features to verify the legitimacy of the methods. One of the data sets is owned by Fortum and used for forecasting the electricity price, among other important quantities.
The obtained results look promising according to several evaluation metrics and can be used by Fortum as a support tool to develop prediction models. In general, a utility company can likely take advantage of graph theory in many ways and add value to its business with enriched mathematical knowledge.
Keywords: graph theory, feature selection, energy industry
Sammanfattning (Summary)
Graph theory is a mathematical field in which objects and their pairwise relations, also known as nodes and edges respectively, are studied. The birth of graph theory is often considered to have taken place in 1736, when Leonhard Euler tried to solve a problem involving seven bridges in Königsberg in Prussia. In more recent times, graphs have attracted attention from companies in several industries due to their power to model and analyse large networks.
This thesis investigates the use of graph theory in the energy sector for a utility company, more specifically Fortum, whose activities consist of, but are not limited to, production and distribution of electricity and heat. The thesis results in a broad review of graph-theoretic concepts and their applications, both in general technical contexts and in the energy sector in particular, as well as a case study in which some concepts are put into deeper analysis. The chosen case study within the scope of the thesis is feature selection for electricity price forecasting. Feature selection is a process for reducing the number of input variables, typically carried out before a regression model is built in order to avoid overfitting and to increase the interpretability of the model.
Five graph-based methods for feature selection with different points of view are studied. Experiments are conducted on realistic data sets with many input variables to verify the validity of the methods. One of the data sets is owned by Fortum and is used to forecast the electricity price, among other important quantities. The obtained results look promising according to several evaluation metrics and can be used by Fortum as a support tool for developing prediction models. In general, an energy company can likely take advantage of graph theory in many ways and create value in its business with the help of enriched mathematical knowledge.
Keywords: graph theory, feature selection, energy industry
Declaration
This study was conducted by the students Kristofer Espinosa and Tam Vu, and was commissioned by Fortum Sverige AB.
Acknowledgements
We want to thank our supervisors from Fortum, Alexandra Bådenlid and Linda Marklund Ramstedt, as well as our manager and mentor, Hans Bjerhag, for their constant support and engagement. We are also thankful for the help we have received from our respective supervisors at KTH, Elena Malakhatka and Xiaoming Hu.
Contents
1 Introduction
1.1 Purpose and research question
1.2 Research contribution
1.3 Limitations
1.4 Delimitations
1.5 Disposition
2 Market trends for utility companies
2.1 Uncertain growth in electricity demand
2.2 A more complex portfolio
2.3 Evolving technology
2.4 Evolving conditions in the energy markets
2.5 Customer trends
2.6 Digitalisation of utility companies
2.7 The emergence of graph analytics
2.7.1 Relational databases
2.7.2 Graph databases
2.7.3 The relevance of graph analytics for utility companies
3 Methodology
3.1 Research design
3.2 Research approach
3.3 Research layout
3.3.1 The preparation phase
3.3.2 The idea generation and suggestion phases
3.3.3 The evaluation phase
3.3.4 The implementation phase
3.4 Literature review
3.5 Data collection
3.6 Semi-structured interviews
Part I Graph theory and applications
4 Elementary terminology
4.1 Graph and subgraph
4.2 Graph traversal
4.3 Trees and connectivity
4.4 Matching
4.5 Colouring
4.6 Directed graph
4.7 Weighted graph
5 Selected applications of graphs
5.1 University timetabling
5.2 Staff assignment
5.3 Cost-effective railway building
5.4 Logistic network and optimal routing
5.5 Planar embedding of graphs
5.6 Winter road maintenance
5.7 Social network
5.8 Tournament ranking system
5.9 Image segmentation
Summary
Part II Selection of use-case
6 Idea generation
6.1 Inventory
6.1.1 Clustering
6.1.2 Heat map
7 Assessment of use-case clusters
7.1 Preliminaries
7.1.1 Presentation of the use-case clusters
7.1.2 Scoring system
7.1.3 Assessment results per criterion
7.2 Optimisation of hydropower operations
7.3 Hydropower operation and maintenance
7.4 Operation and maintenance for nuclear power
7.5 Design and maintenance for wind power
7.6 Electric vehicle applications
7.7 Market intelligence for energy trading
7.8 Storage solutions for the distribution grid
7.9 Master data management
7.10 Knowledge graphs
7.11 Results of use-case cluster assessment
7.11.1 Assessment results
7.11.2 A note on Energy trading
7.12 Discussion on the selection of use-case cluster
8 Assessment of use-cases in Energy trading
8.1 Natural gas market analysis with visibility graphs
8.2 Smart meter clustering for short-term load forecasting
8.3 Feature selection for electricity price forecasting
8.4 Results of the use-case assessment
8.4.1 A note on Electricity price forecasting
8.5 Discussion on the selection of use-case
9 Conclusion and discussion on Part II
9.1 Conclusion
9.2 Discussion
Part III Graphs, feature selection and electricity price forecasting
10 Graph theory in electricity price forecasting
10.1 Background on the electricity market
10.1.1 Roles in the market
10.1.2 The day-ahead spot-market
10.1.3 The balancing market
10.2 Electricity price forecasting
10.2.1 Forecasting methods
10.2.2 Feature selection in electricity price forecasting
10.2.3 The case of intra-day
10.2.4 Summary
10.3 Motivations on the proposed methodology
11 Introduction to feature selection
11.1 Background
11.2 Purpose and outline
12 Preliminaries
12.1 Laplacian matrix
12.2 Nearest neighbour graph
12.3 Graph clustering
12.4 Comparison of two clusterings
12.4.1 Clustering accuracy
12.4.2 Normalised mutual information
12.4.3 Adjusted mutual information
12.5 Similarity comparison of two sets
13 Feature selection methods
13.1 Laplacian score (LS)
13.1.1 Preparation
13.1.2 Optimisation problem
13.1.3 Feature selection algorithm
13.2 Multi-cluster feature selection (MCFS)
13.2.1 Preparation
13.2.2 Optimisation problem
13.2.3 Feature selection algorithm
13.3 Non-negative discriminative feature selection (NDFS)
13.3.1 Preparation
13.3.2 Optimisation problem
13.3.3 Feature selection algorithm
13.4 Feature selection via non-negative spectral analysis and redundancy control (NSCR)
13.4.1 Optimisation problem
13.4.2 Feature selection algorithm
13.5 Feature selection via adaptive similarity learning and subspace clustering (SCFS)
13.5.1 Optimisation problem
13.5.2 Feature selection algorithm
14 Experiments and results
14.1 Line of action
14.2 Data sets
14.3 Parameter setting
14.4 Experiment: Eight public data sets
14.4.1 Clustering accuracy
14.4.2 Normalised and adjusted mutual information
14.4.3 Stability
14.5 Experiment: CELEBI of Fortum
15 Discussion
15.1 Convergence speed
15.2 Parameter sensitivity
15.3 Jaccard index
15.4 Conceivable challenges
15.5 Implications for forecasting activities
15.6 Scalability
16 Research conclusion
Bibliography
Appendices
A Part I & II: Figures and tables
A.1 List of interviews
A.2 Inventory of use-cases
A.3 Evaluation tool
A.3.1 Graph applicability
A.3.2 Technical feasibility
A.3.3 Economic potential
A.3.4 Workability
B Part III: Figures and tables
B.1 Experiments and results
B.2 Discussion
C Glossary
List of Figures
2.1 European Net electricity generation, EU-28, 1990-2017. Source: Eurostat 2019
2.2 Evolution of the estimated impact of technology on utility companies between 2015 and 2018. Source: Monitor Deloitte (2018)
2.3 The economic impacts of digitalisation on utility earnings. Source: McKinsey (2016) [4]
2.4 A friendship directed social graph. Source: AWS (2020)
3.1 The idea generation phase
3.2 The idea selection phase
4.1 Examples of graphs.
4.2 Various kinds of walks in a graph.
4.3 Examples of graphs which are not trees.
4.4 Differently configured trees with 5 vertices and 4 edges each.
4.5 Examples of spanning trees in a connected graph.
4.6 Examples of matchings.
4.7 Examples of vertex colourings using as few colours as possible.
4.8 Weighted directed graph and a small fictitious kingdom far, far away.
5.1 The complete bipartite graph K_{3,3} is not planar.
5.2 Graph vertices can be most central in different aspects.
6.1 Heat map of past use-cases per cluster and per graph-related concept.
7.1 Hydropower optimisation
7.2 Assessment of Hydropower: Operation and maintenance
7.3 Assessment of Nuclear power operation and maintenance
7.4 Assessment of Wind power operation and maintenance
7.5 Assessment of EV Applications
7.6 Assessment of Energy trading
7.7 Assessment of Energy storage solutions
7.8 Assessment of Master Data Management
7.9 Assessment of Knowledge graphs
7.10 Graph-score vs strategy-score of each use-case cluster
8.1 Assessment of Natural gas market analysis with visibility graphs
8.2 Assessment of Short-term load forecasting - clustering
8.3 Assessment Electricity price forecasting - feature selection
8.4 Comparison of the use-case assessments
10.1 Formation of electricity prices
14.1 Average Jaccard index.
B.1 Jaccard index for eight different data sets.
B.2 Relative change of the objective value in iterative methods.
B.3 Clustering quality for different α and β, obtained with the features selected using SCFS.
List of Tables
3.1 Evaluation dimensions and criteria.
5.1 The centrality measures of the vertices in Figure 5.2.
6.1 Number of past use-cases per cluster
7.1 Scoring of use-case clusters
8.1 Scoring of use-cases in Energy trading
13.1 Notations associated with a given data set.
14.1 Data sets with their numbers of samples (m) and features (n).
A.1 List of interviewees.
A.2 Use-cases in Hydropower optimisation
A.3 Use-cases in Hydropower operation and maintenance
A.4 Use-cases in Nuclear power operation and maintenance
A.5 Use-cases in Wind power design, operation and maintenance.
A.6 Use-cases for Electric vehicle applications
A.7 Use-cases in Energy trading.
A.8 Use-case in Energy storage
B.1 Clustering accuracy (ACC) [%] corresponding to different data sets and feature selection methods.
B.2 Normalised mutual information (NMI) [%] corresponding to different data sets and feature selection methods.
B.3 Adjusted mutual information (AMI) [%] corresponding to different data sets and feature selection methods.
B.4 Clustering quality measures for feature selection of CELEBI. The first column shows the indices of the subsets.
CHAPTER 1
Introduction
Graph theory is the mathematical study of objects and their pairwise relations, also known as nodes and edges respectively. The birth of graph theory is often considered to have taken place in 1736, when the Swiss mathematician Leonhard Euler tried to solve a routing problem about the seven bridges of Königsberg in Prussia. In more recent times, the increase in data and computing power has given rise to computational intelligence modelling and, perhaps more discreetly, graph theory has been applied to several services at the foundation of a digital society.
Google's PageRank search algorithm and its map service are based on graphs, as are Facebook's and Twitter's social networks. Whether it is a link from one website pointing to another or adding someone as a friend, the relations (edges) bear the fundamental information. These applications of graph theory in the digital sphere, generally with large amounts of data, are referred to as graph analytics.
Graph analytics tools have the advantage of being fast and scalable to exceptionally large networks. Graph analytics has thus caught the attention of companies in all types of industries and is commonly used as a modelling tool designed for networks.
Societies are currently undergoing a transition toward a low-carbon energy system. The main drivers of this change, electric utility companies, are digitalising and innovating with new products to stay or become more competitive. The liberalisation of the power markets and the increasing level of renewable sources of electricity fed into the grid have brought the prospect of diminishing electricity prices in many countries, complicating the situation for electricity generators. On the other hand, technological advancements in the energy and information technology sectors provide new opportunities for utilities to optimise their operations or find new revenue streams. In this regard, graph theory appears as a potentially helpful technology to support such endeavours. It has been used in the context of the Internet of Things (IoT), for routing, fraud detection, customer analysis, advanced search, scheduling and much more. Less publicised is the set of useful applications specific to the power market. As such, as with machine learning or blockchain, utilities have an interest in gaining deeper insight into the technology to assess where and how it can be used to their benefit.
1.1 Purpose and research question
This paper aims to give the interested reader guidelines as to where graph theory is applicable in the energy sector and more particularly in the power market. A thorough background of graph theory and its theoretical applications is followed by a comprehensive presentation of previous case studies revolving around the operational markets of a typical utility company, giving an indication of potential applications. These applications, also called use-cases, are evaluated to indicate where and how they could be beneficial. Finally, a use-case is conceptualised and a proof-of-concept realised.
More specifically, we will treat the following questions:
◦ What are the main benefits of graph theory in engineering and science applications?
◦ How can graph theory be useful in the energy sector and more particularly within the context of an electric utility company?
◦ Given a use-case relevant for the business of Fortum, what can a graph-based model look like and what algorithms can be used to solve problems?
Relevant theoretical background, both graph-theoretical and energy-specific, will be covered for a deeper understanding of the topics in question.
1.2 Research contribution
This research aims to bring contributions in the following ways:
◦ An overview of applications of graph theory within the energy sector with a detailed theoretical background for pedagogical purposes, which to our knowledge does not exist to date.
◦ A generic evaluation tool for a face-value assessment of graph-theory potential for commercial applications for practitioners in any industry.
◦ A review of graph-based feature selection methods and their application to a commercial use-case.
1.3 Limitations
In this study, a trade-off is necessary between the number of applications evaluated and the amount of detail provided for each. Each application covered is in fact a subject for a study in its own right. Consequently, the assessments are to be considered at face value, as a guideline for practitioners as to where attention should be directed.
1.4 Delimitations
This study was commissioned by Fortum. Hence, the verticals and geographies of the energy sector considered for the applications are aligned with the operations of Fortum, with the exception of some of the more corporate activities, including strategic and financial decision-making as well as regulatory and compliance activities. The reason for this is to confine the research mainly to energy industry-specific challenges.
1.5 Disposition
This paper comprises three parts.
Part I contains a broad overview of elementary concepts of graph theory and selected applications, supported by recent research. This part aims to provide an inspiring mathematical background for practitioners needing mathematical foundations for implementing graph analytics applications.
Part II details the assessment of the identified use-cases of graph theory in the energy sector. The evaluation tool is explained in further detail and a comparison between use-cases is made, in order to find a relevant application for a proof-of-concept and to guide decisions on which areas to focus on for future proofs-of-concept.
Part III focuses on a specific area relevant for Fortum in order to bring the general concepts into deeper analysis. Within the scope of this thesis, the case study is about feature selection for electricity price forecasting using graph-based methods. Five methods with different points of view are presented, each with its main idea, derivation, algorithm and quality validation by experiments on real-world data.
CHAPTER 2
Market trends for utility companies
Utility companies have a central role in facilitating the coming transition to a sustainable energy system. Active across the value chain from power generation to its delivery to the end-user, they face increasing political, economic and social pressure to decarbonise the power sector and scale the integration of renewable sources of energy. However, given the scale of the economic, technological and organisational challenges to this, some argue that this evolution can in fact be likened more to a transformation. Below are some fundamental trends affecting the power sector that utilities need to account for when elaborating their strategies.
2.1 Uncertain growth in electricity demand
Historically, utility companies have been able to invest in new power generating assets thanks to an ever-growing demand for electricity. However, utilities in most developed economies have been seeing stagnating or even declining demand for electricity, mainly because the electrification rate is already so high.
Other reasons for declining electricity demand are the deindustrialisation of developed economies and energy efficiency improvements, notably in the residential sector. For instance, in New York, electricity demand is projected to grow at an average annual rate of 0.16 % through 2024. For the case of Europe, the figure below summarises the evolution of electricity consumption since 1990 [1].
Figure 2.1: European Net electricity generation, EU-28, 1990-2017.
Source: Eurostat 2019
Despite this recent trend, the electrification of the residential and transportation sectors is adding uncertainty to future electricity demand projections, particularly in terms of peak demand, potentially driving up the need for total power system capacity without necessarily increasing overall demand [13].
2.2 A more complex portfolio
Renewable energy generation differs from conventional power sources in some fundamental ways, which alter the companies' operations all along the asset life cycle, from investment through operation and maintenance to decommissioning. As solar and wind are the growth drivers among the distributed energy resources (DERs), focus will be put on them.
Different asset types
As opposed to conventional power sources, DERs have on average significantly lower capacities, are generally connected to the distribution grid (low to medium voltage) and are sparsely located across geographical regions. As such, their design, placement and sizing are not only subject to the electricity grid and pricing signals, but also vary largely based on topology, irradiation, wind maps and land ownership. To be competitive, utilities now need to reach the same level of expertise on these matters as they have accumulated over time regarding conventional power sources [6].
Evolving needs in operation and maintenance
The intermittent nature of certain DERs such as wind and solar causes their operation to differ from that of conventional power sources. The uncertainty of the weather conditions adds complexity to the day-ahead bidding strategy of utilities, notably in terms of the residual demand of the energy system. Not only is their own power output more unpredictable, the output of competitors is too, causing higher margins of error induced by inaccurate weather modelling [14].
As wind power plants and solar farms are more distributed, complexity is also added on the maintenance side. The portfolio of the utilities now encompasses a wider spread of asset locations, ages and life cycles. Even though OEMs oftentimes provide warranties for the wind and solar power plants, utilities are incentivised to build their own intelligence on these matters to attain a competitive edge.
2.3 Evolving technology
Whereas conventional sources such as hydropower and nuclear are more mature technologies, the technology in wind and solar, as well as energy storage, is rapidly evolving. This complicates the investment decision, not only in terms of the right timing to commission a new renewable power plant, but also in terms of its design (selection of equipment, addition of energy storage).
2.4 Evolving conditions in the energy markets
Statkraft projects solar power to become the largest source of power generation on a global basis and cover almost 30 % of all electricity generation in 2040, with wind power covering 20 % [7]. This has two main effects on the economics of the market: declining electricity prices during peak production hours and increasing need for grid stability services [2].
Renewables of the same type generate power simultaneously within the same geographical area. As a higher proportion of renewables is fed into the system, the resulting increase in the supply of free generation drives down prices, sometimes even leading to negative prices. This can have a notable impact on investment decisions in renewable generation, as adding renewable capacity hampers the profitability of the renewable generation already installed [6].
The stability of the grid is impacted by the increasing share of intermittent power generation, combined with the reduction in the system inertia provided by heavier rotating equipment. Thus, increasing attention is channelled towards providing ancillary services for the grid operator on the intra-day market as new revenue sources for asset utilities [2].
On top of this, the energy commodities market, particularly coal, oil and gas, is becoming increasingly volatile and financialised, subject to recurring global political and economic crises. In particular, natural gas is increasingly decoupled from oil prices and subject to more fundamental forces. This increases forecasting needs, whether or not a utility uses coal or gas as fuel, as it impacts both the input price for electricity generation and electricity price projections [8].
2.5 Customer trends
There is growing engagement from customers to participate in the power sector in various ways. First of all, the lower barriers to investment in solar power have attracted many private households across the globe to invest in their own power production, and net metering schemes have been widely deployed [12].
Secondly, growing attention is being paid to demand response programmes, which allow customers to adapt their consumption in response to electricity price signals. This is to a large extent enabled by the roll-out of smart metering systems, allowing for timely, two-way communication between the end-user and the central system [11].
Utilities are thus facing a growing interest from customers for higher energy independence and more elaborate and efficient energy solutions. This forces them to innovate in terms of product offering in order to reduce churn or grid defection (i.e. “going off the grid”). There is a risk for the situation to evolve into what is being called the “death spiral”, whereby customers going off the grid further increases grid defection, because the network costs are borne by fewer customers.
However, this trend is to be taken with a grain of salt, as it has not yet been observed on a large scale and could be contained to some specific geographical regions where the conditions are met (e.g. high solar irradiation) [5].
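The cost-sharing mechanism behind the "death spiral" described above can be illustrated with a minimal sketch. It assumes a fixed total network cost that is split equally among the customers who remain on the grid; the function names and all numbers are hypothetical and chosen only for illustration, and the sketch does not model how rising costs feed back into further defection.

```python
def network_cost_per_customer(total_network_cost, customers):
    """Share a fixed network cost equally among the remaining customers."""
    if customers <= 0:
        raise ValueError("at least one customer must remain on the grid")
    return total_network_cost / customers


def cost_path(total_network_cost, customers, defectors_per_year, years):
    """Per-customer network cost for each year as customers leave the grid.

    Assumption for this sketch: a constant number of customers defect
    every year and the total network cost does not change.
    """
    path = []
    for _ in range(years):
        path.append(network_cost_per_customer(total_network_cost, customers))
        customers -= defectors_per_year
    return path


# Hypothetical figures: a network cost of 1,000,000 shared by 1,000 customers,
# of whom 100 leave the grid each year.
path = cost_path(1_000_000, 1000, 100, 3)
print(path)
```

Each remaining customer bears a larger share of the same network cost every year, which is the incentive for further defection that the text refers to.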
Despite all this, electricity is still seen as a commodity by end-users and price elasticity is low [9]. Interest in and understanding of the energy system remain low among most customers, which makes the competitiveness of a utility dependent on product pricing rather than on product differentiation, hampering the efforts of utilities to engage customers and innovate rapidly [6].
The aforementioned conditions put pressure on utilities to adapt their operations and competences and to reimagine their role in the system through technological and business model innovations, while needing to optimise their asset fleet in various ways. Because utilities, as industry incumbents, have strong inertia, this change is at once a technical, economic and organisational challenge, requiring timely decisions and a balanced explore-exploit trade-off [10].
As can be seen in the figure below, the timing and impact of underlying energy industry trends for utilities are difficult to assess [3].
Figure 2.2: Evolution of the estimated impact of technology on utility companies between 2015 and 2018.
Source: Monitor Deloitte (2018)
2.6 Digitalisation of utility companies
New digital tools can prove useful in supporting utility companies in tackling some of the challenges mentioned above. Managing and utilising the data generated from the different components of the electrical system represents a great opportunity for utility companies to respond to the evolving environment they operate in. As seen in the figure below, digitalisation can have a positive effect all along the electricity value chain [4].
Figure 2.3: The economic impacts of digitalisation on utility earnings.
Source: McKinsey (2016) [4]
Utility companies have already started to invest heavily in digitalisation, particularly enabled by the emergence of the Internet of Things and the possibility to process big data with increased computing power and Artificial Intelligence (AI).
Since 2014, investments in digital electricity infrastructure and software have seen an annual growth rate above 20 %, reaching US$ 47 billion in 2016. By increasing connectivity as well as modelling and monitoring capabilities, digitalisation is expected to decrease generation costs by 10 % to 20 % in the oil and gas industry and by 5 % in the power sector, and to help integrate more intermittent renewable generation. On the consumption side, digitalisation could cut energy use by about 10 % in buildings, help reshape the mobility sector, and facilitate "smart demand response" programmes as well as smart charging technologies for electric vehicles [15].
2.7 The emergence of graph analytics
According to Gartner, digitalisation "is the use of digital technologies to change a business model and provide new revenue and value-producing opportunities". Digitalisation is enabled by digitisation, which is the mere process of "changing from analog to digital form" [16]. The traditional way of digitising real-world processes is through tabular databases, where elements from the real world are stored in tables. Increasingly, however, graph databases are used to better capture the relationships between the digitised elements in a network. The latter type of database is a stepping stone towards harnessing the power of graph theory. A quick look at the different forms of databases can be helpful in order to understand how graph theory can become a useful tool of analysis for organisations undergoing digitalisation.
2.7.1 Relational databases
Tabular, or relational, databases store information in tables. Each table represents data elements, typically called a model, where the columns list the attributes of the model and where each row is an instance of the model, identified by a unique ID.
Example: A person model
There are three types of relationships between models: 1-to-1, 1-to-many and many-to-many.
1-to-1 relationships describe a relationship between A and B in which one element of A may only be linked to one element of B and vice versa. A country and its capital city have a 1-to-1 relationship.
1-to-many relationships are a parent-child type of relationship, where A can have several instances of B but B can only have one instance of A. We say that B belongs to A. An example of this is an organisation which has several employees. In this case, common practice is to assign an organisation foreign key to each instance in the employee table.
Many-to-many relationships are those where A can be connected to several instances of B and vice versa. This can be exemplified by the relationship between a doctor and a patient: a doctor can have several patients and a patient can have several doctors. One way to store this information is to connect them through a common table, for example booking, the instances of which contain both a doctor and a patient foreign key.
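The 1-to-many and many-to-many patterns described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table names, column names and sample rows are hypothetical and are not taken from any Fortum system.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 1-to-many: each employee row carries a foreign key to one organisation.
CREATE TABLE organisation (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    organisation_id INTEGER NOT NULL REFERENCES organisation(id)
);
-- many-to-many: the booking table links doctors and patients.
CREATE TABLE doctor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE booking (
    id INTEGER PRIMARY KEY,
    doctor_id INTEGER NOT NULL REFERENCES doctor(id),
    patient_id INTEGER NOT NULL REFERENCES patient(id)
);
""")
conn.executemany("INSERT INTO doctor (id, name) VALUES (?, ?)",
                 [(1, "Dr A"), (2, "Dr B")])
conn.executemany("INSERT INTO patient (id, name) VALUES (?, ?)",
                 [(1, "P1"), (2, "P2")])
conn.executemany("INSERT INTO booking (doctor_id, patient_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])


def patients_of(doctor_name):
    """Return the names of all patients booked with the given doctor."""
    rows = conn.execute(
        "SELECT patient.name FROM patient "
        "JOIN booking ON booking.patient_id = patient.id "
        "JOIN doctor ON doctor.id = booking.doctor_id "
        "WHERE doctor.name = ? ORDER BY patient.name",
        (doctor_name,),
    ).fetchall()
    return [row[0] for row in rows]


def doctors_of(patient_name):
    """Return the names of all doctors booked by the given patient."""
    rows = conn.execute(
        "SELECT doctor.name FROM doctor "
        "JOIN booking ON booking.doctor_id = doctor.id "
        "JOIN patient ON patient.id = booking.patient_id "
        "WHERE patient.name = ? ORDER BY doctor.name",
        (patient_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Note that the booking table only records that a link exists; any description of the link would have to be added as extra columns on that table.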
Query language
SQL (Structured Query Language) has remained a consistently popular choice for database administrators over the years, primarily due to its ease of use and the effective manner in which it queries, manipulates and aggregates data, and performs a wide range of other functions to turn massive collections of structured data into usable information.
Benefits and drawbacks
Tabular databases have, however, two inherent drawbacks. The first is that relationships can only be expressed through foreign keys and therefore carry no description of their own. This entails that the nature of a relationship cannot be described in full on the relationship itself.
Another drawback of tabular databases is the complexity of traversing the database. If a connection needs to be found between tables which are indirectly linked, the user first needs to join all tables based on the foreign keys.
A basic example can help illustrate this. To find students in a given geographical continent, the database manager would have to join the tables students, universities, cities, countries and continents on their respective foreign keys. The query would look as follows:
SELECT *
FROM students
JOIN universities ON universities.id = students.university_id
JOIN cities ON cities.id = universities.city_id
JOIN countries ON countries.id = cities.country_id
JOIN continents ON continents.id = countries.continent_id
WHERE continents.name = 'Europe';
This query has two implications:
◦ The complexity of the queries can quickly increase as the distance between A and B increases.
◦ The complexity of the calculation for the server can also increase
dramatically as the size of the database or the distance between A and B increases, slowing down the response time and hence the productivity of the organisation.
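To make the first implication concrete, the sketch below generates the join chain from a list of tables and runs it on a toy SQLite database. The schema, the foreign-key names and the sample rows are assumptions made for illustration. The point is only that the query needs one JOIN per hop between the start and end tables.

```python
import sqlite3

# Hypothetical chain of tables; each one points to the next via a foreign key.
CHAIN = ["students", "universities", "cities", "countries", "continents"]
FK = {
    "students": "university_id",
    "universities": "city_id",
    "cities": "country_id",
    "countries": "continent_id",
}


def build_query(chain):
    """Build a SELECT that walks the chain of tables, one JOIN per hop."""
    sql = [f"SELECT {chain[0]}.name FROM {chain[0]}"]
    for child, parent in zip(chain, chain[1:]):
        sql.append(f"JOIN {parent} ON {parent}.id = {child}.{FK[child]}")
    sql.append(f"WHERE {chain[-1]}.name = ?")
    return "\n".join(sql)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE continents (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT, continent_id INTEGER);
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
CREATE TABLE universities (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, university_id INTEGER);
INSERT INTO continents VALUES (1, 'Europe'), (2, 'Asia');
INSERT INTO countries VALUES (1, 'Sweden', 1), (2, 'Japan', 2);
INSERT INTO cities VALUES (1, 'Stockholm', 1), (2, 'Tokyo', 2);
INSERT INTO universities VALUES (1, 'KTH', 1), (2, 'University of Tokyo', 2);
INSERT INTO students VALUES (1, 'Alice', 1), (2, 'Bob', 2);
""")

query = build_query(CHAIN)
students_in_europe = [row[0] for row in conn.execute(query, ("Europe",))]
join_count = query.count("JOIN")  # one JOIN per hop along the chain
```

With five tables in the chain the generated query contains four joins; each additional intermediate table adds one more.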
SQL is not the best choice for all database applications. For one thing, while
SQL had been effective at data scales up through the 1990s and beyond, it
started to falter at the hyperscale levels at the turn of the century. Some users
also complain of its sharding limitations hampering the ability to break large
databases into smaller, more manageable ones.
These drawbacks led to the creation of NoSQL and, more recently, NewSQL databases. The latter attempt to improve on the scalability of traditional SQL systems without sacrificing their atomicity, consistency, isolation and durability (ACID) guarantees, which are critical components of stable databases.
2.7.2 Graph databases
Graph databases differ from relational databases in that they attach as much importance to the relationships between elements (edges) as to the elements themselves (nodes). They are particularly useful in modelling networks of highly connected components. As with relational databases, the entities can be described with a set of attributes, most often held in a dictionary data structure (key-value pairs) and often stored as JSON objects. The edges of the graph are stored in the same way as the nodes, described by key-value pairs.
A graph database can be illustrated by the classical example of social graphs.
The nodes in this example are people, who might have a set of attributes (e.g. first name, last name). The edges represent the friendships between people and can also contain more information (e.g. the date the friendship was established, the strength of the bond). The ability to attach information to relationships is a cornerstone of modern social media applications such as Facebook, Twitter and Instagram, and is the foundation of the algorithms aimed at personalising the content for the end-users (e.g. promoted posts).
An interesting nuance between Facebook and Instagram lies in whether the relationships are directed or not (followers vs. friends). This can easily be modelled by choosing directed or undirected edges to connect the nodes.
Figure 2.4 is an example of a directed social graph.
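As a minimal sketch of the data model described above, nodes and edges can both be held as key-value dictionaries in plain Python. The names, attribute keys and values below are invented for illustration and do not reflect the storage format of any particular graph database. The same edge list is read either as directed ("follows") or undirected ("friends").

```python
import json

# Nodes: identifier -> attribute dictionary (hypothetical people).
nodes = {
    "alice": {"first_name": "Alice", "last_name": "Andersson"},
    "bob": {"first_name": "Bob", "last_name": "Berg"},
    "carol": {"first_name": "Carol", "last_name": "Claesson"},
}

# Edges: described by key-value pairs, just like the nodes.
edges = [
    {"source": "alice", "target": "bob", "since": "2019-05-01", "strength": 0.8},
    {"source": "bob", "target": "carol", "since": "2020-01-15", "strength": 0.4},
    {"source": "carol", "target": "alice", "since": "2018-11-30", "strength": 0.6},
]


def neighbours(node, edge_list, directed=True):
    """Return the nodes reachable from `node` in one step.

    With directed=True only outgoing edges count (follower-style);
    with directed=False every edge is traversable both ways (friend-style).
    """
    result = set()
    for edge in edge_list:
        if edge["source"] == node:
            result.add(edge["target"])
        elif not directed and edge["target"] == node:
            result.add(edge["source"])
    return result


# An edge serialised as a JSON object, attributes included.
edge_as_json = json.dumps(edges[0], sort_keys=True)
```

Calling `neighbours("bob", edges)` returns only the node Bob points to, whereas `neighbours("bob", edges, directed=False)` also returns the node pointing to Bob.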
On top of the potential improvements in query response times and the more
intuitive modelling of networks that a graph database offers, a large set of
algorithms and analysis tools from the field of graph theory becomes available
for analytics purposes. Determining the density of the network, identifying
influential nodes, assessing the similarity or interdependence of nodes or
visualising the propagation of some phenomenon through a network are all
facilitated by graph-related concepts such as clustering, centrality,
connectivity, label propagation and link prediction.
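Two of the measures mentioned above can be sketched in a few lines of plain Python. The example assumes a small, hypothetical directed graph without self-loops or duplicate edges. Density is taken as the number of edges divided by the n(n-1) possible directed edges, and in-degree centrality as the in-degree divided by n-1, which is one common normalisation among several.

```python
from collections import Counter

# Hypothetical directed "follows" graph: (source, target) pairs.
edge_list = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "alice"),
    ("dave", "alice"),
]
node_set = {node for edge in edge_list for node in edge}
n = len(node_set)

# Density: share of all possible directed edges that are actually present.
density = len(edge_list) / (n * (n - 1))

# In-degree centrality: fraction of the other nodes pointing at each node.
in_degree = Counter(target for _, target in edge_list)
centrality = {node: in_degree[node] / (n - 1) for node in node_set}

# The most "influential" node under this simple measure.
most_central = max(centrality, key=centrality.get)
```

In this toy graph the node with the most incoming edges is the most central one; richer notions of influence (betweenness, eigenvector centrality and so on) follow the same pattern but require more computation.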
Figure 2.4: A friendship directed social graph.
Source: AWS (2020)
2.7.3 The relevance of graph analytics for utility companies
As companies digitalise their assets and operations, the question of the modelling approach arises. As was briefly discussed in the previous section, alternatives to the traditional tabular databases are available, among others graph-databases.
Given the fundamental trend of society toward higher levels of connectivity, be it between physical objects via the roll-out of sensors, between people and companies, or between energy commodity prices, a graph-based approach can prove useful.
This observation gives electric utility companies an incentive to
gain a deeper understanding of the foundations of the mathematical field of
graph theory in order to better harness the potential of graph analytics.
CHAPTER 3
Methodology
3.1 Research design
Management research method literature generally classifies research purpose in three types: exploratory, descriptive or explanatory. Depending on its purpose, research can belong to one or more of the research types and can even include them sequentially as the research evolves [17]. The objectives of the respective approaches are:
◦ Exploratory: the problem is loosely defined and the area potentially
unexplored and the researcher needs to gain a clearer understanding of the topic.
◦ Descriptive: an observed phenomenon, person or situation needs to be analysed and described objectively, generally without inferring conclusions.
◦ Explanatory: a situation or a problem is observed and a causal relationship between the variables needs to be established.
An exploratory approach was deemed fit for the purpose of this study, considering
the need to deeply understand the domains of graph theory and of energy
sector operations, as well as the novelty of the work. This research was carried out in
accordance with exploratory research praxis, both with respect to the research
approach and the information gathering. Indeed, as Saunders et al. point out,
exploratory research needs to be dynamic and flexible to changes, as new data
and new insights can alter the direction of the study [17]. In general, the research starts with a broad transversal perspective and narrows down as insights are gained. Regarding information gathering, common approaches include literature review, interviews with subject experts and the conducting of focus groups.
3.2 Research approach
Research can be carried out either inductively, deductively or abductively [17].
◦ Inductive: the research starts with observations and theories are proposed as a result of the observations, toward the end of the research process [18].
◦ Deductive: this approach is concerned with "developing a hypothesis (or hypotheses) based on existing theory and then designing a research strategy to test the hypothesis" [19].
◦ Abductive: the research process is devoted to finding the best possible explanation to "surprising" observations from a range of possible theories, reaching a plausible yet not necessarily universally true conclusion [20].
This research adopts a hybrid inductive-deductive approach. Firstly, an inductive approach is taken as there is no hypothesis to verify. The aim is to discover the potential benefits of implementing graph-based solutions by a deeper
understanding of graphs and perceived problems from the industry. In fact, hypotheses are being created as use-cases are elaborated. The retained
hypothesis is the one from the selected use-case, namely that it can create value for practitioners. This hypothesis is verified in the subsequent part of the
research, in a qualitative and quantitative manner. Thus, this latter phase takes a deductive approach.
3.3 Research layout
Finding applications of graph theory in one’s sector is essentially an Innovation
Management endeavour. Innovation Management is a set of processes companies
implement to continuously introduce new technological solutions, products and
services to their markets. The processes typically revolve around generating new
ideas and solutions, which are then evaluated, prioritised, tested and finally
implemented and rolled out [21].
This research focuses on the first steps of this process, that is, from the
generation of new ideas to the development of a proof-of-concept for a selected idea. Therefore, the research layout draws on the narrower research field of idea management. Idea Management (IM) is a sub-process of innovation management, aiming at structuring and streamlining the idea generation, evaluation and selection processes [32]. It is also referred to as the
“front-end” of innovation, as the ideas are generally generated by the employees themselves to address their specific problems and needs. The systematic
approach adopted in IM is being implemented by many large firms to cope with an otherwise “fuzzy” nature of the front-end of innovation management, bearing a high level of informality and uncertainty [23, 24].
The research layout is explained in more detail below. It follows the main steps proposed by Gerlach and Brem in their guidelines for IM practitioners [25], namely the preparation phase, the idea generation and suggestion phases, the evaluation phase and the implementation phase.
3.3.1 The preparation phase
This phase defines the overall objective and scope of the idea management programme. An idea manager formulates the ideation strategy and plans how to generate, improve and evaluate ideas. The defined rules of the preparation phase can be seen as the first of various filters toward a potential commercialisation [27].
Among the problem types defined by Gerlach and Brem in the IM process, this research addresses “a new technology looking for a new application”, in this case, graph theory [25]. Therefore, the preparation phase entailed extensive background research on graph-theoretic concepts and applications, as well as the design of an approach to the idea generation and selection phases that is adapted from traditional IM praxis.
3.3.2 The idea generation and suggestion phases
This phase is usually of distributed nature, insofar as people from the
organisation submit and present their ideas. A critical factor to consider is the
reward system to engage employees for submission. Due to the novelty of the
technology constraint (i.e. the use of graph theory) and the lack of reward
possibilities, this research proposes a data-driven approach for the idea generation phase. The interviews conducted helped identify critical problems in each sector of activity of the firm, seeding the search for past use-cases of graph-based solutions and helping to assess their relevance to the organisation.
A vast inventory of use-cases was compiled across the core activities of the utility company, covering a wide range of graph-theoretic solutions from both academia and industry. In parallel, interviews were conducted to understand the challenges and opportunities in the various sectors Fortum operates in, providing deeper insight into the relevance of the inventoried use-cases and their implications. The use-cases were clustered into areas of application and filtered by relevance to the company’s value chain as part of the “idea improvement”
process suggested by several authors in the IM literature [23, 28, 29].
The approach to generating application ideas is inspired by the concepts of “market pull” and “technology push” [38]. In the “market pull”, the source of the innovation is an unmet need of the customer (in this case, it could be both the end-customer and the end-user of the solution). This results in new demands for problem-solving (‘invent-to-order’ a product for a certain need). The impulse comes from individuals or groups who (are willing to) articulate their subjective demands.
In the “technology push”, the stimulus for new products and processes comes from (internal or external) research; the goal is to make commercial use of new know-how. The impulse is caused by the application push of a technical
capability. Therefore, it does not matter if a certain demand already exists or not.
In this research, the pull approach consists in identifying the needs in the respective sectors, whereas the push approach considers the benefits and drawbacks of the technology itself and the solutions it has enabled in academia and various industries.
Technology push
The push approach consists in identifying classic, verified use-cases of the
technology in order to understand the inherent benefits of the technology in
practice. This inventory of use-case domains provides an important benchmark
for tying the utility company’s inventory of areas for improvement to graph
analytics. The two sources of benchmark use-cases are academia and the
industry.
Academia
Graph theory is a mature and widely studied mathematical area. A Scopus search for “graph theory” yields about 100,000 results. As a comparison, a search for “blockchain” returns 15,000 hits. In addition, graph-theoretic methods show a high degree of adaptability across engineering tasks, and thus potentially across a utility company’s operations.
Therefore, a systematic, programmatic search for research articles relating graph theory to energy activities was performed in order to scan for potential applications as completely as possible. The first phase consisted in searching for keywords representing graph theory (“graph theory”, “graph-based”, “network theory”) in combination with each energy sector the company is operating in (e.g. hydropower, nuclear power, wind power, etc.).
The second phase consisted in listing graph-related concepts and algorithms (about 70) and identifying their presence in academia across the energy sectors.
Due to the higher dimensionality of this search (70 × 15 entries), the search was done programmatically with the Scopus API (Application Programming Interface), on TITLE-ABSTRACT-KEYWORDS. In total, about 3,000 articles were found, stored and curated, first automatically according to various filters and then with the help of the abstracts.
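The second search phase amounts to enumerating every pairing of a graph concept with an energy sector. The sketch below only builds the query strings and does not call the Scopus API. The short lists stand in for the roughly 70 concepts and 15 sectors of the study, and the use of the TITLE-ABS-KEY field code of Scopus advanced search is an assumption about how the title, abstract and keyword restriction was expressed.

```python
from itertools import product

# Placeholder lists; the study used about 70 concepts and 15 sectors.
graph_concepts = ["centrality", "minimum spanning tree", "community detection"]
energy_sectors = ["hydropower", "nuclear power", "wind power"]


def build_query(concept, sector):
    """Build one search string restricted to title, abstract and keywords."""
    return f'TITLE-ABS-KEY("{concept}" AND "{sector}")'


# One query per (concept, sector) pair, concept-major order.
queries = [build_query(c, s) for c, s in product(graph_concepts, energy_sectors)]

# Size of the full grid described in the text.
full_grid_size = 70 * 15
```

With the full lists, the grid contains 1,050 queries, which is why a programmatic search is preferable to a manual one.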
Some interesting articles could only be found thanks to the higher specificity of the graph-theoretic concepts used in this second search. Graph theory is a rather wide area of mathematics, and the term itself is sometimes too general to appear in the abstract or in the keywords on Scopus.
Industry
As described in the background, graph-based solutions have attracted increasing interest from industry, particularly in highly digitalised companies.
Benchmark use-cases from industry were mainly extracted from classical use-cases published online, as well as publicised case studies from graph-database providers (Neo4j, Expero). The industry benchmark use-cases have a lower a priori relevance to the energy industry specifically and are less detailed in their implementation than the examples from academia, but they give a complementary view of what is being worked on in a more practical manner.
Demand pull
The pull approach allows the company’s needs to be addressed in a technology-agnostic manner, taking a user-centric rather than a technology-centric perspective, which helps to avoid shoehorning a technology into a problem. In this process, the activities of the company are mapped and an inventory is made of the pain-points it faces.
The pull approach consisted in interviews with experts in various domains of activities at Fortum:
◦ Hydropower (asset management, plant optimisation, scheduling)
◦ Data science (worked on various projects)
◦ Project management
◦ Digitalisation
◦ Energy trading
◦ Wind power
◦ New ventures
◦ Charge and Drive
In this phase, the use-cases were clustered according to their domain of application and their graph-related concepts and algorithms. This provides a detailed picture of which problems have previously been solved and with which approach. At this stage of the research, the question of where graph theory can be used is considered more important than which graph theory concepts are most useful. As such, a problem-centric approach was taken: the domains of application are evaluated in the next step. A graph-centric evaluation would also have been possible, given the transferability of graph concepts to multiple domains of application. However, we believe that a problem-centric evaluation gives more value to practitioners, since there is more organisational friction in applying a graph-based solution across various sectors than in applying different graph-based algorithms within the same domain of application.
Figure 3.1 summarises the idea generation phase process.
Figure 3.1: The idea generation phase
3.3.3 The evaluation phase
In this research, the selection of ideas consisted of two successive steps: firstly, an application domain was prioritised and secondly, a specific use-case within the application domain was selected.
One of the key issues of an idea management programme is selecting, from a large pool, the ideas that offer the greatest potential for the future success of the organisation [32]. To structure this high information load, suitable selection criteria are required. However, there is no single dominant set of criteria, as every organisation has its own goals, needs and culture as well as individual budgets and timetables [33]. Therefore, the evaluation criteria are generally chosen by the organisation itself [30, 31]. The total score of an idea on these criteria indicates whether it should be accepted, deferred, or rejected.
In their guidelines, Gerlach and Brem have assembled the evaluation criteria most
commonly proposed in IM literature [25]. They span a wide array of dimensions,
such as technology, organisational culture, strategy, business, etc. The evaluation
tool in this research consists of four dimensions: graph applicability, technical
feasibility, economic potential and workability, whereby each of the dimensions in turn has four criteria, as per the table below. The clusters were scored from 0 to 2 on each of the criteria, and the dimension score is the aggregate score of its underlying criteria.
Graph applicability                  | Technical feasibility    | Economic potential | Workability
Underlying graph structure           | Simplicity of model      | Sector size        | Relevance
Richness of relationships            | Homogeneity of tools     | Sector growth      | Data alignment
Identified concepts and algorithms   | Computational constraint | Substitute tools   | Human alignment
Availability of supporting use-cases | Risk                     | Scalability        | Integrability and maintainability
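The scoring scheme can be sketched as follows. The dimension and criterion names come from the table above, while the example scores are invented for illustration and are not results of this study. The sketch also assumes that "aggregate" means a plain sum, which gives each dimension a maximum of 8.

```python
# Four dimensions, each with four criteria, as listed in the table.
CRITERIA = {
    "Graph applicability": [
        "Underlying graph structure",
        "Richness of relationships",
        "Identified concepts and algorithms",
        "Availability of supporting use-cases",
    ],
    "Technical feasibility": [
        "Simplicity of model",
        "Homogeneity of tools",
        "Computational constraint",
        "Risk",
    ],
    "Economic potential": [
        "Sector size",
        "Sector growth",
        "Substitute tools",
        "Scalability",
    ],
    "Workability": [
        "Relevance",
        "Data alignment",
        "Human alignment",
        "Integrability and maintainability",
    ],
}


def dimension_scores(scores):
    """Sum the 0-2 criterion scores of one cluster within each dimension."""
    result = {}
    for dimension, criteria in CRITERIA.items():
        for criterion in criteria:
            if scores[criterion] not in (0, 1, 2):
                raise ValueError(f"score for {criterion!r} must be 0, 1 or 2")
        result[dimension] = sum(scores[criterion] for criterion in criteria)
    return result


# Hypothetical scores for a single cluster of use-cases.
example_scores = {
    "Underlying graph structure": 2, "Richness of relationships": 2,
    "Identified concepts and algorithms": 2,
    "Availability of supporting use-cases": 2,
    "Simplicity of model": 1, "Homogeneity of tools": 1,
    "Computational constraint": 2, "Risk": 1,
    "Sector size": 2, "Sector growth": 1,
    "Substitute tools": 0, "Scalability": 1,
    "Relevance": 2, "Data alignment": 2,
    "Human alignment": 1, "Integrability and maintainability": 1,
}
example_result = dimension_scores(example_scores)
example_total = sum(example_result.values())
```

Keeping the four dimension scores separate, rather than reporting only the total, makes it visible when a cluster scores well on graph applicability but poorly on, say, economic potential.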