
Linköpings universitet, SE-581 83 Linköping, +46 13 28 10 00, www.liu.se

2021 | LIU-IDA/LITH-EX-A--2021/007--SE

Design Optimization in Gas Turbines using Machine Learning

A study performed for Siemens Energy AB

Designoptimisering i gasturbiner med hjälp av maskininlärning

Mathias Berggren

Daniel Sonesson

Supervisor: George Osipov
Examiner: Petru Eles


Upphovsrätt

Detta dokument hålls tillgängligt på Internet - eller dess framtida ersättare - under 25 år från publiceringsdatum under förutsättning att inga extraordinära omständigheter uppstår.

Tillgång till dokumentet innebär tillstånd för var och en att läsa, ladda ner, skriva ut enstaka kopior för enskilt bruk och att använda det oförändrat för ickekommersiell forskning och för undervisning. Överföring av upphovsrätten vid en senare tidpunkt kan inte upphäva detta tillstånd. All annan användning av dokumentet kräver upphovsmannens medgivande. För att garantera äktheten, säkerheten och tillgängligheten finns lösningar av teknisk och administrativ art.

Upphovsmannens ideella rätt innefattar rätt att bli nämnd som upphovsman i den omfattning som god sed kräver vid användning av dokumentet på ovan beskrivna sätt samt skydd mot att dokumentet ändras eller presenteras i sådan form eller i sådant sammanhang som är kränkande för upphovsmannens litterära eller konstnärliga anseende eller egenart.

För ytterligare information om Linköping University Electronic Press se förlagets hemsida http://www.ep.liu.se/.

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Mathias Berggren Daniel Sonesson

Abstract

In this thesis, the authors investigate how machine learning can be utilized to speed up the design optimization process of gas turbines. The Finite Element Analysis (FEA) steps of the design process are examined to determine whether they can be replaced with machine learning algorithms. The study is done using a component with given constraints provided by Siemens Energy AB. With this component, two approaches to using machine learning are tested. One utilizes design parameters, i.e. raw floating-point numbers such as the height and width. The other technique uses a high dimensional mesh as input. It is concluded that using design parameters with surrogate models is a viable way of performing design optimization, while mesh input currently is not. Results from using different amounts of data samples are presented and evaluated.


Acknowledgments

We would like to express our gratitude to our supervisors at Siemens Energy AB, Olle Skrinjar and Erik Agermo, for keeping daily contact with us. Since this thesis was conducted during a pandemic, most of the work was done from home, and they helped us stay motivated to continue pushing forward.

We would also like to thank our examiner Petru Eles and supervisor George Osipov for giving good feedback during the project and keeping our spirits up.

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
1.1 Motivation
1.2 Aim
1.3 Research questions
1.4 Delimitations

2 Theory
2.1 Design Optimization
2.1.1 Surrogate models

2.2 Finite Element Method
2.3 Machine Learning
2.3.1 Sampling
2.3.2 Tree models
2.3.3 Kernel Methods
2.3.4 Linear Regression
2.3.5 Radial Basis Functions
2.3.6 Kriging
2.3.7 Neural Networks
2.3.8 Feature selection
2.3.9 Model Validation

3 Method
3.1 Previous work
3.2 Component
3.2.1 Turbine disc SGT-700
3.2.2 HEEDS
3.3 Mesh Approach
3.3.1 Meshes
3.3.2 Mesh representation

3.3.3 Testing the mesh approach
3.4 Parameter Approach
3.4.1 Pre-study
3.4.3 Evaluation

4 Results
4.1 Dataset AS50
4.1.1 AS50 Single response
4.1.2 AS50 Multiple responses
4.2 Dataset AS60
4.2.1 AS60 Single response
4.2.2 AS60 Multiple responses
4.3 Dataset AS80
4.3.1 AS80 Single response
4.3.2 AS80 Multiple responses
4.4 Dataset AS100
4.4.1 AS100 Single response
4.4.2 AS100 Multiple responses
4.5 Dataset AS135
4.5.1 AS135 Single response
4.5.2 AS135 Multiple responses
4.6 Dataset AS180
4.6.1 AS180 Single response
4.6.2 AS180 Multiple responses
4.7 Dataset AS245
4.7.1 AS245 Single response
4.7.2 AS245 Multiple responses
4.8 Result comparison
4.8.1 Area comparison
4.8.2 Time comparison
4.8.3 RMSE Life

5 Discussion
5.1 Results
5.2 Method
5.2.1 Mesh Approach Discussion
5.2.2 Parameter approach
5.2.3 Future work
5.3 The work in a wider context

6 Conclusion

List of Figures

1.1 Picture of Siemens Gas Turbine 700 (SGT-700)
1.2 The design cycle at Siemens Energy
2.1 Strategies for optimization with surrogate models
2.2 Sub-fields of Machine Learning with filled background on the specific fields this thesis uses
2.3 LHS. a) has bad space-filling properties while b) has good space-filling properties
2.4 An example structure of a Neural Network
2.5 The model described by the yellow line overfits to training data
3.1 Base component with parameter values, radiuses prefixed with R
3.2 HEEDS processes used, as shown in the graphical interface of HEEDS
3.3 Layer architecture of MeshNet
3.4 Correlation of all parameters
3.5 Boruta: The importance of each variable on response values
3.6 Regions which surrogates try to estimate crack initiation for
4.1 Comparison of the area from the best designs found with every dataset
4.2 Comparison of the RMSE Life value for the single response surrogate of each dataset
5.1 Spread of response variables Creep, Life and Area


List of Tables

3.1 Design parameters for base component with available value ranges
3.2 Files containing unique design information
3.3 Table depicting the spread of training and test data points across different areas
3.4 Design parameters for component with available value ranges
3.5 Data set name and how many points they are sampled from
4.1 Optimal design found for AS50 with single response
4.2 Algorithm and kernels used for the surrogates, as well as their RMSE values
4.3 Optimal design found for AS50 with multiple response
4.4 Algorithm and kernels used for the surrogates, as well as their RMSE values
4.5 Optimal design found for AS60 with single response
4.6 Algorithm and kernels used for the surrogates, as well as their RMSE values
4.7 Optimal design found for AS60 with multiple responses
4.8 Algorithm and kernels used for the surrogates, as well as their RMSE values
4.9 Optimal design found for AS80 with single response
4.10 Algorithm and kernels used for the surrogates, as well as their RMSE values
4.11 Optimal design found for AS80 with multiple responses
4.12 Algorithm and kernels used for the surrogates, as well as their RMSE values
4.13 Optimal design found for AS100 with single response
4.14 Algorithm and kernels used for the surrogates, as well as their RMSE values
4.15 Optimal design found for AS100 with multiple responses
4.16 Algorithm and kernels used for the surrogates, as well as their RMSE values
4.17 Optimal design found for AS135 with single response
4.18 Algorithm and kernels used for the surrogates, as well as their RMSE values
4.19 Optimal design found for AS135 with multiple responses
4.20 Algorithm and kernels used for the surrogates, as well as their RMSE values
4.21 Optimal design found for AS180 with single response
4.22 Algorithm and kernels used for the surrogates, as well as their RMSE values
4.23 Optimal design found for AS180 with multiple responses
4.24 Algorithm and kernels used for the surrogates, as well as their RMSE values
4.25 Optimal design found for AS245 with single response
4.26 Algorithm and kernels used for the surrogates, as well as their RMSE values
4.27 Optimal design found for AS245 with multiple responses
4.28 Algorithm and kernels used for the surrogates, as well as their RMSE values
4.29 Time taken to generate data
4.30 Expected time for each dataset

1 Introduction

1.1 Motivation

Modern natural science is based on the idea that all phenomena in nature, whether electrical, mechanical or biological, can be described by the laws of physics. These laws are formulated as algebraic, differential or integral equations relating different quantities of interest for a specific phenomenon. Knowing these laws enables us to simulate how natural phenomena and physical processes behave, and such simulations are used in our society for a variety of purposes: from predicting the occurrence of tornadoes and thunderstorms by simulating the weather, to estimating how long a turbine blade inside an SGT-700 gas turbine will last before it breaks.


This project was conducted at Siemens Energy AB (SE), located in Finspång, Sweden. SE is one of the leading manufacturers of industrial-grade gas turbines in the world. Gas turbines are machines constructed to produce a large amount of energy by burning natural gas to spin a turbine. These turbines are used to power big industries and factories, and are sometimes placed in remote locations, such as oil rigs or natural gas pipelines. They are complex machines that are exposed to extreme temperatures and heavy loads. An example of such a machine can be seen in figure 1.1. It is key that SE constructs components that can handle harsh conditions without failure. Therefore, the design and construction process consists of several virtual simulations to guarantee a high quality product. For those simulations, the Finite Element Method (FEM) is used.

Currently the design process at SE is an iterative loop of four different stages. First, the geometry of the desired component is designed and a Computer-Aided Design (CAD) model is created based on the design. This model is the basis for the second stage, the thermodynamic simulation of the component, which results in a thermal distribution of the component. This distribution is used in the third stage, the mechanical integrity (MI) simulation. The fourth and last step uses the stress distribution from the previous simulation to estimate the expected life of the component.

Figure 1.2: The design cycle at Siemens Energy

There needs to be a CAD model to perform the thermodynamic simulations, there needs to be a thermodynamic model in order to perform the mechanical integrity simulations, and so on. It is in stages 2 and 3 that FEM is used, and while these stages are key to performing this kind of simulation, they are also a bottleneck for the cycle as they are computationally heavy. In the current cycle, performing both a thermodynamic and an MI simulation can take, depending on the complexity of the component, upwards of two weeks. The computational cost of FEM grows rapidly with model size, so more nodes in a component means longer simulations, see section 2.2. If an error is discovered, or if the expected life of the component is not good enough, the model has to be redesigned. This means that the cycle has to be restarted: the designer changes the geometry and the simulations are performed again.

As one design cycle can take up to two weeks, designing a new component can be a very time-consuming task. Optimizing this design cycle would allow for more designs to be tested and give more flexibility to the engineers, possibly resulting in a better optimized product in a shorter amount of time. This could be done either by reducing the time it takes to perform FEA simulations, or by finding a way to gain more insights about the components and how different parameters of the geometry affect the end result.

During the last few years we have seen rapid development in the machine learning field. The main reason for the rise in popularity and development is the enormous increase in computational power. Increased computational power allows processing of bigger data sets, which in turn has allowed for more complex, and better performing, machine learning models.

1.2 Aim

The goal of this thesis is to establish ways of speeding up the design process for new components at Siemens with the help of various machine learning methods.

1.3 Research questions

1. As the goal is to speed up the design process by minimizing the time spent on FEA calculations, how can a machine learning prediction of the FEA be performed and utilized by a mechanical engineer at Siemens?

2. A design can be represented with different coarseness, using either a mesh or design parameters. Which representation is most suitable for Siemens' needs?

1.4 Delimitations

Achieving optimal performance is not of primary importance; instead, the focus is on creating a proof of concept.

No evaluation or reflection will be done on the theories on which the constraints for our given model rely, as these assume a solid understanding of mechanical engineering. As the thesis subject is computer science, we did not want to confuse readers with such information. For all matters that needed such expertise, assistance from engineers at Siemens was provided.


2 Theory

Due to global competition, manufacturing companies strive to produce products that are cheap and reliable. This has led to the invention of strategies to use when creating a product that satisfies specific criteria. Two common, but conflicting, criteria at Siemens are minimizing the mass of a component and maximizing the time until cracks start forming in it due to mechanical stress. A product can be described by parameters such as height, width etc. The task of choosing those parameter values to create a satisfying product is referred to as design optimization. At Siemens this is commonly done by engineers using computational methods, such as Finite Element Analysis (FEA) and Computational Fluid Dynamics (CFD). Results from FEA and CFD calculations can be used to estimate how long a product will last before it breaks and needs to be replaced. From section 2.2, it is clear that FEM is a very computationally heavy and time consuming process, and waiting for such simulations is a bottleneck for engineers when designing new products at Siemens.

In this section the concept of design optimization is explained. This is followed by a broad introduction to the finite element method and machine learning techniques.


2.1 Design Optimization

The process of constructing a component and making it fit its needs as well as possible is referred to as Design Optimization. The process can be described by a design D, consisting of design parameters x, which maximizes a performance metric f(x) and satisfies constraints g(x). Optimization of structures can be divided into shape optimization and topology optimization. Shape optimization focuses on changing parameter values within certain ranges to maximize f(x); a mathematical description can be seen in equation 2.1. Topology optimization focuses on placing material inside a bounding box to maximize f(x), therefore being more general [1].

$\max_{x \in D} f(x)$ subject to $g(x) \geq \text{constraint}$ and $x_{\text{lower}} \leq x \leq x_{\text{upper}}$ (2.1)

The solution to a design optimization problem can be found through a manual iterative search guided by the knowledge of senior engineers. There also exist algorithms which perform an iterative search trying to solve the optimization problem, some of the most common being Genetic Algorithms [2], Quadratic Programming [3], Particle Swarm Optimization [4] and Sherpa [5]. Iterative processes demand that the previous iteration has finished before the next one can progress, slowing down the design process. At Siemens the design process consists of several cycles (figure 1.2), making an iterative search at each stage costly for large components.
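To make the formulation in equation 2.1 concrete, the sketch below solves a small constrained optimization problem with SciPy. The objective f, the constraint g, the bounds and the starting point are illustrative placeholders, not the Siemens component or the optimizers (e.g. Sherpa) referred to above.

    import numpy as np
    from scipy.optimize import minimize

    def f(x):
        # performance metric to maximize (toy example)
        return -(x[0] - 1.0) ** 2 - (x[1] - 2.0) ** 2

    def g(x):
        # quantity that must stay above the constraint limit
        return x[0] + x[1]

    constraint_limit = 1.5
    bounds = [(0.0, 3.0), (0.0, 3.0)]                  # x_lower <= x <= x_upper
    cons = {"type": "ineq", "fun": lambda x: g(x) - constraint_limit}

    # SciPy minimizes, so the performance metric is negated to maximize it
    res = minimize(lambda x: -f(x), x0=np.array([0.5, 0.5]),
                   bounds=bounds, constraints=cons, method="SLSQP")
    print(res.x, -res.fun)                             # best design and its f(x)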

2.1.1 Surrogate models

Surrogate models, also known as meta-models, approximation models or response surface models, have been used extensively during the last decades [6]. They are simplified versions of the object they imitate, in the sense that the response function is cheaper to evaluate [7]. A surrogate is a machine learning technique, which means that it is a data-driven method: no attempt is made to understand the function being estimated, instead the focus is on creating an estimation function which performs the same mapping from input to the correlated output [8].

Figure 2.1: Strategies for optimization with surrogate models

When a surrogate model is used in Design Optimization, the success of the whole process relies heavily on the accuracy of the model. In [2] it was concluded that there are three common strategies used when trying to achieve high accuracy for a surrogate model; an overview is presented in figure 2.1. Both a) and b) focus on using reliable surrogate models for predictions in the design optimization process. Alternative c) does not involve the surrogate in a typical optimization process. Instead the sampling alone is the optimization process, where the surrogate only acts as a guide and gives information about where to look for data points.


2.2 Finite Element Method

Scientists and engineers practicing in the field of computer-based physical simulations mainly work with two tasks:

• Mathematical formulation of a physical process
• Numerical analysis of the mathematical model

The formulations are expressed as mathematical statements using the laws of physics. While the mathematical model is acquired through assumptions about the physical processes affecting the model, the numerical analysis is performed by a computer that estimates the characteristics of the process. Even though deriving the equations for most models is not impossible, solving them with high precision quickly becomes an advanced task. Because of this, approximate methods of analysis have been developed to provide solutions. The most used one today is called the Finite Element Method (FEM). [9]

FEM is a procedure used for simulating physical events. It works by calculating estimations of partial differential equations for small parts of a structure. While the differential equations which operate on the full structure form a complex problem, for a finite number of parts of the structure it becomes feasible to solve them. Therefore the structure is divided into a finite number of elements. These elements are connected to each other through a fixed number of nodes, with the number of nodes and the structure of the elements varying between different problems. The nodes have several parameters which can have different degrees of freedom (DOF) for different problems. What FEM calculates is the displacements that occur in each degree of freedom for each node. Since FEM is based on a discretization of the full system, the end result is an estimation. The quality of the estimation can be controlled by the number of elements and the type of elements used. Increasing the number of elements, or using more complex elements, increases the accuracy of the estimation, but also increases the number of computations required. [9, 10, 11]


The general form of the FEA process can be described in 7 steps. [10]

1. Simplify the original model by discretizing it into a finite number of elements.

2. Choose a displacement function to be used within the elements.

3. Define the relationships between stress - strain and strain - displacement.

For small strains, a one-dimensional deformation with strain $e_x$ and displacement $u$ can be expressed as follows.

$e_x = \frac{du}{dx}$ (2.2)

The stress-strain relationship is where the material behaviour is defined. One of the simplest stress-strain relationships is Hooke's law. It relates the stress $\sigma_x$ to the strain via the modulus of elasticity $E$:

$\sigma_x = E e_x$ (2.3)

4. Formulate the stiffness matrix and equations for each element. This describes the behaviour of each element and can be written as:

$\{f\} = [k]\{d\}$ (2.4)

f and d are both vectors while k is a matrix, hence the notation. f keeps track of nodal forces for an element, k of the element stiffness information, and d of the displacements (for all DOF).

5. Introduce boundary conditions and gather all element equations into a global matrix. The final global equation can be written as:

$\{F\} = [K]\{d\}$ (2.5)

where F is a vector of all nodal forces, K the total stiffness matrix for the whole model, and d the vector of all nodal degrees of freedom, or generalized displacements. At this stage the global stiffness matrix is singular. To solve this issue, boundary conditions are imposed so that the structure remains in place. All applied loads have been accounted for in the global force matrix.

6. Solve for the unknown DOF for each element.

Eq. 2.5 is a set of algebraic equations that can be expanded to:

$\begin{Bmatrix} F_1 \\ F_2 \\ \vdots \\ F_n \end{Bmatrix} = \begin{bmatrix} K_{11} & K_{12} & \cdots & K_{1n} \\ K_{21} & K_{22} & \cdots & K_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ K_{n1} & K_{n2} & \cdots & K_{nn} \end{bmatrix} \begin{Bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{Bmatrix}$ (2.6)

where n represents all the nodal degrees of freedom.

7. Solve for the element strains and stresses.

This is done with equations 2.2 and 2.3 from step 3. In this example they are very basic, but the principle of how they are used generalizes to more advanced cases.
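As a minimal illustration of the seven steps, the sketch below runs a one-dimensional FEM analysis of an axially loaded bar with linear two-node elements in Python/NumPy. The material data, load and element count are made up for the example and are not related to any Siemens model.

    import numpy as np

    E, A, L_total, n_elem = 210e9, 1e-4, 1.0, 4          # modulus, area, length, elements
    n_nodes = n_elem + 1
    Le = L_total / n_elem                                # element length

    # Step 4: element stiffness matrix [k] for a linear bar element (eq 2.4)
    ke = (E * A / Le) * np.array([[1.0, -1.0], [-1.0, 1.0]])

    # Step 5: assemble the global stiffness matrix [K] (eq 2.5)
    K = np.zeros((n_nodes, n_nodes))
    for e in range(n_elem):
        K[e:e + 2, e:e + 2] += ke

    F = np.zeros(n_nodes)
    F[-1] = 1000.0                                       # axial load at the free end

    # Boundary condition: node 0 is fixed, so solve the reduced system (eq 2.6)
    d = np.zeros(n_nodes)
    d[1:] = np.linalg.solve(K[1:, 1:], F[1:])            # step 6: unknown DOF

    strain = np.diff(d) / Le                             # step 7: e_x = du/dx (eq 2.2)
    stress = E * strain                                  # sigma_x = E * e_x (eq 2.3)
    print(d, stress)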


Based on earlier theory, there are some distinctive features that make FEM the most powerful numerical method that is used for analyzing complicated structures in engineering today [11]:

1. Can simulate complicated structures with different materials.

2. Ability to solve both geometrical (deformation) and material non-linearities.

3. Dividing the problem into a finite amount of unknown properties makes it solvable for computers.

The downside of FEM is the time it takes to simulate a model. For a very complex model at Siemens the simulation time can be up to two weeks. Using asymptotic notation, the complexity of the whole FEM algorithm is $O(NW^2)$, where N is the number of nodes and W is the bandwidth of the stiffness matrix. W is controlled by the number of elements times the nodal DOF. [12]


2.3 Machine Learning

Machine Learning (ML) is a sub-field of Artificial Intelligence (AI). The technology is implemented in many applications and considered the state-of-the-art solution in e.g. speech-to-text translation, natural language translation and face recognition. The main reason for the rise in popularity of machine learning in recent years is the enormous increase in computational power, which follows Moore's law [13]. This allows processing of bigger data sets, which in turn has allowed for more complex, and better performing, machine learning models. [14]

Machine learning is divided into three categories that follow different approaches. A quick overview of the relations between them, and examples of techniques for each category, can be seen in figure 2.2. Supervised learning is based on ML algorithms that learn from labeled data. Unsupervised learning tries to extract information about data sets, for example by dividing a data set into groups that share common characteristics. Consider an email data set in which the goal is to figure out which emails can be considered spam. A supervised algorithm requires that someone has explicitly marked all spam emails, whereas an unsupervised algorithm focuses on anomaly detection with no knowledge encoded in the data about whether an email is spam or not. Reinforcement learning is sometimes referred to as semi-supervised learning. The algorithm has a specified goal state it tries to reach by choosing specific actions, which resembles supervised learning because of its use of labels, but the actions needed to end up in the desired goal state are unknown and derived in an unsupervised way. [14, 15, 16]

In this thesis, the focus is on surrogate models, which belong to regression in supervised learning. The following sections explain how data can be collected, give examples of common ML surrogates, and describe how to choose parameters and how to validate the models.

Figure 2.2: Sub-fields of Machine Learning with filled background on the specific fields this thesis uses.

2.3.1 Sampling

Design of Experiments (DOE) is the design of any task that is given a set of input variables and tries to obtain a suitable variation of them for evaluating the response function. In science this is widely used to create reliable and reproducible experiments by choosing for which values of the input variables the response should be evaluated. In ML this is often described as sampling or data generation. In this section, a set of input variables is referred to as a data point, and some of the most commonly used sampling techniques are covered.

As the purpose of the model is to predict an unknown function, all information about it resides in the generated data points. The model tries to mimic a black-box function, making it impossible to decide beforehand from what distribution data points should be picked and how many are needed [2]. Most practitioners choose a sampling technique that is space-filling and try to minimize the number of data points needed to achieve high accuracy for the surrogate [1, 2]. Although each additional design variable scales the number of possible values exponentially, and the optimal number of sampling points is impossible to know beforehand, a small space-filling DOE can be generated with 10 data points per input variable [17, 18], while a large DOE would require 30 data points per input variable [19, 17, 20]. One way of trying to gain more knowledge about the function being mimicked is to use an adaptive sampling approach [2, 21], where the predictions of the models affect upcoming sampling points. This can be implemented for alternatives b) and c) in figure 2.1.

Level sampling methods

In design optimization, design variables are sometimes referred to as factors. One of the most basic methods for sampling, Factorial Sampling, samples a number of values from each factor depending on the level. Given level l and number of parameters p, the sampling method generates $l^p$ designs. For example, 2-level factorial sampling for 4 factors generates $2^4 = 16$ data points, while 3-level factorial sampling for the same 4 factors would generate $3^4 = 81$ data points. The level describes how many values are examined for each factor. If the number of sample points in a full factorial is too large, Fractional Factorial Sampling is used. In fractional sampling, factorial sampling is performed for some of the factors, while the rest are varied according to a given pattern depending on the values of the factorial variables. Examples of patterns can be found in [22].
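A full factorial DOE is straightforward to generate directly; the short Python sketch below uses placeholder factor bounds and is not tied to any particular tool used in the thesis.

    import itertools
    import numpy as np

    def full_factorial(levels, bounds):
        """l-level full factorial design: every combination of `levels` evenly
        spaced values per factor, giving levels**p points for p factors."""
        grids = [np.linspace(lo, hi, levels) for lo, hi in bounds]
        return np.array(list(itertools.product(*grids)))

    # 2-level design for 4 factors -> 2**4 = 16 data points
    doe = full_factorial(2, [(0.0, 1.0)] * 4)
    print(doe.shape)        # (16, 4)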

Taguchi is another DOE approach which is also referred to as Robust Parameter Design (RPD). The goal in RPD is to create products that are reliable and tolerant to noise factors. The sampling process uses similar patterns as in fractional factorial sampling which can be found in [22].

The Plackett-Burman method generates a p × p orthogonal matrix, where p is the number of factors. It tries to find designs in which the combinations of levels for any pair of factors appear the same number of times. While a complete factorial design achieves this, Plackett-Burman sampling uses fewer samples [22].

Low Discrepancy

There exist several sampling techniques which are categorized as low discrepancy sampling. One of the most common is the Hammersley point set. It is based on the Halton set, which is in turn a generalisation of the van der Corput sequence into n dimensions. The goal in low discrepancy sampling is to avoid alignments while distributing points uniformly in all dimensions [23]. Hammersley is often used as a replacement for random samples and is, together with other low discrepancy techniques, referred to as quasi-random [24]. It is widely used thanks to its ease of implementation and efficiency at replacing random sampling. [25]


Latin Hypercube

Initially proposed in [26], Latin Hypercube Sampling (LHS) is one of the most common methods today for space-filling designs [2]. When performing LHS on p continuous design variables to achieve n samples, the range of values for each variable is divided into n intervals, creating a total of n × p intervals. An interval is chosen at random for each design variable. After this, a value is chosen inside the selected interval for each variable according to a defined probability distribution (if no prior exists, a uniform distribution is assumed). This is repeated until all requested sampling points have been collected. [26] A basic LHS does not guarantee good space-filling properties, see figure 2.3, which is why several variants exist to achieve better LHS designs. One of the most common methods is to maximize the distances between data points in order to achieve a more space-filling design [1].

Figure 2.3: LHS. a) has bad space-filling properties while b) has good space-filling properties
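A basic LHS design can be generated with SciPy's quasi-Monte Carlo module (available in SciPy >= 1.7); the number of variables and value ranges below are placeholders. Space-filling variants, such as maximin LHS, refine the same idea.

    from scipy.stats import qmc

    p = 4                                   # number of design variables
    n = 40                                  # requested samples (~10 per variable)

    sampler = qmc.LatinHypercube(d=p, seed=0)
    unit_sample = sampler.random(n)         # points in the unit hypercube [0, 1)^p
    doe = qmc.scale(unit_sample,            # rescale to the real variable ranges
                    [0.0, 0.0, 0.0, 0.0],
                    [10.0, 5.0, 2.0, 1.0])
    print(doe.shape)                        # (40, 4)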

D-Optimal

In D-Optimal sampling, the sampling algorithm is given a candidate set of sample values and tries to choose the optimal subset. The optimization criterion by which the subset is chosen is referred to as D-efficiency. D-efficiency is calculated by maximizing $|X'X|$, the determinant of the information matrix $X'X$. The information matrix is generated for a specific model, which makes the model type an input parameter to the sampling process. There are two situations where D-Optimal sampling is typically used: [22]

1. There are no resources to run a lot of samples

2. Certain values of the design parameters can not be sampled.

Central Composite

A central composite sampling set is a concatenation of three different sets:

1. A set consisting of two-level full factorial or two-level fractional factorial samples.

2. A set of center points, the median values of each factor used in the factorial set.

3. A set of axial points: all factors except one equal the center point values, while the remaining factor takes values below and above the factorial median, typically outside the range.

This technique is often used when building a quadratic model and there is a need to avoid sampling a full three-level factorial design. [27, 28]


Adaptive Sampling

As observed in figure 2.1, some strategies for constructing surrogate models sample several times. The process of generating samples one after another is commonly referred to as sequential sampling, and the idea is to gradually increase the size of the DOE while minimizing the computational effort required [1]. In adaptive sampling, the fact that sample points have already been generated is utilized together with a fitted surrogate to improve performance: first a surrogate is fitted to the currently generated data, then it is used to direct new sample points towards areas that will increase surrogate accuracy [29]. Kriging, as described in 2.3.6, provides an estimation of the certainty of a prediction. This can be used to generate new sample points in areas where high uncertainty is observed [30]. For models that do not provide such a metric, cross validation can be used as a replacement [1].

2.3.2 Tree models

A tree model is a machine learning technique based on tree-shaped models. It is a long-established technique that can be used for both classification and regression tasks. Depending on the complexity and partitioning details of the tree, complex functions can be realised. A tree model has a root node as the starting point and traverses down the tree through conditional branches until it reaches a leaf node, which is the end point. Leaf nodes have values assigned to them which are the output values of the model. The model takes a vector x of length N as input, which is mapped to an output value by traversing down the structure. Each internal node can be seen as a pair (j, s), where j is an index into the input vector and s is a threshold. Depending on the value of $x_j$ relative to the threshold, either the left or the right branch is chosen, until a leaf node is reached. [31]

Random forest

The idea of combining a large number of weakly performing classifiers into a strongly performing classifier can be realised in several ways. One option is to build many models and use the average output of all of them; this is referred to as an ensemble. This idea is what the random forest technique is built upon, more specifically a special version of the ensemble method referred to as bagging. The training part consists of building many trees (creating a forest) from random subsets of features of the original data, and the prediction part consists of all the trees in the forest voting on what the output should be. Random forest can perform both classification and regression, just as a regular tree model. [32]
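The sketch below fits a random forest regression surrogate with scikit-learn on a synthetic data set; the response function is only a stand-in for an FEA output and is not the component data used in this thesis.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(200, 5))                 # 5 design parameters
    y = X[:, 0] ** 2 + np.sin(3 * X[:, 1]) + 0.1 * rng.normal(size=200)

    # bagging: each tree is trained on a bootstrap sample of the data
    forest = RandomForestRegressor(n_estimators=200, random_state=0)
    forest.fit(X, y)
    print(forest.predict(X[:3]))                             # average over all trees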

2.3.3 Kernel Methods

A kernel function K can be described as a similarity function. Given two objects as input, it outputs a value indicating the similarity between them. The objects used as input to the kernel can be anything from two coordinate points to trees, integers or other structures; the important thing is that the kernel function knows how to compare them [33]. The simplest form of kernel is the linear kernel (equation 2.7). It takes two vectors as input, and the similarity is measured by projecting one vector onto the other. The comparison in a kernel is done by calculating an inner product, i.e. a distance between the two objects. The calculations are performed in a feature space that depends on what type of kernel is used. This can be useful e.g. when the goal is to classify with a linear classifier whose feasibility depends on the existing features: the problem can be solved by using a kernel which calculates the inner product in a higher dimension, where a linear solution for the linear classifier may exist. Below is a description of some of the most common kernels [34].


The output of the linear kernel is a scalar. [33]

$k(x_i, x_j) = x_i \cdot x_j$ (2.7)

The Gaussian kernel is a very common kernel in which the Euclidean distance between the input vectors $x_i$ and $x_j$ is taken and then divided by a hyperparameter $\sigma$ which defines the width of the kernel; in a probability density function this would correspond to the variance. A normalization factor $a$ is sometimes added in front of the kernel, but this requires knowledge about the underlying distribution. [34]

$k(x_i, x_j) = a \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$ (2.8)

The exponential kernel is very similar to the Gaussian kernel, with the difference that the norm is not squared. This gives a different degree of smoothness [34].

$k(x_i, x_j) = a \exp\left(-\frac{\|x_i - x_j\|}{2\sigma^2}\right)$ (2.9)

A polynomial kernel is a matrix multiplication between the two input vectors, considering values that consist of combinations of the input in a higher polynomial dimension [34]. The feature space in which the similarity is measured is equivalent to that of polynomial regression, with the advantage of avoiding the combinatorial blow-up that explicitly learnable parameters would cause [35].

$k(x_i, x_j) = (x_i^T x_j)^d$ (2.10)

Spherical kernels are, because of the if-clause in the kernel function (equation 2.11), part of a collection of kernels referred to as compactly supported kernels. Compactly supported kernels share a characteristic known as a cut-off distance, often referred to as a range. This can be very advantageous when working with big data sets to achieve more sparsity. [34]

$k(x_i, x_j) = \begin{cases} 1 - \frac{3}{2}\frac{\|x_i - x_j\|}{\theta} + \frac{1}{2}\left(\frac{\|x_i - x_j\|}{\theta}\right)^3 & \text{if } \|x_i - x_j\| < \theta \\ 0 & \text{otherwise} \end{cases}$ (2.11)
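The kernels in equations 2.7-2.11 translate directly into code; the NumPy sketch below is a plain transcription with the hyperparameter names used in the text (a, sigma, d, theta), intended only as an illustration.

    import numpy as np

    def linear_kernel(xi, xj):
        return np.dot(xi, xj)                                          # eq 2.7

    def gaussian_kernel(xi, xj, sigma=1.0, a=1.0):
        return a * np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))  # eq 2.8

    def exponential_kernel(xi, xj, sigma=1.0, a=1.0):
        return a * np.exp(-np.linalg.norm(xi - xj) / (2 * sigma ** 2)) # eq 2.9

    def polynomial_kernel(xi, xj, d=2):
        return np.dot(xi, xj) ** d                                     # eq 2.10

    def spherical_kernel(xi, xj, theta=1.0):
        r = np.linalg.norm(xi - xj)
        if r >= theta:                                                 # compact support
            return 0.0
        return 1.0 - 1.5 * r / theta + 0.5 * (r / theta) ** 3          # eq 2.11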

2.3.4 Linear Regression

The linear regression method is one of the simplest ML methods for regression analysis. The goal is to fit a linear equation to some given input data. The prediction can be expressed as the weighted sum over all P parameters of a data point x, see equation 2.12. The fit is commonly obtained with the least squares method [36], but can also be done with ridge regression (L2-norm penalty) or lasso (L1-norm penalty) to achieve more bias and less variance [14]. The least squares method can also be used to fit polynomial models of various degrees [36], which are more flexible. A polynomial model has the form $\hat{y}(x) = m + \sum_{i=0}^{n} k_i x_i^d$, where $x_i$ is a feature from the feature vector x, d is the chosen degree of the polynomial and n is the size of the feature vector x.

$\hat{y}(x) = \sum_{i=0}^{P} k_i x_i$ (2.12)
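Using scikit-learn, the least squares fit of the linear model in equation 2.12, together with its ridge and lasso variants, can be sketched as below; the data is synthetic and the regularization strengths are arbitrary.

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge, Lasso

    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, size=(100, 3))
    y = 2.0 * X[:, 0] - 0.5 * X[:, 2] + 0.05 * rng.normal(size=100)

    # ordinary least squares, L2-penalized (ridge) and L1-penalized (lasso) fits
    for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.01)):
        model.fit(X, y)
        print(type(model).__name__, model.coef_)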


2.3.5 Radial Basis Functions

Radial Basis Functions (RBFs) are composed of two parts, the radial distance r and the radial basis function $\phi(r, c)$. The radial distance from a point x to the data point $x_i$ is $r_i(x) = \|x - x_i\|$, which is similar to how a kernel gives a similarity score. The distance is used as a weight factor to determine the impact that a specific data point should have on the end result. RBFs can be thought of as blending functions, because a mixture of RBFs is used to interpolate the data points. The notation in equation 2.13 is: $\phi(\cdot)$ is an arbitrarily chosen radial basis function, c is a hyperparameter > 0 used to scale the output data, and $k_i$ is the radial basis coefficient for the data point $x_i$. [21, 37]

$\hat{y}(x) = \sum_{i=0}^{n} k_i \phi(r_i, c)$ (2.13)

Some common RBFs [21] are listed below.

Linear: $\phi(r, c) = rc$
Gaussian: $\phi(r, c) = \exp(-(cr)^2)$
Multiquadratic: $\phi(r, c) = \sqrt{1 + (cr)^2}$
Inverse multiquadratic: $\phi(r, c) = \frac{1}{\sqrt{1 + (cr)^2}}$
Thin plate spline: $\phi(r) = r^2 \ln(r)$
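SciPy's RBFInterpolator implements the interpolation in equation 2.13 for several of the basis functions listed above (its epsilon parameter plays the role of the scale c). The sample data below is synthetic and only meant as a usage sketch.

    import numpy as np
    from scipy.interpolate import RBFInterpolator

    rng = np.random.default_rng(2)
    X = rng.uniform(0.0, 1.0, size=(50, 2))      # sampled design points
    y = np.sin(4 * X[:, 0]) * X[:, 1]            # sampled responses

    # multiquadric basis; epsilon is the shape/scale parameter
    rbf = RBFInterpolator(X, y, kernel="multiquadric", epsilon=2.0)
    X_new = rng.uniform(0.0, 1.0, size=(5, 2))
    print(rbf(X_new))                            # blended prediction, eq 2.13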

2.3.6 Kriging

Kriging is an interpolation method which is a special case of Gaussian processes. A Gaussian process is a stochastic process in which every finite collection of the stochastic variables has a multivariate normal distribution, meaning that every finite linear combination of them is normally distributed. In kriging, the interpolated values are fitted by a Gaussian process using a kernel as distance metric (section 2.3.3). Provided with a suitable distance metric, kriging gives the best linear unbiased prediction. Unbiasedness can be achieved when the sum of all weights adds up to one. We can ensure unbiasedness of the standard linear predictor from equation 2.12 by changing it into equation 2.14, which performs an unbiased prediction for a single data point x, given n data points y, with mean $\mu_y$ and weights w.

$\hat{y}(x) = \sum_{i=0}^{n} w_i \hat{y}_i + \left(1 - \sum_{i=0}^{n} w_i\right)\mu_y$ (2.14)

If the mean is non-stationary, the residuals can be used to achieve the same thing with a zero-centered mean. Using the residuals it is also possible to get a standard deviation at each point of interest; a confidence interval constructed from this can be used as a certainty metric, or quality measure, of the prediction. Even though kriging guarantees the best linear unbiased prediction, it does not guarantee a good prediction: it requires spatial dependence to interpolate better than the arithmetic mean. For a data point x, the kriging model can be formulated as in equation 2.15, where w is a vector of regression coefficients, $\phi$ is a vector of regression functions, and z(x) is a stochastic process with mean zero and variance $\sigma^2$. [38]

$\hat{y}(x) = w^T \phi(x) + z(x)$ (2.15)
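A kriging surrogate can be sketched with scikit-learn's Gaussian process regressor, where the RBF kernel corresponds to the Gaussian kernel of section 2.3.3 and return_std exposes the per-point uncertainty discussed above; the data and kernel settings are illustrative assumptions.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    rng = np.random.default_rng(3)
    X = rng.uniform(0.0, 1.0, size=(40, 2))
    y = np.cos(5 * X[:, 0]) + X[:, 1]

    kernel = ConstantKernel(1.0) * RBF(length_scale=0.2)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(X, y)

    mean, std = gp.predict(rng.uniform(0.0, 1.0, size=(3, 2)), return_std=True)
    print(mean, std)    # std can drive adaptive sampling (section 2.3.1)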


2.3.7 Neural Networks

The first mathematical model of a neuron was presented as early as 1943 in [39]. The earliest versions of artificial neurons were called perceptrons and were limited to fitting linear functions [40]. Today, deep Neural Networks (NNs) are among the best machine learning algorithms for fitting highly non-linear functions [14]. But as with all great things, there is no free lunch. The high capacity of deep NNs comes at the cost of difficulty in enforcing known biases and in gaining a conceptual understanding of which features affect which part of the decision making. This has led to them sometimes being used as black boxes. However, the visualisation of the decision making process of deep NNs is a very active research area which has seen progress in recent years [41, 42].

A Neural Network can be described as a directed acyclic graph, G = (V, E), with a weight function over the edges, $w : E \to \mathbb{R}$. Each node in the graph corresponds to a neuron and each edge carries a weight. Each node has an activation function $\sigma : \mathbb{R} \to \mathbb{R}$. The input to the activation function is the weighted sum of all input values received over the incoming edges of that node. The weights are changed using gradients that are calculated for each prediction with a given loss function during the training phase of the algorithm. The neurons are arranged into vectors, referred to as layers, that are stacked one after another. The layer at which the data is inserted is referred to as the input layer, and the layer from which the result is read is called the output layer. All layers in between are called hidden layers, since they are not visible to an end user of the model. Hidden layers are used because they allow NNs to compute complex functions at a lower cost: by adding layers, on the order of log n neurons can be used to fit a function that would need $2^{n-1}$ neurons in a single layer, which means fewer computations when adjusting the gradients during the training phase. Machine learning algorithms using NNs with more than one hidden layer are referred to as deep learning.

Figure 2.4: An example structure of a Neural Network
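A small feed-forward network of the kind sketched in figure 2.4 can be used as a regression surrogate; the scikit-learn example below (two hidden layers, synthetic data) is only an illustration of the architecture described above, not the networks evaluated in this thesis.

    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(4)
    X = rng.uniform(0.0, 1.0, size=(500, 6))        # 6 design parameters
    y = X[:, 0] * X[:, 1] + np.exp(-X[:, 2])        # stand-in response

    # input layer -> two hidden layers of 64 neurons -> single output neuron
    net = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64, 64), activation="relu",
                     max_iter=2000, random_state=0),
    )
    net.fit(X, y)
    print(net.predict(X[:3]))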

NNs are based on ideas about how our brains work. A human brain consists of approximately 20 billion neurons, where each neuron is on average connected to 7000 other neurons [43]. While human brains are currently superior in the number of neurons they contain compared to modern NNs, a computer has a switching speed of about $10^{-10}$ seconds, while human brains are estimated to perform a similar "switch" in about $10^{-3}$ seconds [44]. AI pioneer Geoffrey Hinton claimed in [45] that he believes that "deep learning is going to be able to do everything".


2.3.8 Feature selection

Feature selection, or variable screening, is the process of studying the importance each variable in a data set has on the end result. This knowledge can be used to determine which variables could be removed from the data set. This is done to break the curse of dimensionality, which states that when the dimension of the data increases, the search space for the algorithm grows so fast that the available data becomes sparse [46]. By removing unimportant variables the dimension is reduced, and thus the performance of the models increases. There exist a number of widely used techniques for this, some of which are covered in this section.

One-factor-at-a-time

This method, as the name suggests, studies the effect of each variable by changing them one at a time. It is an inexpensive technique, but as it studies the variables separately it does not capture interactions between the variables. To combat this, several techniques have been presented, one of which is the Morris method [47]. This technique tries to categorize the effect of each variable as either: A. the effect is negligible, B. the effect is linear or additive, or C. the effect is non-linear or interacting with another variable. The distribution of elementary effects for each variable is calculated using random one-factor-at-a-time simulations. The elementary effect of variable i at input X can be determined as follows:

$d_i(X) = \frac{y(x_1, \ldots, x_{i-1}, x_i + \Delta, x_{i+1}, \ldots, x_k) - y(X)}{\Delta}$ (2.16)

where Δ is the change imposed on variable i between two simulations. The distribution of the elementary effect determines the category for each variable. A large mean indicates that the variable has a large linear effect on the end result, while a distribution with a large spread indicates that the variable either interacts with another variable or has a non-linear effect.
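Equation 2.16 can be computed directly by perturbing one factor at a time; the sketch below does this around a single base point with a toy response function standing in for a simulation. The Morris method repeats this from many random base points to build the distribution of each d_i.

    import numpy as np

    def elementary_effects(response, X, delta=0.05):
        """Elementary effect d_i(X) of every factor i at base point X (eq 2.16)."""
        base = response(X)
        effects = []
        for i in range(len(X)):
            X_pert = X.copy()
            X_pert[i] += delta                       # perturb one factor at a time
            effects.append((response(X_pert) - base) / delta)
        return np.array(effects)

    def toy_response(x):                             # placeholder black-box function
        return x[0] ** 2 + 3.0 * x[1] + 0.1 * x[2] * x[3]

    X0 = np.array([0.5, 0.5, 0.5, 0.5])
    print(elementary_effects(toy_response, X0))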

Correlation

Correlation is a measure of how two variables relate to each other, and it can be quantified with a correlation coefficient. One of the most common coefficients is the Pearson correlation coefficient. It measures the linear correlation between variables and ranges from −1 to +1, where 0 means that there is no correlation between the variables, −1 means that there is a negative correlation (if one increases the other decreases), and +1 indicates a positive correlation (if one increases the other also increases). The Pearson coefficient ρ for variables X, Y is defined as:

$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$ (2.17)

where σ is the standard deviation of the respective variable and cov(X, Y) is the covariance of the variables, defined as:

$\operatorname{cov}(X, Y) = \mu[(X - \mu_X)(Y - \mu_Y)]$ (2.18)

where µ denotes the mean value. This equation defines the mean value of the product of the variables' deviations from their own mean values.

When used in feature selection, the correlation coefficient between every variable and the respective response is calculated. The larger the deviation of the coefficient from 0, either positive or negative, the bigger the significance of the variable.
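With the data in tabular form, the per-variable Pearson coefficients against a response are one line with pandas; the column names and data below are made up for the example.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(5)
    df = pd.DataFrame(rng.uniform(0.0, 1.0, size=(100, 3)),
                      columns=["height", "width", "radius"])
    df["life"] = 2.0 * df["height"] - df["radius"] + 0.1 * rng.normal(size=100)

    # Pearson correlation of every parameter with the response, sorted by |rho|
    corr = df.corr(method="pearson")["life"].drop("life")
    print(corr.reindex(corr.abs().sort_values(ascending=False).index))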


Boruta

Boruta is a screening technique built around random forest, see section 2.3.2. Apart from classification, random forest can also be used as a screening technique. As each tree in the forest is trained on a different subset of features, the importance of features can be obtained either as the average and standard deviation of the accuracy loss in the classification, or by computing the Z score. The Z score is computed by dividing the average loss by its standard deviation. Boruta uses the Z score to obtain the importance of features; however, the distribution of the score returned by random forest is not N(0, 1), so it does not have a direct relation to the statistical significance of variables. This means that importance shown by the Z score could arise from random fluctuations. To combat this, Boruta uses an external reference to verify the importance of variables. This is done by adding randomized variables, called 'shadow' variables, to the data set. Using this extended data set, the importance of all variables is determined. [48]

The Boruta algorithm is presented as a 9 step process [48]:

1. Extend the data set by adding random copies of all the variables, so called 'shadow' variables. A minimum of 5 shadow variables is added, even if the data contains fewer than 5 variables.

2. Shuffle the added shadow variables to remove their correlations with the response.

3. Run random forest on the data set, and gather all the Z scores.

4. Find the maximum Z score of the shadow variables (MZSA), and assign a hit to every variable that has a higher Z score than the MZSA.

5. For every variable that does not yet have a determined importance, a two-sided test of equality with the MZSA is performed.

6. Any variable which has an importance significantly (a specified confidence level) lower than MZSA is deemed as unimportant, and is permanently removed.

7. Any variable which has an importance significantly (a specified confidence level) higher than MZSA is deemed as important.

8. Remove all shadow variables.

9. This process is repeated until every variable has an importance, or until the algorithm reaches a pre-defined limit of random forest runs.

At the start of the algorithm the Z scores can fluctuate a lot, due to the high number of variables. Because of this the algorithm starts off with three initial rounds of random forest. During these initial rounds, the MZSA is taken from the 5th, 3rd and 2nd highest scoring shadow variable, respectively. During these runs no variable can be deemed important, and it is only in the last random forest iteration of these rounds that variables are deemed unimportant and removed. Doing this removes significantly unimportant variables, and yields less fluctuation in the Z scores due to fewer variables. [48]

Any variable that has not been defined as either important or unimportant at the end of the algorithm is marked as tentative. If these cannot be resolved by increasing the number of iterations in the algorithm, they can either be decided manually or by comparing the median Z score of the variable with the median of the MZSA. [48]

The result of Boruta is presented as a plot, which shows the importance (Z score) of each variable, as well as the MZSA threshold.
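One possible way to run Boruta in Python is the third-party boruta package (BorutaPy) wrapped around a random forest, as sketched below; the package choice, the data and the settings are assumptions for illustration and not the setup used in this thesis.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from boruta import BorutaPy

    rng = np.random.default_rng(6)
    X = rng.uniform(0.0, 1.0, size=(200, 8))         # 8 candidate design parameters
    y = 3.0 * X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

    forest = RandomForestRegressor(n_estimators=200, max_depth=5, random_state=0)
    selector = BorutaPy(forest, n_estimators="auto", random_state=0)
    selector.fit(X, y)                               # shadow variables are added internally

    print(selector.support_)                         # mask of confirmed-important variables
    print(selector.ranking_)                         # 1 = important, higher = rejected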


2.3.9 Model Validation

Before using a machine learning model in a production environment, it is essential to confirm that it performs as expected. For this, the data set of data points is split into at least two, and sometimes three, sets. The training set is the data used for training the model, and the test set is the data used to confirm that the model does not only perform well on data it has already seen. The third set, which is not always used, is the validation set; it is used for fitting hyperparameters and avoiding overfitting. An example of overfitting can be seen in figure 2.5. A model's fit can be described by the bias-variance tradeoff: an increase in bias leads to a decrease in variance and vice versa. Too much variance will cause an overfit, while too much bias will cause an underfit. [14]

Figure 2.5: The model described by the yellow line overfits to training data

One of the most common use cases for tuning hyperparameters with the validation set is cross validation. Cross validation splits the data into k chunks. One of the chunks is selected as the hold-out set, or inner test data set; this should not be confused with the test data set. The remaining k − 1 chunks are used for training the model. An error rate is derived from the hold-out set with an error measurement decided beforehand. This process is repeated k times, with a new chunk selected as the hold-out set each time. The error rate is saved each time and then averaged over all k iterations. Cross validation can then be repeated for new hyperparameter values. [14]

Root Mean Square Error (RMSE) is a frequently used error measurement. It is the square root of the mean squared deviation and can be expressed as in equation 2.19, where $d_i$ is the deviation. Another common error measurement is the Predicted Residual Error Sum of Squares, $PRESS = \sum (y_i - \hat{y}_i)^2$. In both cases, a lower error value means that the model performs better. The value is then compared between different models to help choose the best performing one. [14]

$RMSE = \sqrt{\frac{\sum d_i^2}{n}}$ (2.19)
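The sketch below performs k-fold cross validation with RMSE (equation 2.19) as the error measurement, here for a kriging surrogate; the data and the choice of model are illustrative.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(7)
    X = rng.uniform(0.0, 1.0, size=(80, 4))
    y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2]

    rmses = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        model = GaussianProcessRegressor(normalize_y=True)
        model.fit(X[train_idx], y[train_idx])
        d = y[test_idx] - model.predict(X[test_idx])   # deviations d_i on the hold-out chunk
        rmses.append(np.sqrt(np.mean(d ** 2)))         # eq 2.19
    print(np.mean(rmses))                              # averaged over the k folds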

3 Method

This section will cover how the work performed in this thesis was carried out. The research questions presented two different approaches to solving this problem; one which studies the possibility of predicting the result of the FEA simulation using the mesh, and one which studies the possibility of predicting the worst value of an element in a mesh region relative to a given constraint by using the parameters of the design. This section will present the component given by Siemens to optimize, how data used in the project was generated, the work done with the mesh approach and the work done with the parameter approach.

3.1 Previous work

For us, the idea of bypassing the FEA simulations with ML techniques originated from an article published in 2018, which proposes a way to predict the stress distribution of an aortic wall using a mesh of the aorta and deep learning [49]. This is done to circumvent the long simulation times of FEA, which need to be minimized in patient-critical situations. The presented technique starts by creating a mesh of the aortic wall. As an aorta is a cylinder, these meshes can be seen as rectangles if they are cut along the longitudinal direction. In order to create the data used for the deep learning, the authors use a statistical shape model (SSM). An SSM is a statistical representation of different shapes and their variations. Here it is used to create a data set of 729 anatomically correct aorta walls from a sample of a few real ones. To get a lower number of input dimensions, the meshes are encoded using Principal Component Analysis (PCA). In the PCA, the mean shape from the SSM and the eigenvectors and eigenvalues of its covariance matrix are used. The encoded meshes are used in a Neural Network that predicts the stress distribution of the mesh. As this stress distribution is in a reduced state, it is decoded to show the stress distribution of the original mesh.

As the data used in this thesis does not originate from an SSM, the PCA method used to represent a mesh in [49] is not directly applicable. But the idea of reducing the information of a mesh, using it in a machine learning algorithm and then expanding the result is. There exists a wide variety of techniques for representing geometric shapes. One widely used technique is to use deep learning to extract features from a shape, which are then used for further tasks such as classification or segmentation.

One paper published in 2015 presents 3D ShapeNets [50]. This network represents a shape as a binary grid of voxels. A voxel can be seen as a 3D pixel in a 3D grid: if the voxel value is 1, the shape is within that voxel, if it is 0 it is not. The experiments in the article use a voxel grid of 30x30x30, i.e. a fairly low resolution. The article also constructs ModelNet, a large set of 3D CAD models used for training. This set contains 151,128 models in 660 categories. The categories are ordinary items, such as chairs, tables, airplanes and more. A reduced set, called ModelNet40, containing 12,311 models from the 40 most common categories, is also created. Using this reduced set, 3D ShapeNets was able to obtain a classification accuracy of approximately 77%.

Another way to represent a 3D shape is a point cloud, i.e. points in a volume space. An article published in 2015 presents PointNet [51]. This is also a deep learning network that consumes point clouds and represents them so that they can be used for further tasks, such as classification. The article states that each point in the cloud is a vector consisting of coordinates (X, Y, Z) and other features, such as color or normal vector; however, only coordinates are used in the experiments. This network is also trained and tested on the ModelNet40 data set from [50], and achieves a classification accuracy of approximately 86%.

In [52] another deep learning network used to represent 3D shapes is presented. The article presents MeshNet, which uses a mesh representation as input and derives its geometrical shape into a numerical representation. This is done by convolution over every element in the mesh, gathering information about its shape and position. In the article the network is tested and validated on a classification problem using the ModelNet40 [50] collection, achieving approximately 90% accuracy when classifying the labels of the 3D models.

The above mentioned articles all present ways of representing 3D shapes using deep learning networks. The networks are designed and presented to perform classification tasks, but [52, 51] state that once the global feature of the shape has been extracted, the final classification task could be changed. In this project it would be replaced with a regression task that predicts the stress distribution of the FEA simulation, much like the idea presented in [49]. The classification problems are performed on models with high variance, as there is a significant difference between the shape of an airplane and the shape of a chair. The models used in this project do not have as high variance, as they are modifications of the same underlying shape. Therefore, the classification accuracy of the presented networks is not of great importance, but rather what input data they use and how they extract features from it.

In [53] a shape optimization of a saw blade with three objective functions and five parameters is performed. Three of the parameters are geometrical (clearance angle, rake angle, tool cutting edge radius) and two are process parameters (feed rate and cutting speed). Two of the objective functions, tool wear depth and interface temperatures, are obtained from FEA. A framework [54] is used for creating the different metamodels. To obtain training data for the surrogates, LHS is used to create a DOE consisting of 100 data points. In total seven different surrogate models are used: RBF with a prior, ordinary RBF, Response Surface Method (RSM), Kriging, Support Vector Regression (SVR), NN and Multivariate Adaptive Regression Splines (MARS). Normalized root mean squared error (NRMSE) and Rank Error (RE) are used as error functions, with cross-validation as the validation method. After the surrogates are fitted and evaluated against the objective functions, the best performing models are RBF and Kriging. The kernel (correlation) functions tested for RBF are Linear, Cubic, Gaussian and Multiquadratic, while Kriging is tested with Exponential, Gaussian, Linear, Spherical and Cubic kernels. A genetic search algorithm is used for optimization. Two different optimizations are performed: in one, the search algorithm is used directly with the basic FEA functions; in the other, it is used with the previously trained surrogates. 200 simulations are used for finding the optimized design with the search algorithm. The authors conclude that using surrogates with design parameters for shape optimization gives a time reduction and better solutions than optimization using only a search algorithm.
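The combination of an LHS design of experiments and a Kriging surrogate can be sketched with SciPy and scikit-learn as below; the test function, bounds and kernel settings are assumptions made for illustration, and the actual study relied on the framework in [54] rather than these libraries.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import cross_val_score

# Hypothetical black-box response standing in for one FEA-derived objective.
def expensive_simulation(x):
    return np.sin(3 * x[:, 0]) + x[:, 1] ** 2 + 0.5 * x[:, 0] * x[:, 1]

# Latin Hypercube DOE with 100 points over a 2-parameter design space.
sampler = qmc.LatinHypercube(d=2, seed=0)
X = qmc.scale(sampler.random(n=100), l_bounds=[0.0, 0.0], u_bounds=[1.0, 1.0])
y = expensive_simulation(X)

# Kriging-style surrogate (Gaussian process with a Gaussian/RBF kernel).
surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), normalize_y=True)

# Cross-validated accuracy before trusting the surrogate in an optimizer.
scores = cross_val_score(surrogate, X, y, cv=5, scoring="neg_root_mean_squared_error")
print("CV RMSE:", -scores.mean())

surrogate.fit(X, y)
```

A genetic or other global search algorithm can then be run against the fitted surrogate instead of the FEA functions, which is where the time reduction reported in [53] comes from.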

Another approach to speeding up the FEA calculations with machine learning was presented in [55]. A NN was trained and integrated into Abaqus to predict the material behaviour of a component. All tests are performed on the same geometry while the loads are varied. This is useful when several separately optimized components are chained together and a verification of the whole system is performed with FEA. The NN model is shown to accurately predict the material behaviour, and an impressive speedup is gained from the applied model. However, the required data is estimated with a 4-level factorial sampling scheme, which becomes unfeasible for models with more than 4 loads, as the number of data points then exceeds 1000; this introduces limitations. Testing is only performed in 2D, and the author states that introducing another dimension would probably require more training data.
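The scaling issue can be seen from a one-line calculation: a 4-level full-factorial scheme needs 4^n samples for n varied loads, so five loads already exceed 1000 FEA runs.

```python
# Sample count of a 4-level full-factorial scheme as the number of loads grows.
for n_loads in range(1, 8):
    print(f"{n_loads} loads -> {4 ** n_loads} FEA samples")
# 4 loads -> 256, 5 loads -> 1024, 6 loads -> 4096, ...
```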

In [56] a comparison of six different surrogate models is performed. Eight different mathematical functions, with between 2 and 12 inputs, are modelled with nine different data set sizes, starting at 32 data points and increasing exponentially up to 8192. The data is generated with a low-discrepancy sequence sampling scheme. The training data is split into a training set and a validation set with 50% of the data in each. The test set is pre-generated and consists of 500 data points. The surrogate models compared are Linear Regression, SVR, MARS, NN, Random Forest and Gaussian Process Regression (Kriging). The authors conclude that the default hyperparameter values seldom yield good results and put an emphasis on hyperparameter optimization. Kriging gives the best results, followed by NN and MARS. NN proves more stable for large data sets, while Kriging becomes slow and sometimes even fails to converge for data sets larger than 4096 data points. SVR has the same problem and also appears to need more hyperparameter tuning to achieve satisfactory accuracy. Kriging achieves high accuracy for all functions using between 32 and 256 training points, and also has few hyperparameters that need optimization.
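A stripped-down version of such a comparison can be written with scikit-learn as below. The benchmark function, the model settings and the 50/50 split are illustrative assumptions; MARS is omitted since it has no scikit-learn implementation, and the error is normalized by the target range, which is one common NRMSE definition.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor

# Hypothetical benchmark function standing in for one of the eight test functions.
rng = np.random.default_rng(0)
X = rng.uniform(size=(256, 4))
y = np.sin(6 * X[:, 0]) + X[:, 1] ** 2 - X[:, 2] * X[:, 3]

# 50/50 split of the generated data into training and validation sets.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

models = {
    "Linear regression": LinearRegression(),
    "SVR": SVR(C=10.0),
    "NN": MLPRegressor(hidden_layer_sizes=(64,), max_iter=5000, random_state=0),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Kriging": GaussianProcessRegressor(normalize_y=True),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    rmse = np.sqrt(mean_squared_error(y_val, model.predict(X_val)))
    # Normalize the RMSE by the range of the validation targets.
    print(f"{name}: NRMSE = {rmse / (y_val.max() - y_val.min()):.3f}")
```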


3.2 Component

This section presents the component used in the project: what it is, which calculations are performed, what these calculations have as input and output, and what we are trying to optimize.

3.2.1 Turbine disc SGT-700

As the authors' expertise lies in computer science and not mechanical construction, a component from Siemens with given base values is used. Siemens encouraged us to use our lack of knowledge about which parameter values are feasible, so that no incorrect bias is introduced into the designs. The component used in this thesis is a 2D cross-section model of a turbine disc. The rotation of the turbine disc is what generates the energy in a gas turbine. Fitted to these discs are rotor blades, which are propelled by the combustion of natural gas and make the turbine rotate.

Using the computer-aided design program NX CAD, a component was created and parameterized. Figure 3.1 shows the component, its design parameters and the values of the baseline component. These parameters determine how the component can be changed and shaped between designs. The baseline parameters and their boundaries were selected with guidance from engineers at Siemens, in order to enable as much variation in the designs as possible while still guaranteeing valid designs.

Figure 3.1: Base component with parameter values; radii are prefixed with R


Number  Variable                 Min   Baseline   Max
1.      Rim Temperature          400   500        600
2.      Left Neck Width          1     3          6
3.      Left Mount Top Radius    5     8          12
4.      Left Mount Width         60    72         80
5.      Left Mount Bot Radius    10    15         20
6.      Left Neck Top Radius     100   110        120
7.      Left Arm Top Radius      5     10         15
8.      Left Arm Height          33    43         53
9.      Left Arm Width           70    80         90
10.     Left Arm Bot Radius      30    35         40
11.     Left Waist Top Radius    30    37         45
12.     Left Waist Bot Radius    40    45         50
13.     Left Bot Width           45    50         60
14.     Right Mount Width        30    36.5       46.5
15.     Right Mount Bot Radius   3     8          12
16.     Right Neck Width         1     3          6
17.     Right Arm Top Radius     5     10         15
18.     Right Arm Bot Radius     30    35         40
19.     Right Arm Height         33    43         53
20.     Right Arm Width          70    80         90
21.     Right Waist Top Radius   30    37         45
22.     Right Waist Bot Radius   40    45         50
23.     Right Bot Width          45    50         60

Table 3.1: Design parameters for base component with available value ranges

There were two constraints that the component had to satisfy. The first constraint was the number of start and stop cycles the component can withstand before crack initiation, derived through low cycle fatigue (LCF) calculations. This value had to be greater than 4000 and is referred to in the text as "Life" or "Eval". The second constraint is the amount of crack initiation allowed to happen inside the component after running for 240 000 hours. This value had to be lower than 6% and is referred to in the text as "Creep". Both are calculated through FEA simulations in Abaqus. During the simulations an external load is applied to the top of the component, simulating the weight of the rotor blade, and a thermal equilibrium calculation of the component is performed. The objective of the design optimization is to minimize the weight of the component; as the material is homogeneous, this is equivalent to minimizing the area.
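In optimization terms the task can be summarized as below; the penalty formulation is our own illustrative assumption, not the formulation used by HEEDS or Siemens.

```python
def is_feasible(life_cycles: float, creep_percent: float) -> bool:
    # "Life"/"Eval": at least 4000 start/stop cycles before crack initiation (LCF).
    # "Creep": less than 6% crack initiation after 240 000 hours of operation.
    return life_cycles > 4000 and creep_percent < 6.0

def objective(area: float, life_cycles: float, creep_percent: float,
              penalty: float = 1e6) -> float:
    # Minimize the cross-section area (proportional to weight for a
    # homogeneous material); infeasible designs are penalized heavily.
    return area if is_feasible(life_cycles, creep_percent) else area + penalty

print(objective(area=0.25, life_cycles=5200, creep_percent=4.1))  # feasible
print(objective(area=0.20, life_cycles=3100, creep_percent=4.1))  # penalized
```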

For each component, the files listed in table 3.2 are created in the process of calculating the response values. While several other files are used for scripting or processing, these are the unique files that must be created for each design.


File                 Contains                    Estimated Filesize
base.prt             Parameterized base model    500 KB
base_i.prt           Parameterized base model    50 KB
Design.prt           Parameterized Design        500 KB
Design_i.prt         Parameterized Design        50 KB
Design.fem           Raw Mesh                    1 500 KB
Design.sim           Mesh with marked Regions    3 000 KB
exporter.inp         Input Mesh into Abaqus      600 KB
Design_Thermal.odb   Thermal FEA Output          1 000 KB
Design_MI.odb        MI FEA Output               2 000 KB
Design_Creep.odb     Creep FEA Output            2 000 KB
Design_Eval.odb      Life FEA Output             2 000 KB

Table 3.2: Files containing unique design information

3.2.2 HEEDS

HEEDS is an automation program developed by Siemens that is used to automate design generation and optimization. The program works as a customizable pipeline of processes, where a process is a task such as generating the CAD model or running FEA simulations on it. In this thesis HEEDS was used to automate the design generation, with a pipeline of four processes. The first process creates new CAD models based on the design parameters and values specified in table 3.1. The second process creates a new mesh from this CAD model. The third process uses this mesh to perform the FEA simulations, both the thermodynamic equilibrium and the mechanical integrity simulations, and also performs the Creep estimation. The fourth and last process performs the life evaluation of the component.
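Conceptually the pipeline chains four steps per candidate design, as in the stub sketch below; every function and return value here is a hypothetical placeholder and not part of the HEEDS, NX or Abaqus APIs.

```python
# Stub version of the four-process chain executed once per candidate design.
def build_cad_model(params: dict) -> dict:
    # 1. Create the parameterized CAD model (NX CAD in the real pipeline).
    return {"geometry": params}

def generate_mesh(cad_model: dict) -> dict:
    # 2. Mesh the new geometry (Design.fem / Design.sim in table 3.2).
    return {"mesh_of": cad_model}

def run_fea(mesh: dict) -> dict:
    # 3. Thermal equilibrium + mechanical integrity FEA and Creep estimation.
    return {"creep_percent": 4.2, "stress_field": []}

def evaluate_life(fea_result: dict) -> float:
    # 4. Low cycle fatigue ("Life"/"Eval") evaluation of the component.
    return 5000.0

def run_design(params: dict) -> dict:
    fea_result = run_fea(generate_mesh(build_cad_model(params)))
    return {"creep": fea_result["creep_percent"], "life": evaluate_life(fea_result)}

print(run_design({"Left Neck Width": 3.0, "Rim Temperature": 500.0}))
```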

Figure 3.2: HEEDS processes used, as shown in the graphical interface of HEEDS.

HEEDS supports a number of additional features, such as cloud computing, different sampling techniques and surrogate models. Cloud computing enables calculations and processes to be performed on remote machines; this was used for processes 3 and 4, so that the FEA simulations and life evaluations could run on a remote computational server available at Siemens. The surrogate model feature was used to create and evaluate the surrogate models used in later sections.


3.3 Mesh Approach

The mesh approach studies the possibility of bypassing the FEA simulation, which is the time-consuming task in the design cycle (see figure 1.2), by using an ML solution to predict the stress distribution instead. The FEA simulations (see section 2.2) take the mesh of the model and simulate how it would behave under certain conditions, such as applied loads or external forces. The simulation result is the stresses and strains placed on the model under these conditions; when using a mesh, the stress is calculated for every element. This result is then evaluated to calculate the life expectancy of the component. It gives a detailed view of the possible weak points of the component, and work can be done to strengthen those areas to make it last longer. With this approach the stress distribution of the whole component would be predicted. As mentioned in the previous work, this idea has been tested before, but it is not a widespread technique, and to the authors' knowledge no studies exist with the same preconditions as in this thesis. The approach has some problems: since the number of nodes and elements in a mesh can be very large, both the input space and the output space of the machine learning algorithm can also be very large. This increases the complexity of the algorithm and the amount of data needed to train it. But if a good fit can be found and the predictions are correct, this approach would give the engineer more information than the parameter approach while still providing the same speedup.

3.3.1 Meshes

Meshes are a way of representing 3- or 2-dimensional shapes in a digital environment. They are heavily used in fields such as computer vision and computer graphics. In this project meshes are used to represent the shape of the component in the finite element simulations. Meshes are built out of points in a coordinate space, called nodes. Two connected nodes form an edge, and three or more connected edges make an element; an element with three edges is referred to as a triangular element. The number of nodes and the size of the elements differ between meshes, as they vary in size, detail and complexity. At Siemens it is not uncommon for meshes to contain upwards of two million nodes. A mesh is a good way of representing a shape in a very detailed way, which is why it is widely used with FEA.
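Since the optimization objective reduces to minimizing the component's cross-section area, a small sketch of how nodes and triangular elements define a 2D mesh, and how its total area follows from them, may be helpful; the toy mesh below is made up for illustration.

```python
import numpy as np

# Toy 2D mesh: nodes are coordinates, elements are triples of node indices.
# Two triangular elements that together form a unit square.
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
elements = np.array([[0, 1, 2], [0, 2, 3]])

def mesh_area(nodes: np.ndarray, elements: np.ndarray) -> float:
    """Total area of a 2D triangular mesh (half the absolute cross products)."""
    a = nodes[elements[:, 0]]
    b = nodes[elements[:, 1]]
    c = nodes[elements[:, 2]]
    ab, ac = b - a, c - a
    cross_z = ab[:, 0] * ac[:, 1] - ab[:, 1] * ac[:, 0]
    return float(np.abs(cross_z).sum() / 2.0)

print(mesh_area(nodes, elements))  # 1.0 for the unit square
```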

3.3.2 Mesh representation

As presented in the previous work, this idea has been explored before by reducing the mesh information to a state where it can be used in machine learning [49]. The original state of a mesh contains information about the coordinate position of every node, how many elements there are and which nodes each element contains. This makes it a very high-dimensional problem, which makes the output hard to predict. Of the techniques found to handle this representation problem, the one tested in this thesis is MeshNet [52]. 3D ShapeNets [50], which represents shapes using a voxel grid, was not selected because voxelization loses too much detail of the shape. We believe that PointNet [51] could also be a viable option. However, as the model given by Siemens is represented as a mesh, and as the desired output is also a mesh representation, MeshNet was deemed most suitable. A more elaborate exploration of how MeshNet works follows in this section.

MeshNet

MeshNet is a neural network architecture presented in [52]. It is proposed as a way of representing 3D meshes numerically while still preserving information about their geometry and shape. The proposed way to achieve this is to regard the elements of a mesh as the main units. The location of an element, what it looks like and which its neighbouring elements
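A minimal sketch of per-element descriptors in this spirit is given below, assuming a triangular 3D surface mesh: each element is described by its center (location), its corner vectors relative to the center (what it looks like) and its unit normal. The exact features and the convolution blocks used in [52] differ, so this is only an approximation of the input stage.

```python
import numpy as np

def face_descriptors(nodes: np.ndarray, faces: np.ndarray):
    """Per-element descriptors for a triangular 3D mesh: center, corner
    vectors relative to the center, and unit normal."""
    tri = nodes[faces]                        # (F, 3, 3) corner coordinates
    center = tri.mean(axis=1)                 # where the element is
    corners = tri - center[:, None, :]        # what the element looks like
    normal = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    normal /= np.linalg.norm(normal, axis=1, keepdims=True)
    return center, corners, normal

# Toy example: the four triangular faces of a tetrahedron.
nodes = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
faces = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])
center, corners, normal = face_descriptors(nodes, faces)
print(center.shape, corners.shape, normal.shape)  # (4, 3) (4, 3, 3) (4, 3)
```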
