
Test and Assessment of Derivative Computation Architectures using OpenMDAO and its Application in a Real Airfoil Optimization Problem

XIN SHI

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ENGINEERING SCIENCES



Degree Projects in Scientific Computing (30 ECTS credits)

Degree Programme in Applied and Computational Mathematics (120 credits)
KTH Royal Institute of Technology, year 2018

Supervisor at KTH: Lilit Axner
Examiner at KTH: Michael Hanke


TRITA-SCI-GRU 2018:392
MAT-E 2018:83

Royal Institute of Technology
School of Engineering Sciences
KTH SCI
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci




Abstract

Optimization problems are widespread in everyday life and engineering science. In the engineering optimization domain, the problems are usually modelled with an objective function. To solve these kinds of optimization problems, there are two main classes of optimization algorithms that are used: gradient-based algorithms and gradient-free algorithms. We will focus on gradient-based algorithms, where the computation of derivatives is a critical step in the process. In this thesis, we present five different methods for computing the derivatives. These are the finite-difference method (FD), the complex-step method (CS), the automatic-differentiation method (AD), and two analytical methods, the direct and adjoint methods. We demonstrate the procedures involved in these methods in a test case and show their implementation using NASA's Open source framework for Multidisciplinary Analysis and Optimization (OpenMDAO). Finally, we test and assess their performance in OpenMDAO by modelling a real airfoil problem.


Sammanfattning

Test and assessment of derivative computation architectures using OpenMDAO and its application in a real airfoil optimization problem

Optimization problems are widespread in everyday life and engineering science. In engineering optimization, problems are usually modelled with an objective function. To solve these kinds of engineering optimization problems there are two main categories of optimization algorithms: gradient-based algorithms and gradient-free algorithms. We will focus on gradient-based algorithms, where the computation of the derivative is a critical step in the process. In this thesis we present five different methods for computing the derivative: the finite-difference method (FD), the complex-step method (CS), the automatic-differentiation method (AD), and two analytical methods, the direct method and the so-called adjoint method. We demonstrate the whole procedure with these methods on a test case and show their implementation using NASA's open-source framework for Multidisciplinary Analysis and Optimization (OpenMDAO). Finally, we test and evaluate their performance in OpenMDAO by modelling a real airfoil problem.


Acknowledgements

I thank Prof. Michael Hanke for his help during my studies in the master's programme in Scientific Computing. I am also grateful to him for acting as the examiner for my master thesis work.

I thank my supervisor, Dr. Lilit Axner, who played an important role in arranging and monitoring this thesis project.

I thank the technical adviser at PDC, Dr. Gong Jing, who helped me operate the high performance computers with his expertise.

I thank the expert at Airinnova, Dr. Mengmeng Zhang, who gave me this precious opportunity to apply my knowledge to real projects.

I thank my parents for supporting me and encouraging me all the time.


Contents

1 Introduction

2 Theoretical Background
2.1 Finite-difference method
2.2 Complex-step method
2.3 Automatic-differentiation method
2.4 Analytic methods
2.4.1 Direct method
2.4.2 Adjoint method

3 Algorithmic Implementation
3.1 FD
3.2 CS
3.3 AD
3.4 Analytical Method
3.4.1 Direct Method
3.4.2 Adjoint Method
3.5 Convergence Analysis of All Five Methods

4 OpenMDAO
4.1 Unified mathematical formulation
4.1.1 Monolithic formulation
4.1.2 Derivative computation
4.2 Data Structures
4.2.1 Component
4.2.2 Group
4.2.3 Problem
4.3 Derivatives calculation methods
4.3.1 FD and CS
4.3.2 Analytical methods

5 Benchmarks and Results
5.1 Computer Architecture Background and Parallelization Technique
5.1.1 Parallelized linear system solver
5.1.2 Parallelized FD
5.2 Optimizing the thickness distribution of a cantilever beam
5.2.1 Problem definition
5.2.2 Result
5.3 Optimization of the correction shape of an airfoil based on the beam elements
5.3.1 Problem
5.3.2 Result
5.4 Optimizing the shape of the airfoil based on the plane frame elements
5.4.1 Problem
5.4.2 Result
5.5 Optimizing the lift coefficient of the airfoil
5.5.1 Problem
5.5.2 Result
5.6 Performance measurements
5.6.1 Time complexity for computing derivatives
5.6.2 Parallel linear solver
5.6.3 FD for vortex panel method

6 Conclusion

7 Future Work: Planned Improvements for Parallelization

A Application of Euler–Bernoulli Beam Theory onto a cantilever beam


1 Introduction

Throughout the development of human civilization, we have been continuously dealing with optimization problems. For example, Euclid of Alexandria (c. 325 BC to c. 265 BC) proved that the square has the greatest area amongst all the possible rectangles with the same given total length of the edges [1]. Heron of Alexandria (c. 10 AD to c. 75 AD) showed that the shortest path between two points is the path that light travels between them [1]. Ibn Sahl (940 AD to 1000 AD) used the law of refraction, later called Snell's law, to compute the optimum shapes for lenses and curved mirrors; this may well have been the first application of optimization in the engineering domain [1].

In the most general case, an engineering problem can be modelled with independent variables $x$ and output functions $F(x)$, where $f = F(x)$ are called the output variables [2]. The optimization in the model (1.1) aims to minimize the values of $F(x)$ by varying the values of $x$ within an allowed set of values, which is called the feasible domain $D_x$ [2]:

$$\min_x F(x) \quad \text{subject to} \quad x \in D_x \tag{1.1}$$

In the context of engineering optimization, there are two main classes of optimization algorithms, namely gradient-free algorithms and gradient-based algorithms [3].

Gradient-free algorithms, or derivative-free algorithms, require only the values of $F(x)$, rather than the details of their computation [4]. Compared with gradient-based algorithms, gradient-free algorithms may find a better local minimum, and possibly even the global minimum, by exploring wider and wider ranges of the feasible domain [5]. However, the time complexity of this kind of algorithm is unacceptable: the convergence rate of gradient-free algorithms varies exponentially with respect to the number of input variables $n_x$. Usually, if confronted with hundreds of input variables, using gradient-free algorithms is not feasible [5].

Gradient-based algorithms require the derivatives of $f$ with respect to $x$, i.e., $df/dx$ [6]. Gradient-based algorithms might find the minimum of $F(x)$ by directionally exploring the feasible domain [7]. The convergence rate of gradient-based algorithms varies linearly (or better) with respect to $n_x$ [8]. Some optimization problems, as in the model (1.2), may additionally contain state variables $y$ and the governing equation $R(x, y) = 0$, which describes the relationship between $x$ and $y$ [2]:

$$\min_x F(x, y(x)) \quad \text{subject to} \quad R(x, y) = 0,\; x \in D_x \tag{1.2}$$

In this situation, it is desirable to have $dy/dx$ in order to obtain $df/dx$ [3]. However, gradient-based algorithms only converge to a local minimum [9]. More strictly, all the functions involved have to be continuous and differentiable at the minimum [10].

In aerodynamic shape optimizations, we assume that the larger the number of input variables in the model, the more accurate the resulting shape will be. So, the number of input variables is crucial for the time complexity of the algorithms. Such problems can usually be modelled with continuous and differentiable functions [11]. Moreover, the situation where the function has multiple local minima has not been encountered [12]. Thus, we assume that the local minimum is the global minimum. For these reasons, we will focus on gradient-based algorithms.

Computing derivatives is an essential step in almost all gradient-based algorithms [6]. Feasibility, accuracy, time complexity and implementation time are the four main criteria used for choosing an appropriate method. The feasibility of a method decides whether or not it can be applied to the problem in question. The accuracy of the derivatives is crucial in gradient-based optimization to ensure that the algorithm is robust and converges, especially for problems with large numbers of constraints: the optimizer may be influenced by inaccurate derivatives and thus become more expensive due to an increase in the number of iterations [13]. The time complexity is critical in engineering optimization because the fidelity of the model usually depends on the number of input variables of the model. The implementation time should also be considered for real problem cases. We therefore have to find a suitable balance between these four criteria (feasibility, accuracy, time complexity and implementation time) when choosing which method to use.

So far, five methods are widely used to compute derivatives in these contexts: finite differences (FD), the complex step (CS), automatic differentiation (AD), and two analytical methods, the direct and adjoint methods. Although numerous studies have applied particular methods to concrete problems, comparisons between these five methods have rarely been undertaken. Different methods suit different scenarios; thus, it is of special interest to compare and summarize them.

In this thesis, we compare the five methods for computing derivatives in terms of the four criteria, by theoretical analysis and through practical examples. In addition, we use OpenMDAO, an open-source high-performance computing platform for multidisciplinary analysis and optimization [14], which combines FD and CS plus the two analytical methods in its optimization framework, in order to optimize the shape of an airfoil. Note that OpenMDAO does not contain an implementation of the AD method, which is consequently not included in our study of OpenMDAO.

In Chapter 2, we present the theoretical background of these five methods. In Chapter 3, the implementation details of these methods for a test case are presented. In Chapter 4, a unified mathematical formulation for computing total derivatives in OpenMDAO is introduced using the same test case. In Chapter 5, we take a real case, model the shape of an airfoil as a beam, and perform the optimization in OpenMDAO. In Chapter 6, we summarize our conclusions. In Chapter 7, we give advice for further work by planning improvements of the FD method that take advantage of the architecture of parallel systems.


2 Theoretical Background

Before specific implementations and algorithmic optimizations are described, it is important to give a description of the mathematical background of the problem being considered. The goal is described at the beginning; from Section 2.1 to Section 2.4, we characterize the five computational methods that are involved.

In general, a computational model (1.1) takes one or more inputs and produces the corresponding outputs. In this regard, we need to compute the derivatives of the output functions $F$ with respect to the input variables $x$, which yields an $n_f \times n_x$ Jacobian matrix, Equation (2.1) [15]:

$$J = \frac{\partial F}{\partial x} := \begin{bmatrix} \dfrac{\partial F_1}{\partial x_1} & \dfrac{\partial F_1}{\partial x_2} & \cdots & \dfrac{\partial F_1}{\partial x_{n_x}} \\ \dfrac{\partial F_2}{\partial x_1} & \dfrac{\partial F_2}{\partial x_2} & \cdots & \dfrac{\partial F_2}{\partial x_{n_x}} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial F_{n_f}}{\partial x_1} & \dfrac{\partial F_{n_f}}{\partial x_2} & \cdots & \dfrac{\partial F_{n_f}}{\partial x_{n_x}} \end{bmatrix} \tag{2.1}$$

We assume that the computational model is deterministic [16], and it is implemented in Python.

2.1 Finite-difference method

The finite-difference method (FD) is the method most commonly used for computing derivatives [15]. The finite-difference formula is derived by transforming the Taylor series expansion (2.2) [15], where $x$ are the input variables, $F(\cdot)$ are the output functions, $h$ is the stepsize and $e_j$ is the $j$th standard basis vector:

$$F(x + he_j) = F(x) + h\frac{\partial F(x)}{\partial x_j} + \frac{h^2}{2}\frac{\partial^2 F(x)}{\partial x_j^2} + O(h^3) \tag{2.2}$$

Truncating the series in Equation (2.2) after the first-derivative term and solving for the derivative, we obtain

$$\frac{\partial F}{\partial x_j} = \frac{F(x + he_j) - F(x)}{h} + O(h) \tag{2.3}$$

As shown in Equation (2.3), after perturbing one of the input variables $x_j$ by $h$, the difference between the original output $F(x)$ and the re-evaluated output $F(x + he_j)$ is calculated; finally, the derivative $\partial F/\partial x_j$ is approximated by dividing this difference by the value of the perturbation.

• Feasibility

FD does not require the details of the function computation. Even if the function evaluation is a black box, FD can be applied; in that case, FD is the only choice [17].

• Implementation time

Since FD does not require any interaction with the internals of a model, it is the simplest method and needs no additional implementation time. This means that FD can accommodate future additions to any framework without modification [18]. This might be the main reason that most gradient-based optimizers use FD as the default for computing gradients [3].

• Accuracy

There are two main sources of error. The first is the truncation error of order $O(h)$ [15]. This means that the error decreases proportionally with the magnitude of the perturbation $h$ if the simplest FD formula, Equation (2.3), is applied. Thus, one is tempted to decrease $h$ as much as possible. However, this leads to the second source of error: subtractive cancellation in the numerator of Equation (2.3) [13]. Imagine that we can only retain four decimal digits of precision; if the difference caused by the perturbation is as small as $O(10^{-5})$, the computed derivative will always be 0. Usually, inaccurate derivatives due to subtractive cancellation are the culprit in cases where gradient-based optimizers fail to converge [3] (see the numerical sketch after this list).

• Time complexity

FD requires evaluating the output functions at the initial point $x$ and at the perturbed point $x + he_j$, which yields one column of the Jacobian matrix in Equation (2.1). Consequently, the computational cost for the entire Jacobian matrix increases linearly with respect to the number of input variables $n_x$. When the computational cost of the simulations is significant and the model has a large number of input variables, the computation of the derivatives usually becomes the bottleneck in the optimization cycle.
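The cancellation effect is easy to reproduce. The following minimal sketch (ours, not from the thesis code) differentiates sin(x) with forward differences at shrinking step sizes; in double precision the difference in the numerator eventually vanishes entirely:

import numpy as np

def f(x):
    return np.sin(x)

x0 = 1.0
exact = np.cos(x0)  # exact derivative of sin at x0

for h in [1e-2, 1e-8, 1e-12, 1e-16]:
    # at h = 1e-16, x0 + h rounds back to x0, so fd becomes exactly 0
    fd = (f(x0 + h) - f(x0)) / h
    print(h, fd, abs(fd - exact) / abs(exact))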

2.2 Complex-step method

The complex-step method (CS) originated with the work of Lyness and Moler [19]. However, it was only later that the theory was revived by Squire and Trapp [20]. Using complex arithmetic, CS approximates the derivatives of real functions [21].

In a manner similar to FD, CS is derived using the Taylor series expansion, Equation (2.4) [21]. Rather than using a real step $h$, a purely imaginary step $ih$ is used in the Taylor series at a given point $x$ [21]:

$$F(x + ihe_j) = F(x) + ih\frac{\partial F(x)}{\partial x_j} - \frac{h^2}{2}\frac{\partial^2 F(x)}{\partial x_j^2} - \frac{ih^3}{6}\frac{\partial^3 F(x)}{\partial x_j^3} + \cdots \tag{2.4}$$

Taking the imaginary part of both sides of Equation (2.4),

$$\mathrm{Im}[F(x + ihe_j)] = h\frac{\partial F(x)}{\partial x_j} - \frac{h^3}{6}\frac{\partial^3 F(x)}{\partial x_j^3} + \cdots$$

and dividing by $h$, we obtain

$$\frac{\partial F}{\partial x_j} = \frac{\mathrm{Im}[F(x + ihe_j)]}{h} + O(h^2) \tag{2.5}$$

Note that $F$ needs to be real-valued.

• Feasibility

CS needs access to the source code of the function $F$, because operations on complex numbers are sometimes not built into the programming language. To implement CS, the source code may have to be modified so that all real variables and operations are upgraded to support complex arithmetic.

• Implementation time

The only thing that needs to be done is to modify the code to support operations on complex numbers, which is not an issue in MATLAB or Python.

• Accuracy

There is only the truncation error of order $O(h^2)$ in the complex-step approximation (2.5), which involves no subtraction. As a result, the truncation error can be reduced to the order of machine precision by using a small enough $h$ [21].

• Time complexity

As with the FD method, each individual evaluation in the CS method yields one column of the Jacobian matrix in Equation (2.1), and the cost for computing the Jacobian matrix is proportional to $n_x$. Thus CS is also not suitable for models with vast numbers of input variables [21].

2.3 Automatic-differentiation method

The automatic-differentiation method (AD), also known as algorithmic differentiation, relies on tools that automatically produce code for computing user-specified derivatives based on the original source code [22, 23]. By scanning the source code line by line, AD automates symbolic differentiation [23], generating derivative code in much the same way that we would derive the analytic derivatives manually. The fundamental principle is the decomposition of the derivative computation provided by the chain rule [23].

For example, according to the chain rule, the derivative of $F$ in Equation (2.6) is decomposed as in Equation (2.7):

$$F = F(G(H(x))) = F(G(w_1)) = F(w_2) \tag{2.6}$$

$$\frac{dF}{dx} = \frac{dF}{dw_2}\frac{dw_2}{dw_1}\frac{dw_1}{dx} \tag{2.7}$$
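As an illustration of this principle, here is a minimal forward-mode AD sketch based on operator overloading (the Dual class and overloaded operators are our own illustration; real AD tools are far more complete):

import math

class Dual:
    # a value v carried together with its derivative dv; every
    # overloaded operation applies one step of the chain rule
    def __init__(self, v, dv=0.0):
        self.v, self.dv = v, dv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.v + other.v, self.dv + other.dv)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: d(uw) = u*dw + w*du
        return Dual(self.v * other.v, self.v * other.dv + self.dv * other.v)

def sin(x):
    # chain rule for an elementary function: d(sin u) = cos(u)*du
    return Dual(math.sin(x.v), math.cos(x.v) * x.dv)

# derivative of F(x) = x*sin(x) at x = 1: seed the input with dx = 1
x = Dual(1.0, 1.0)
F = x * sin(x)
print(F.v, F.dv)   # F.dv equals sin(1) + cos(1)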

• Feasibility

Many engineering models cannot be written explicitly in closed form, relying instead on iterative numerical procedures expressed as computer programs, which AD can differentiate [3]. However, the language and the structure of the code implementing the function are constrained by the AD tools, as is the form of the function itself. The practical applicability of AD is therefore rather confined.

• Implementation time

As long as one has a reliable AD tool, the only remaining task is to restructure the source code according to the tool's rules.

• Accuracy

AD automates the steps we could perform manually, so the only source of error is the machine error.

• Time complexity

AD requires more computational time than FD [13]. The computational cost of AD varies linearly with respect to $n_x$.

2.4 Analytic methods

Analytic methods are based on the linearization of the numerical model in Equation (1.1) [24]. Analytic methods are applicable when problems are modelled as in Equation (1.2), which we restate here [24]:

$$\min_x F(x, y(x)) \quad \text{subject to} \quad R(x, y) = 0,\; x \in D_x \tag{2.8}$$

The governing equations

$$R(x, y) = 0 \tag{2.9}$$


describe how the values of the state variables $y$ are related to the input variables $x$ [13]. The residuals $r$ are defined in Equation (2.10):

$$r = R(x, y) \tag{2.10}$$

By using the chain rule, the total derivative of $f$ is decomposed as [24]

$$\frac{df}{dx} = \frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\frac{dy}{dx} \tag{2.11}$$

Combining Equations (2.9) and (2.10) and using the chain rule, we obtain Equation (2.12):

$$\frac{dr}{dx} = \frac{\partial R}{\partial x} + \frac{\partial R}{\partial y}\frac{dy}{dx} = 0 \tag{2.12}$$

The partial derivatives can be computed using the methods described earlier (i.e., FD, CS, and AD) [3]. As a result, the numerical precision is the same as that of these embedded methods [3]. The computation of the Jacobian matrix $dy/dx$ in Equation (2.12) accounts for a large proportion of the total computational cost, since the computation requires the solution of the governing equations.

Rearranging the linearized residual equations (2.12), we can compute the total derivative matrix $dy/dx$ by solving the linear system

$$\frac{\partial R}{\partial y}\frac{dy}{dx} = -\frac{\partial R}{\partial x} \tag{2.13}$$

Substituting the result $dy/dx$ of Equation (2.13) into Equation (2.11) [24],

$$\frac{df}{dx} = \frac{\partial F}{\partial x} - \underbrace{\frac{\partial F}{\partial y}\left[\frac{\partial R}{\partial y}\right]^{-1}}_{-\psi^T}\frac{\partial R}{\partial x} \tag{2.14}$$

Note that it is not necessary to compute the inverse of $\partial R/\partial y$ in Equation (2.14) explicitly.
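In practice this means factorizing $\partial R/\partial y$ once and back-substituting for each right-hand side. A short numpy sketch (the matrices here are placeholders of our own):

import numpy as np

dRdy = np.array([[1.0, -2.0], [1.0, 1.0]])   # placeholder partials dR/dy
dRdx = np.array([[-0.5, 0.0], [0.0, 0.3]])   # placeholder partials dR/dx

# solve dRdy @ dydx = -dRdx for all columns at once; no explicit inverse
dydx = np.linalg.solve(dRdy, -dRdx)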

There are two methods for obtaining the derivatives, the direct method and the adjoint method; they are distinguished by how the computation of the total derivatives is transformed.

2.4.1 Direct method

The direct method solves for $dy/dx$ in the linear system, Equation (2.13). The result $dy/dx$ is then substituted into Equation (2.11), where FD, CS or AD can be used for the partial derivatives, which yields $df/dx$ [3].


• Feasibility

The direct method needs the source code of all the functions involved, so it cannot be applied to black-box functions. The governing equations are needed as well.

• Implementation time

The direct method is usually more efficient than AD for a given problem when the formulas for the analytic derivatives can easily be derived manually. However, after the derivative expressions are derived, they must be translated into code. This requires a thorough understanding of the computational model as well as a long implementation time. Consequently, analytic methods need more involvement than FD, CS and AD.

• Accuracy

The numerical accuracy of the direct method is the same as that of the embedded methods which are used to compute the involved derivatives.

• Time complexity

The direct method is advantageous when the computational model is nonlinear [13]. It is also advantageous when we have a factorization of $\partial R/\partial y$, since the solution of the linear system then reduces to an inexpensive back substitution [13]. However, in the intermediate procedure, the linear system in Equation (2.13) has to be solved for $n_x$ right-hand sides. Consequently, the cost of computing derivatives using the direct method is proportional to $n_x$, so it is not suitable for large-scale problems.

2.4.2 Adjoint method

The adjoint method for computing derivatives has been in use for over three decades. It was first used to solve optimal control problems and then to analyse the sensitivity of linear structural finite element models [25]. The difference compared with the direct method is the transformation of the linear system, which yields the adjoint equations (2.16) [24].


In the adjoint equations (2.16), $-[\partial F/\partial y]^T$ is regarded as the right-hand side and $\psi$ is the unknown, which is called the adjoint vector:

$$\psi^T = -\frac{\partial F}{\partial y}\left[\frac{\partial R}{\partial y}\right]^{-1} \tag{2.15}$$

This results in the adjoint equations [24]

$$\left[\frac{\partial R}{\partial y}\right]^T \psi = -\left[\frac{\partial F}{\partial y}\right]^T \tag{2.16}$$

The resulting $\psi$ can then be substituted into Equation (2.14), which gives

$$\frac{df}{dx} = \frac{\partial F}{\partial x} + \psi^T\frac{\partial R}{\partial x} \tag{2.17}$$

FD, CS or even AD can be applied to compute the partial derivatives in Equations (2.16) and (2.17) [3].

• Feasibility

The adjoint method needs the source code of all the involved functions. So it cannot be applied to black-box functions. The governing equation is needed as well.

• Implementation time

It is the same as the direct method in this regard.

• Accuracy [24]

As with the direct method, the numerical precision is the same as that of the embedded methods which are used to compute the involved derivatives.

• Time complexity

The cost of the adjoint method, solving Equation (2.16), is proportional to the number of output variables $n_f$. So, the cost of computing the Jacobian matrix in Equation (2.1) using the adjoint method is proportional to $n_f$ instead of depending on $n_x$. The direct method performs better than the adjoint method when $n_x < n_f$, and vice versa [26]. The adjoint method is extremely well suited to aerodynamic optimization problems. The first application of this method to fluid dynamics problems was in work by Pironneau [27]. Jameson [11] then used the method to optimize the shape of an airfoil; it was later extended to optimize the aerodynamic shape of entire aircraft configurations [28, 29, 30], and shapes considering both aerodynamics and structures [18, 31], which are known as multidisciplinary systems [13].


3 Algorithmic Implementation

To illustrate the algorithmic implementation of the methods described in the earlier sections, we introduce a test case at the beginning of Chapter 3. From Section 3.1 to Section 3.4, we apply the five derivative computation methods to the test case to demonstrate their usage. In Section 3.5, we present the convergence results.

The test case fits the model (1.2), which we restate here:

$$\min_x F(x, y(x)) \quad \text{subject to} \quad R(x, y) = 0,\; x \in D_x$$

The problem is modelled with the output functions $F$, a vector of two explicit functions,

$$F = \begin{bmatrix} F_1(x_1, x_2, y_1, y_2) \\ F_2(x_1, x_2, y_1, y_2) \end{bmatrix} = \begin{bmatrix} x_1 y_1 \cos(x_1) \\ y_2 \sin(x_1) \end{bmatrix} \tag{3.1}$$

and the governing equations $R$,

$$R = \begin{bmatrix} R_1(x_1, x_2, y_1, y_2) \\ R_2(x_1, x_2, y_1, y_2) \end{bmatrix} = \begin{bmatrix} y_1 - 2y_2 - \cos(x_1) \\ y_1 + x_2 y_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{3.2}$$

which can be viewed as constraints on the model, or as another disciplinary system embedded in the model, depending on what is needed.

This test case has two input variables, $x = [x_1, x_2]^T$, and two state variables, $y = [y_1, y_2]^T$. Since the governing equations (3.2) are linear with respect to the state variables in this problem, the following linear system is solved:

$$\begin{bmatrix} 1 & -2 \\ 1 & x_2 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \cos(x_1) \\ 0 \end{bmatrix} \;\Longrightarrow\; \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \frac{1}{2 + x_2}\begin{bmatrix} x_2\cos(x_1) \\ -\cos(x_1) \end{bmatrix} \tag{3.3}$$


Specifically, we set the objective of this case to be the computation of the derivative $dF_1/dx_1$ at the starting point

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \pi/4 \\ 1 \end{bmatrix},$$

i.e., the upper-left entry of the Jacobian matrix in Equation (3.4):

$$J = \frac{dF}{dx} = \begin{bmatrix} \dfrac{dF_1}{dx_1} & \dfrac{dF_1}{dx_2} \\ \dfrac{dF_2}{dx_1} & \dfrac{dF_2}{dx_2} \end{bmatrix} \tag{3.4}$$

The exact derivative $dF_1/dx_1$ at $x = [\pi/4, 1]^T$ is computed manually as

$$\frac{\partial F_1}{\partial x_1}\left(\frac{\pi}{4}, 1\right) = \frac{x_2\left(\cos^2(x_1) - 2x_1\sin(x_1)\cos(x_1)\right)}{2 + x_2}\bigg|_{x_1=\pi/4,\,x_2=1} = \frac{1}{6} - \frac{\pi}{12} \tag{3.5}$$

3.1 FD

The things that need to be defined are the function itself and the corresponding stepsize $h$.

Algorithm 1: the finite-difference method

Input: function $F(x)$, perturbation $h$, starting point $x_0$
Output: Jacobian matrix $J_{FD}$

for $i$ in $1$ to $n_f$ do
  for $j$ in $1$ to $n_x$ do
    $J_{FD}[i, j] = \dfrac{F_i(x_0 + he_j) - F_i(x_0)}{h}$
  end for
end for
return $J_{FD}$

In the test case, the FD approximation of Equation (3.5) is

$$\frac{\partial F_1}{\partial x_1}\left(\frac{\pi}{4}, 1\right) \approx J_{FD}[1, 1] = \frac{F_1(x_1 + h, x_2) - F_1(x_1, x_2)}{h}\bigg|_{x_1=\pi/4,\,x_2=1}$$
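A minimal Python sketch (ours) of this FD computation for the test case; the helper F1 folds in the solved state $y_1$ from Equation (3.3):

import numpy as np

def F1(x1, x2):
    # test-case output F1 = x1*y1*cos(x1), with y1 = x2*cos(x1)/(2 + x2)
    y1 = x2 * np.cos(x1) / (2.0 + x2)
    return x1 * y1 * np.cos(x1)

x1, x2, h = np.pi / 4, 1.0, 1e-6
fd = (F1(x1 + h, x2) - F1(x1, x2)) / h
exact = 1.0 / 6.0 - np.pi / 12.0
print(fd, exact)   # the two agree to roughly O(h)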

3.2 CS

Martins et al. [21] showed that CS is applicable to almost all algorithms, and they built a general framework for its implementation. CS requires the computer program to support complex-number computation [21]. In most cases, one has to modify the source code of the given computational model to satisfy this requirement, although this varies depending on the programming language. Moreover, some of the functions involved need to be changed. Martins et al. [21] provided a Fortran script that helps to implement CS, as well as procedures for implementing CS in Matlab, C/C++, and Python.

Algorithm 2: the complex-step method

Input: function $F(x)$ and its source code, perturbation $h$, starting point $x_0$
Output: Jacobian matrix $J_{CS}$

for $i$ in $1$ to $n_f$ do
  for $j$ in $1$ to $n_x$ do
    $J_{CS}[i, j] = \dfrac{\mathrm{Im}(F_i(x_0 + ihe_j))}{h}$
  end for
end for
return $J_{CS}$

In the test case, the CS approximation of Equation (3.5) is

$$\frac{\partial F_1}{\partial x_1}\left(\frac{\pi}{4}, 1\right) \approx J_{CS}[1, 1] = \frac{\mathrm{Im}[F_1(x_1 + ih, x_2)]}{h}\bigg|_{x_1=\pi/4,\,x_2=1}$$
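The corresponding complex-step sketch (ours) needs no code changes beyond the complex perturbation, since numpy's cos already accepts complex arguments:

import numpy as np

def F1(x1, x2):
    y1 = x2 * np.cos(x1) / (2.0 + x2)   # same test-case function as before
    return x1 * y1 * np.cos(x1)

x1, x2, h = np.pi / 4, 1.0, 1e-30
cs = np.imag(F1(x1 + 1j * h, x2)) / h
print(cs)   # matches 1/6 - pi/12 to machine precision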

3.3 AD

One of the two main approaches to AD is source-code transformation (SCT). SCT generates code that computes the derivatives based on the original code [32]; the additional code is generated automatically by an AD tool [32]. The other approach to AD is operator overloading. This approach keeps the original code but redefines the operations and the types of the variables [33]. The type of a real number $v$ is extended with the corresponding derivative, i.e., $\bar{v} = (v, dv)$ [33], and all operations are extended with the computation of the derivatives of those operations [33].

Martins et al. [21] also presented another way of understanding CS, connecting it with the operator-overloading approach. The real part of each complex number holds the original real value, and the imaginary part corresponds to its derivative [3]; all complex-number operations then carry the derivatives along in the imaginary parts. Theoretically, if the step size is small enough, CS and AD yield the same results in finite-precision arithmetic [3].


Algorithm 3: the source-code-transformation AD method

Input: function $F(x)$ and its source code, starting point $x_0$
Output: Jacobian matrix $J_{AD}$

for $i$ in $1$ to $n_f$ do
  for $j$ in $1$ to $n_x$ do
    automatically derive the function $F_{i,x_j} = dF_i/dx_j$
    $J_{AD}[i, j] = F_{i,x_j}(x_0)$
  end for
end for
return $J_{AD}$

We will use Tangent, an automatic-differentiation tool for Python based on source-code transformation [34]. The developers of Tangent state:

"Tangent performs ahead-of-time automatic differentiation on the Python source code itself, and produces Python source code as its output. This approach to automatic differentiation is different from existing packages popular in machine learning, such as TensorFlow and Autograd. Advantages are that Tangent generates gradient code in Python which is readable by the user, easy to understand and debug, and has no runtime overhead. Tangent is the first SCT-based AD system for Python; it occupies a unique point in the space of tradeoffs between usability, flexibility, debuggability, and computational performance. Tangent makes it possible to choose different trade-offs between usability and ease of debugging in comparison to prior systems." [34]

In the test case, the first step is to define the model as a series of functions.

import numpy as np

def Y1Y2(x1, x2):
    # analytic solution of the 2x2 governing equations, Equation (3.3)
    det = 1.0 / (x2 + 2)
    y1 = det * x2 * np.cos(x1)
    y2 = det * (-np.cos(x1))
    return y1, y2

def f1(x1, x2):
    y1, y2 = Y1Y2(x1, x2)
    return y1 * x1 * np.cos(x1)

def f2(x1, x2):
    y1, y2 = Y1Y2(x1, x2)
    return y2 * np.sin(x1)

Then we encapsulate and apply Tangent to the functions:

import tangent
import numpy as np

# Equation is a base class defined elsewhere in the thesis code
class AutomaticDifference(Equation):
    def __init__(self):
        self.method = "Automatic Difference"

    def __repr__(self):
        return super(AutomaticDifference, self).__repr__() + "\n method = " + self.method

    def grad(self, funct, wrt=(0,), verbose=0):
        """
        :param funct: top-level function handle
        :param wrt: tuple defining the inputs of interest, default is the first (0,)
        :param verbose: output the source code of the generated derivative function
        :return: the derivative function df/dwrt
        """
        derivatives = tangent.grad(funct, wrt=wrt, verbose=verbose)
        return derivatives

ad = AutomaticDifference()
addf = ad.grad(f1, wrt=(0,))
addf(np.pi, 1.0)

Tangent then automatically generates the code of the derivative function $df_1/dx_1$ as addf; finally, the result is computed by the call on the last line.

3.4 Analytical Method

3.4.1 Direct Method

We now apply the direct method to the model problem. Since there are just two state variables, Equation (2.13) corresponds to the following linear system:

$$\begin{bmatrix} \dfrac{\partial R_1}{\partial y_1} & \dfrac{\partial R_1}{\partial y_2} \\ \dfrac{\partial R_2}{\partial y_1} & \dfrac{\partial R_2}{\partial y_2} \end{bmatrix}\begin{bmatrix} \dfrac{dy_1}{dx_1} & \dfrac{dy_1}{dx_2} \\ \dfrac{dy_2}{dx_1} & \dfrac{dy_2}{dx_2} \end{bmatrix} = -\begin{bmatrix} \dfrac{\partial R_1}{\partial x_1} & \dfrac{\partial R_1}{\partial x_2} \\ \dfrac{\partial R_2}{\partial x_1} & \dfrac{\partial R_2}{\partial x_2} \end{bmatrix}$$

In this problem, the governing equations $R$ have simple explicit forms. We compute $\partial R/\partial y$ and $\partial R/\partial x$ manually to obtain

$$\begin{bmatrix} 1 & -2 \\ 1 & x_2 \end{bmatrix}\begin{bmatrix} \dfrac{dy_1}{dx_1} & \dfrac{dy_1}{dx_2} \\ \dfrac{dy_2}{dx_1} & \dfrac{dy_2}{dx_2} \end{bmatrix} = \begin{bmatrix} -\sin(x_1) & 0 \\ 0 & -y_2 \end{bmatrix}$$

Note that in more sophisticated cases, the computation of the partial derivatives would be more involved, since the expressions of the governing equations are usually implicit, and consequently we might need to apply FD or CS to approximate the partial derivatives. However, only the first column is needed to compute $df_1/dx_1$, as shown in Equation (3.6). The system is evaluated at

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \pi/4 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \sqrt{2}/6 \\ -\sqrt{2}/6 \end{bmatrix}.$$

$$\frac{df_1}{dx_1} = \frac{\partial F_1}{\partial x_1} + \frac{\partial F_1}{\partial y_1}\frac{dy_1}{dx_1} + \frac{\partial F_1}{\partial y_2}\frac{dy_2}{dx_1} = \frac{\partial F_1}{\partial x_1} + \begin{bmatrix} \dfrac{\partial F_1}{\partial y_1} & \dfrac{\partial F_1}{\partial y_2} \end{bmatrix}\begin{bmatrix} \dfrac{dy_1}{dx_1} \\ \dfrac{dy_2}{dx_1} \end{bmatrix} \tag{3.6}$$

The values of $dy_1/dx_1$ and $dy_2/dx_1$ are substituted into Equation (3.6). We also evaluate $\partial F_1/\partial x_1$, $\partial F_1/\partial y_1$ and $\partial F_1/\partial y_2$, either by hand or using FD or CS. Finally, Equation (3.6) simplifies to

$$\frac{dF_1}{dx_1} = \left(\frac{1}{6} - \frac{\pi}{24}\right) + \frac{\sqrt{2}\,\pi}{8}\frac{dy_1}{dx_1}$$
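A numpy sketch (ours) of these direct-method steps, with the partial derivatives hard-coded from the manual differentiation above:

import numpy as np

x1, x2 = np.pi / 4, 1.0
y1 = np.sqrt(2) / 6                        # state at the solution of R(x, y) = 0

dRdy = np.array([[1.0, -2.0], [1.0, x2]])
dRdx1 = np.array([np.sin(x1), 0.0])        # first column of dR/dx

# direct method: solve for the first column of dy/dx
dydx1 = np.linalg.solve(dRdy, -dRdx1)

dF1dx1 = y1 * (np.cos(x1) - x1 * np.sin(x1))   # partial F1 w.r.t. x1
dF1dy = np.array([x1 * np.cos(x1), 0.0])       # partial F1 w.r.t. y

print(dF1dx1 + dF1dy @ dydx1)   # = 1/6 - pi/12, about -0.0951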

3.4.2 Adjoint Method

We now apply the adjoint method to the model problem. Since there are just two state variables, Equation (2.16) leads to the following linear system:

$$\begin{bmatrix} \dfrac{\partial R_1}{\partial y_1} & \dfrac{\partial R_1}{\partial y_2} \\ \dfrac{\partial R_2}{\partial y_1} & \dfrac{\partial R_2}{\partial y_2} \end{bmatrix}^T\begin{bmatrix} \psi_{11} & \psi_{12} \\ \psi_{21} & \psi_{22} \end{bmatrix} = -\begin{bmatrix} \dfrac{\partial F_1}{\partial y_1} & \dfrac{\partial F_1}{\partial y_2} \\ \dfrac{\partial F_2}{\partial y_1} & \dfrac{\partial F_2}{\partial y_2} \end{bmatrix}^T \tag{3.7}$$

We can compute $\partial R/\partial y$ manually, and the second column of the adjoint matrix is not required, since we assume that only $f_1$ is of interest in this case; consequently, the linear system (3.7) becomes

$$\begin{bmatrix} 1 & 1 \\ -2 & x_2 \end{bmatrix}\begin{bmatrix} \psi_{11} \\ \psi_{21} \end{bmatrix} = \begin{bmatrix} -x_1\cos(x_1) \\ 0 \end{bmatrix}$$

We solve this linear system to obtain $[\psi_{11}, \psi_{21}]^T$. The system is evaluated at

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \pi/4 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \sqrt{2}/6 \\ -\sqrt{2}/6 \end{bmatrix}.$$

We compute $df_1/dx_1$ via Equation (2.17), which results in

$$\frac{dF_1}{dx_1} = \frac{\partial F_1}{\partial x_1} + \psi_{11}\frac{\partial R_1}{\partial x_1} + \psi_{21}\frac{\partial R_2}{\partial x_1}$$

where we can compute the partial derivatives manually to obtain

$$\frac{dF_1}{dx_1} = \left(\frac{1}{6} - \frac{\pi}{24}\right) + \frac{\sqrt{2}}{2}\psi_{11}$$
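A numpy sketch (ours) of the adjoint computation, using the same hard-coded partial derivatives:

import numpy as np

x1, x2 = np.pi / 4, 1.0
y1 = np.sqrt(2) / 6

dRdy = np.array([[1.0, -2.0], [1.0, x2]])
dF1dy = np.array([x1 * np.cos(x1), 0.0])

# adjoint equations: dRdy^T psi = -dF1dy^T
psi = np.linalg.solve(dRdy.T, -dF1dy)

dF1dx1 = y1 * (np.cos(x1) - x1 * np.sin(x1))   # partial F1 w.r.t. x1
dRdx1 = np.array([np.sin(x1), 0.0])            # partial R w.r.t. x1

print(dF1dx1 + psi @ dRdx1)   # = 1/6 - pi/12, matching the direct method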

3.5 Convergence Analysis of All Five Methods

The relative error is the normalized difference between the exact value of the derivative and the corresponding result from the numerical method. The exact value of $df_1/dx_1$ is given by Equation (3.5).

For FD and CS, by varying the perturbation $h$, we obtain Figure 3.1.


Figure 3.1: The convergence of FD and CS

In Figure 3.1, the x-axis corresponds to the perturbation $h$, while the y-axis shows the relative errors. It can be seen from the red line in Figure 3.1 that the problem of subtractive cancellation arises in FD: when $h$ is larger than $10^{-8}$ the error decreases, while it increases when $h$ becomes smaller than $10^{-8}$. When $h$ drops below $10^{-14}$, the result of the subtraction becomes zero, in which case the relative error becomes one. The slope of the red line is one, consistent with first-order FD, while the slope of the blue line is two, confirming that CS has second-order convergence.

As for AD and the two analytic methods, if double precision is used, the error comes only from the machine error, $\sim O(10^{-15})$.
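A short script of the kind that produces a plot like Figure 3.1 (a sketch of ours; the matplotlib styling is arbitrary):

import numpy as np
import matplotlib.pyplot as plt

def F1(x1, x2):
    y1 = x2 * np.cos(x1) / (2.0 + x2)
    return x1 * y1 * np.cos(x1)

x1, x2 = np.pi / 4, 1.0
exact = 1.0 / 6.0 - np.pi / 12.0

hs = 10.0 ** -np.arange(1.0, 17.0)
fd = [abs((F1(x1 + h, x2) - F1(x1, x2)) / h - exact) / abs(exact) for h in hs]
cs = [abs(np.imag(F1(x1 + 1j * h, x2)) / h - exact) / abs(exact) for h in hs]

# floor the CS error at 1e-17 so the log plot stays finite
plt.loglog(hs, fd, 'r', label='FD')
plt.loglog(hs, np.maximum(cs, 1e-17), 'b', label='CS')
plt.xlabel('perturbation h'); plt.ylabel('relative error'); plt.legend()
plt.show()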


4 OpenMDAO

OpenMDAO is an open-source code written in Python for multidisciplinary analysis and optimization [35, 14]. The reason we use OpenMDAO is that it integrates gradient-based optimization with analytic derivatives as well as FD and CS: OpenMDAO unifies these four methods in its derivative computation framework and drives the optimization algorithms. In Section 4.1, we briefly review the theoretical background of OpenMDAO, namely the Modular Analysis and Unified Derivatives (MAUD) architecture. In Section 4.2, we review the main OpenMDAO data structures used in this thesis. In Section 4.3, we demonstrate the OpenMDAO workflow on the test case described in Chapter 3. Note that OpenMDAO does not implement AD. Martins et al. [13] presented the mathematical procedures for computing derivatives using all five methods based on MAUD.

4.1 Unified mathematical formulation

The numerical model in Equation (1.2) includes the input variables $x$ and their explicit or implicit definitions via the governing equations (2.9). To be consistent, define

$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_{n_x} \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ \vdots \\ y_{n_y} \end{pmatrix}, \quad f = \begin{pmatrix} f_1 \\ \vdots \\ f_{n_f} \end{pmatrix}$$

as the input, state and output variables, and define the respective functions

$$Y = \begin{pmatrix} Y_1(x) \\ \vdots \\ Y_{n_y}(x) \end{pmatrix}, \quad F = \begin{pmatrix} F_1(x) \\ \vdots \\ F_{n_f}(x) \end{pmatrix}$$

as the state functions and output functions [3]:

$$\begin{aligned} y_1 &= Y_1(x_1, \ldots, x_{n_x}, y_2, \ldots, y_{n_y}), \\ &\;\;\vdots \\ y_{n_y} &= Y_{n_y}(x_1, \ldots, x_{n_x}, y_1, \ldots, y_{n_y - 1}), \\ f_1 &= F_1(x_1, \ldots, x_{n_x}, y_1, \ldots, y_{n_y}), \\ &\;\;\vdots \\ f_{n_f} &= F_{n_f}(x_1, \ldots, x_{n_x}, y_1, \ldots, y_{n_y}) \end{aligned} \tag{4.1}$$

where some of the state variables also have a residual function $R_k(x, y) = 0$ associated with them.

4.1.1 Monolithic formulation

The first step is to concatenate the set of input, state and output variables into a single vector

$$v = (v_1, \ldots, v_n)^T = (x_1, \ldots, x_{n_x}, y_1, \ldots, y_{n_y}, f_1, \ldots, f_{n_f})^T \tag{4.2}$$

where $n = n_x + n_y + n_f$. Let $x_k^*$ denote the value of the input variable $x_k$, for all $k = 1, \ldots, n_x$, at the point at which the numerical model is being evaluated. The corresponding residual functions $R(v)$ are given by Equation (4.3) [3]:

$$\begin{aligned} R_1(v) &= x_1 - x_1^*, \\ &\;\;\vdots \\ R_{n_x}(v) &= x_{n_x} - x_{n_x}^*, \\ R_{n_x+1}(v) &= \begin{cases} y_1 - Y_1(x_1, \ldots, x_{n_x}, y_2, \ldots, y_{n_y}) & y_1 \text{ is explicitly defined} \\ R_1(x_1, \ldots, x_{n_x}, y_1, \ldots, y_{n_y}) & y_1 \text{ is implicitly defined,} \end{cases} \\ &\;\;\vdots \\ R_{n_x+n_y}(v) &= \begin{cases} y_{n_y} - Y_{n_y}(x_1, \ldots, x_{n_x}, y_1, \ldots, y_{n_y-1}) & y_{n_y} \text{ is explicitly defined} \\ R_{n_y}(x_1, \ldots, x_{n_x}, y_1, \ldots, y_{n_y}) & y_{n_y} \text{ is implicitly defined,} \end{cases} \\ R_{n_x+n_y+1}(v) &= f_1 - F_1(x_1, \ldots, x_{n_x}, y_1, \ldots, y_{n_y}), \\ &\;\;\vdots \\ R_{n_x+n_y+n_f}(v) &= f_{n_f} - F_{n_f}(x_1, \ldots, x_{n_x}, y_1, \ldots, y_{n_y}) \end{aligned} \tag{4.3}$$

The numerical model in Equation (1.2) can be written more compactly as

$$R(v) = 0 \;\Longleftrightarrow\; \begin{cases} R_1(v_1, \ldots, v_n) = 0 \\ \;\;\vdots \\ R_n(v_1, \ldots, v_n) = 0 \end{cases} \tag{4.4}$$

which we call the fundamental system. The solution $v^*$ of the fundamental system (4.4) satisfies the numerical model in Equation (4.1) [3].

4.1.2 Derivative computation

The fundamental system (4.4) is rewritten with

$$v = \begin{pmatrix} x \\ y \\ f \end{pmatrix} \quad \text{and} \quad R(v) = \begin{pmatrix} x - x^* \\ R(x, y) \\ f - F(x, y) \end{pmatrix}$$

As proposed by Martins et al. [3], if $\partial R/\partial v$ is invertible and the inverse is written as

$$\left[\frac{\partial R}{\partial v}\right]^{-1} = \begin{bmatrix} A_{xx} & A_{xy} & A_{xf} \\ A_{yx} & A_{yy} & A_{yf} \\ A_{fx} & A_{fy} & A_{ff} \end{bmatrix}$$

then the derivatives are

$$\frac{df}{dx} = A_{fx}$$

where $\partial R/\partial v$ is evaluated at the point $v^*$ satisfying $R(v^*) = 0$. The inverse function theorem guarantees the existence of a locally defined inverse function $R^{-1}: r \mapsto v \,|\, R(v) = r$ that satisfies

$$\frac{\partial(R^{-1})}{\partial r} = \left[\frac{\partial R}{\partial v}\right]^{-1} \tag{4.5}$$

Martins et al. [3] define: given the algebraic system of equations $R(v) = 0$, assume $\partial R/\partial v$ is invertible at the solution of this system. The matrix of total derivatives $dv/dr$ is defined to be

$$\frac{dv}{dr} = \frac{\partial(R^{-1})}{\partial r}$$

where $\partial(R^{-1})/\partial r$ is evaluated at $r = 0$. From Equation (4.5),

$$\frac{dv}{dr} = \left[\frac{\partial R}{\partial v}\right]^{-1}$$

yielding

$$\frac{\partial R}{\partial v}\frac{dv}{dr} = I = \left[\frac{\partial R}{\partial v}\right]^T\left[\frac{dv}{dr}\right]^T \tag{4.6}$$

which we refer to as the unifying chain rule equation, where $R$, $v$ and $r$ are defined in Equations (4.3), (4.2) and (4.5).
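To make this concrete, the following numpy sketch (ours) assembles $\partial R/\partial v$ for the test case of Chapter 3 at the solution point and reads $df_1/dx_1$ off the $A_{fx}$ block of the inverse:

import numpy as np

x1, x2 = np.pi / 4, 1.0
y1, y2 = np.sqrt(2) / 6, -np.sqrt(2) / 6    # state solving R(x, y) = 0

# rows of dR/dv for v = (x1, x2, y1, y2, f1, f2); each comment names
# the residual the row differentiates
dRdv = np.array([
    [1, 0, 0, 0, 0, 0],                                  # x1 - x1*
    [0, 1, 0, 0, 0, 0],                                  # x2 - x2*
    [np.sin(x1), 0, 1, -2, 0, 0],                        # y1 - 2*y2 - cos(x1)
    [0, y2, 1, x2, 0, 0],                                # y1 + x2*y2
    [-y1 * (np.cos(x1) - x1 * np.sin(x1)), 0,
     -x1 * np.cos(x1), 0, 1, 0],                         # f1 - x1*y1*cos(x1)
    [-y2 * np.cos(x1), 0, 0, -np.sin(x1), 0, 1],         # f2 - y2*sin(x1)
])

A = np.linalg.inv(dRdv)   # dv/dr
print(A[4, 0])            # the (f1, x1) entry of A_fx: 1/6 - pi/12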

4.2 Data Structures

There are several core data structures in computing derivatives in Open- MDAO that are defined in classes. We illustrate via the test case in Chapter 3 and it can be applied to similar models. The classes of these data structures we use are Component, Group, Problem which we describe below.

4.2.1 Component

Component is the smallest unit of computational work the framework understands [14]. Each component outputs its own set of variables [14]. The most frequently used subclasses of Component are


1. IndepVarComp – independent-variable component, x;

2. ImplicitComponent – implicitly computed dependent-variable component, y (if we do not solve the linear system of Equation (3.3) explicitly); and

3. ExplicitComponent – explicitly computed dependent-variable component, f.

In the test case, the relation between x, y and f, together with their types, is implemented as in Table 4.1.

Component            Inputs   Outputs
1 IndepVarComp       -        x
2 ImplicitComponent  x        y
3 ExplicitComponent  x, y     f

Table 4.1: The relation between x, y and f in the test case

Independent variables (IndepVarComp) are set externally to the model; therefore, they are inputs of the model. From the perspective of a Component, independent variables are component outputs that do not depend on any component inputs [14], i.e., there is no need to implement the add_input() method for an IndepVarComp. From the perspective of a model, they can be viewed as input variables set by the user before running the model [14].

To define an IndepVarComp, the user has to implement add_output(). For example, x is an IndepVarComp in the test case. We define x in the following way:

# ThesisDemos.py
import numpy as np
from openmdao.api import Problem, Group, IndepVarComp
...
class ThesisDemos(Group):
    def setup(self):
        indeps = self.add_subsystem('indeps', IndepVarComp())
        indeps.add_output('x', val=[np.pi, 1.0])
        ...


Implicit variables (ImplicitComponent) are those that are computed via implicit functions of other variables [14].

To define an ImplicitComponent subclass, two methods have to be implemented:

1. def setup(self), in which the user should call
   (a) self.add_input(name, val=...) to add input variables,
   (b) self.add_output(name, val=...) to add output variables, and
   (c) self.declare_partials(of=..., wrt=..., method=...) to define the behavior of the computation of derivatives;

2. apply_nonlinear(self, inputs, outputs, residuals), which defines the residual functions of Equation (4.3).

In our test case, y is an ImplicitComponent and it is implemented in the following way:

# AllComps.py
import numpy as np
from openmdao.api import ImplicitComponent

class YImplicitComp(ImplicitComponent):
    def setup(self):
        self.add_input('x', val=[np.pi, 1.0])
        self.add_output('y', val=[1.0, 1.0])
        self.declare_partials('*', '*', method='fd')

    def apply_nonlinear(self, inputs, outputs, residuals):
        x1, x2 = inputs['x']
        y1, y2 = outputs['y']
        # residuals of Equation (3.2)
        residuals['y'][0] = y1 - 2*y2 - np.cos(x1)
        residuals['y'][1] = y1 + x2*y2

To define an ExplicitComponent subclass, two methods have to be implemented [14]:

1. def setup(self), in which the user should do the same things as for an ImplicitComponent;

2. def compute(self, inputs, outputs), which defines the explicit formula for the outputs in terms of the inputs.


In our test case, f = F(x, y(x)) is an ExplicitComponent, which is implemented in the following way:

# AllComps.py
import numpy as np
from openmdao.api import ExplicitComponent

class FComp(ExplicitComponent):
    def setup(self):
        self.add_input('x', val=[np.pi, 1.0])
        self.add_input('y', val=[1.0, 1.0])
        self.add_output('f', val=[1.0, 1.0])
        self.declare_partials('*', '*', method='fd')

    def compute(self, inputs, outputs):
        y1, y2 = inputs['y']
        x1, x2 = inputs['x']
        # output functions of Equation (3.1)
        outputs['f'][0] = x1*y1*np.cos(x1)
        outputs['f'][1] = y2*np.sin(x1)

4.2.2 Group

With Group objects, we can represent a complex model as a set of components [14]. In a Group, components and other groups can be combined to form a tree structure [14].

To define a Group subclass, def setup(self) has to be defined, in which the user should call [14]

1. add_subsystem(), which registers a component and returns a system object, and

2. connect(), which connects the outputs of one system to the inputs of another system.

For our test case, the input variables x, the state variables y, the output functions f and the connections between them are defined in the following way:

# ThesisDemos.py
from openmdao.api import Group
...
class ThesisDemos(Group):
    def setup(self):
        # add input component x
        indeps = self.add_subsystem('indeps', IndepVarComp())
        indeps.add_output('x', val=[1.0, 1.0])
        # add state component y
        self.add_subsystem('y', YImplicitComp())
        # add output component f
        self.add_subsystem('f', FComp())
        # connect the outputs to the inputs
        self.connect('indeps.x', ['f.x', 'y.x'])
        self.connect('y.y', ['f.y'])

4.2.3 Problem

Problem is the object used to [14]

1. integrate the groups and set the (initial) values of the independent variables,
2. set the solvers and their behavior,
3. perform the calculation, and
4. track the internal values and the results.

For the test case, if we are interested in the total derivatives dF/dx of Equation (3.4), this can be done in the following way:

# ThesisDemos.py
import numpy as np
from openmdao.api import Problem, NewtonSolver, ScipyKrylov
...
# 1 integrate the groups
prob = Problem()
model = prob.model = ThesisDemos()

# 2 set the linear/nonlinear solvers for the ImplicitComponent
model.nonlinear_solver = NewtonSolver()
model.nonlinear_solver.options['solve_subsystems'] = True
model.nonlinear_solver.options['max_sub_solves'] = 1
model.linear_solver = ScipyKrylov()

# fix the problem setting, then set the independent variables
prob.setup()
prob['indeps.x'] = [np.pi, 1.0]

# 3 perform the calculation
prob.run_model()

# 4 track the internal values and the results
print(prob['y.y'])
print(prob.compute_totals(of=['f.f'], wrt=['indeps.x']))
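For reference, compute_totals returns a dictionary keyed by (of, wrt) pairs, so the entry of interest can be read out as follows (the expected value is from Equation (3.5)):

totals = prob.compute_totals(of=['f.f'], wrt=['indeps.x'])
J = totals['f.f', 'indeps.x']   # the 2x2 total-derivative matrix of Equation (3.4)
print(J[0, 0])                  # dF1/dx1 = 1/6 - pi/12, about -0.0951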

4.3 Derivatives calculation methods

We show how to implement our test case using OpenMDAO in this section. Note that OpenMDAO does not implement AD, so we skip AD.

4.3.1 FD and CS

In the definition of the subclass of Component, to make the OpenM- DAO use FD or CS to compute the derivatives, the user has to declare in the setup() method

# AllComps.py ...

class FComp(ExplicitComponent):

def setup(self) : ...

# use finite different

self.declare_partials( ’⇤’ , ’⇤’ , method=’fd’)

# or use complex step

self.declare_partials( ’⇤’ , ’⇤’ , method=’cs’) ...
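The approximation can also be tuned when it is declared; a sketch with option values of our own choosing, based on the optional arguments of declare_partials:

# finite differences with an explicit step size and central differencing
self.declare_partials('*', '*', method='fd', step=1e-6, form='central')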


4.3.2 Analytical methods

In order to make OpenMDAO use the analytical methods, the user has to define the following methods, providing explicit formulas or values for the corresponding derivatives:

1. the compute_partials() method for an ExplicitComponent subclass;

2. the optional linearize() method for an ImplicitComponent subclass.

For instance, for the test case, we have done this in the following way:

# AllComps.py
import numpy as np
...
class YImplicitUnifiedComp(ImplicitComponent):
    def setup(self):
        self.add_input('x', val=[1.0, 1.0])
        self.add_output('y', val=[1.0, 1.0])
        self.declare_partials('*', '*')

    def apply_nonlinear(self, inputs, outputs, residuals):
        x1, x2 = inputs['x']
        y1, y2 = outputs['y']
        residuals['y'][0] = y1 - 2*y2 - np.cos(x1)
        residuals['y'][1] = y1 + x2*y2

    def linearize(self, inputs, outputs, partials):
        x1, x2 = inputs['x']
        y1, y2 = outputs['y']
        det = 1/(x2 + 2)
        partials['y', 'x'] = [[-det*x2*np.sin(x1), det**2*2*np.cos(x1)],
                              [det*np.sin(x1), det**2*np.cos(x1)]]
        partials['y', 'y'] = [[1, -2], [1, x2]]

class FUnifiedComp(ExplicitComponent):
    def setup(self):
        self.add_input('x', val=[1.0, 1.0])
        self.add_input('y', val=[1.0, 1.0])
        self.add_output('f', val=[1.0, 1.0])
        self.declare_partials('*', '*')

    def compute(self, inputs, outputs):
        y1, y2 = inputs['y']
        x1, x2 = inputs['x']
        outputs['f'][0] = x1*y1*np.cos(x1)
        outputs['f'][1] = y2*np.cos(x1)

    def compute_partials(self, inputs, partials):
        y1, y2 = inputs['y']
        x1, x2 = inputs['x']
        partials['f', 'x'] = [[y1*(np.cos(x1) - x1*np.sin(x1)), 0],
                              [-y2*np.sin(x1), 0]]
        partials['f', 'y'] = [[x1*np.cos(x1), 0], [0, np.cos(x1)]]


5 Benchmarks and Results

We first give a general description of the computer architecture in Section 5.1. Then we start from a simple beam optimization problem in Section 5.2 and afterwards extend this model to the aerostructural shape optimization problem as a real case. First, we measure the performance of the sequential model and the parallelized model (note that only the linear solver is parallelized). Then, we calculate the force based on the pressure and derive the displacement both for the beam and the plane-frame models, which are used to optimize the shape of the airfoil in Sections 5.3 and 5.4. In Section 5.5, we optimize the lift coefficient of the airfoil in order to show the cases where FD might be the only suitable way to compute the derivatives, as well as its drawbacks.

Although there are many criteria for evaluating the performance of an airfoil, each model only yields the optimal value of what it aims at, i.e., the minimum of its objective function.

5.1 Computer Architecture Background and Parallelization Technique

Obtaining fast and efficient algorithms is not only a theoretical and mathematical process. Depending on the resources available on the computers being used, it might be necessary to alter an algorithm slightly to gain better performance. To understand such steps, it is vital to know the features of the hardware being used. In this section, a general description of the computer architecture, which is needed to understand the later optimizations of the described algorithm, will be given. As was mentioned previously, the future application to problems with large numbers of design variables is the goal of this thesis.

The algorithm was tested on a machine with a 2.3 GHz dual-core 7th-generation Intel Core i5 processor and 8 GB of 2133 MHz LPDDR3 memory using Python 3, and on the cluster Tegner, which is run by the PDC Center for High Performance Computing at KTH. Tegner is an Intel CPU based cluster. It consists of two types of nodes: 55 Intel E5-2690v3 Haswell nodes with 2 x 12 cores each, and 10 Intel E7-8857v2 Ivy Bridge nodes with 4 x 12 cores each. All these nodes are equipped with NVIDIA GPUs and have at least 512 GB of RAM. The nodes are interconnected via InfiniBand EDR, and a 5 PB Lustre file system is mounted on the cluster [36]. The topology of a Haswell node on the Tegner system is shown in Figure 5.1.
