MASTER’S THESIS
NILS-ERIK SVANGÅRD
Implementation of an Advanced Time Stepping Algorithm in Computational
Fluid Dynamics Software
MASTER OF SCIENCE PROGRAMME Mathematics
Luleå University of Technology
Department of Mathematics
MASTER THESIS REPORT
Nils-Erik Svangård (nilserik@gmail.com)
Under the supervision of Ph.D. Niklas Andersson
and Ph.D. Bo Kjellmert
November 21, 2006
This master thesis contains an investigation of the Portable, Extensible Toolkit for Scientific Computation (PETSc) and a time-stepping implicit solver for computational fluid dynamics problems. The work discusses two different approaches to achieving a time-stepping solver with PETSc and what other possibilities PETSc holds for Computational Fluid Dynamics (CFD) codes. An implementation using PETSc's Ordinary Differential Equation (ODE) solver is presented, as well as a solver using PETSc's nonlinear solver (SNES). The code is implemented in the Volvo Aero in-house CFD software G3D.
The way PETSc was implemented in the present work did not result in a CFD code with satisfactory performance. This thesis discusses the PETSc framework in general, the present PETSc implementation, reasons for the unsatisfactory results and suggestions for future work.
Contents

1 Introduction
2 G3D
  2.1 Basic overview
  2.2 Governing equations
3 PETSc
  3.1 Basic overview
  3.2 Compiling the PETSc Library
    3.2.1 General compilation
    3.2.2 Volvo Aero compilation
  3.3 Using Matlab with PETSc
  3.4 Basic data types in PETSc
  3.5 The non-linear solver: SNES
  3.6 The time stepper: TS
  3.7 Basic PETSc program
4 Adding PETSc solver to G3D
  4.1 Basic overview
  4.2 Handling mesh data
5 SNES Backward-Euler Implementation
  5.1 Backward-Euler
  5.2 The code
  5.3 The FormFunction()
6 TS ODE solver Implementation
  6.1 Backward-Euler
  6.2 The code
    6.2.1 The function
7 Building and Running the Implicit Solver
  7.1 Basic overview
  7.2 Program specifics
  7.3 Executing
8 Conclusions and Discussion
  8.1 Problems encountered
  8.2 Future work
9 Acronyms
Bibliography
A TS and SNES implementations in FORTRAN
  A.1 snes.F
  A.2 ts.f
B The modified Bratu example
  B.1 ex5f.F
  B.2 ex5f.h
C The makefile
1 Introduction
The thermo- and aerodynamics department at Volvo Aero has during the past years performed substantial refinement of its in-house CFD codes, VolSol, in order to meet the demand for more advanced methods in the field of space and aircraft propulsion.
With the introduction of state-of-the-art methods, such as unique algorithms for non-ideal-gas formulations and conjugate heat transfer simulations, there is now a need for a new implicit formulation as well as an implementation of advanced parallelization in the code. This work should be seen as an initial investigation into increasing the convergence rate to a steady flow solution with the in-house CFD code by a factor of at least 10. All work has been done on a code similar to the actual production code, through the use of PETSc. Furthermore, an investigation into PETSc's built-in parallelization and dynamic multigrid methods was conducted. The ABSOFT FORTRAN 8.0 compiler is used throughout unless otherwise stated.
2 G3D
2.1 Basic overview
G3D is a CFD software developed at Chalmers University of Technology and Volvo Aero by Eriksson (1995). The software works with general structured, boundary-fitted, curvilinear, non-orthogonal multi-block meshes. It solves the Navier-Stokes equations using an explicit 3-stage Runge-Kutta algorithm; by introducing an implicit solver, larger time steps can be taken even in low-speed cases, hopefully speeding up the calculation. G3D uses the finite-volume method. The version of G3D used in this thesis was not parallelized; however, there exist versions where the calculations are blockwise parallel, meaning that the mesh blocks are distributed over the available nodes and calculations are performed on the local grids.
2.2 Governing equations
In this section the equations governing the fluid dynamics of the flows used in G3D are presented in tensor form. This should be seen as background information; if more information is needed, please read Andersson (2005) [1].
The Reynolds-averaged Navier-Stokes (RANS) equations constitute a set of transport equations governing transport of mass, momentum and energy:

$$\frac{\partial Q}{\partial t} + \frac{\partial F_j}{\partial x_j} = 0 \qquad (1)$$
where

$$Q = \begin{pmatrix} \rho \\ \rho \tilde{u}_i \\ \rho \tilde{e}_0 \end{pmatrix} \qquad (2)$$

and

$$F_j = \begin{pmatrix} \rho \tilde{u}_j \\ \rho \tilde{u}_i \tilde{u}_j + p\delta_{ij} - \tau_{ij} \\ \rho \tilde{u}_j \tilde{h}_0 + q_j - \tilde{u}_i \tau_{ij} \end{pmatrix} \qquad (3)$$
where $\tau_{ij}$ is the viscous stress obtained using the Boussinesq assumption, defined by

$$\tau_{ij} = (\mu + \mu_t)\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} - \frac{2}{3}\frac{\partial u_k}{\partial x_k}\delta_{ij}\right) - \frac{2}{3}\rho k \delta_{ij} \qquad (4)$$
The eddy viscosity is defined by

$$\mu_t = C_\mu \rho \frac{k^2}{\varepsilon} \qquad (5)$$
Fourier's heat law gives the heat flux expressed as

$$q_j = -\lambda \frac{\partial \tilde{T}}{\partial x_j} = -C_p\left(\frac{\mu}{Pr} + \frac{\mu_t}{Pr_t}\right)\frac{\partial \tilde{T}}{\partial x_j} \qquad (6)$$
Turbulent kinetic energy and dissipation of turbulent kinetic energy are modeled with a standard k-ε model:

$$\frac{\partial k}{\partial t} + \frac{\partial}{\partial x_j}\left[\rho u_j k - \left(\mu + \frac{\mu_t}{\sigma_k}\right)\frac{\partial k}{\partial x_j}\right] = P_k - \rho\varepsilon \qquad (7)$$

$$\frac{\partial \varepsilon}{\partial t} + \frac{\partial}{\partial x_j}\left[\rho u_j \varepsilon - \left(\mu + \frac{\mu_t}{\sigma_\varepsilon}\right)\frac{\partial \varepsilon}{\partial x_j}\right] = \left(C_{\varepsilon 1} P_k - C_{\varepsilon 2}\,\rho\varepsilon\right)\frac{\varepsilon}{k} \qquad (8)$$
The expression for the production of turbulent kinetic energy in the standard k-ε model is written as

$$P_k = \left[\mu_t\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} - \frac{2}{3}\frac{\partial u_k}{\partial x_k}\delta_{ij}\right) - \frac{2}{3}\rho k \delta_{ij}\right]\frac{\partial u_i}{\partial x_j} \qquad (9)$$

The gas mixture behaves as a perfect gas:

$$P = \rho R T \qquad (10)$$
Eqn. 1 is discretized on a structured non-orthogonal boundary-fitted mesh. Integrating Eqn. 1 over an arbitrary volume Ω gives

$$\int_\Omega \frac{\partial Q}{\partial t}\,dV + \int_\Omega \frac{\partial F_j}{\partial x_j}\,dV = 0 \qquad (11)$$
Using Gauss' theorem to rewrite the second volume integral as a surface integral, and introducing $\bar{Q}$ as the cell average of Q over Ω, gives

$$\frac{\partial \bar{Q}}{\partial t} V + \int_{\partial\Omega} F_j \cdot dS_j = 0 \qquad (12)$$
In Eqn. 12, $S_j = n_j S$ is the face area normal vector, an area with a direction. The integral of the fluxes is approximated using the areas of all faces and the face-average fluxes:

$$\int_{\partial\Omega} F_j \cdot dS_j = \sum_{i=1}^{\text{all faces}} F_j^i \cdot S_j^i \qquad (13)$$
Eqn. 11 can now be written as

$$\frac{\partial \bar{Q}}{\partial t} V + \sum_{i=1}^{\text{all faces}} F_j^i \cdot S_j^i = 0 \qquad (14)$$
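The cell-update rule in Eqn. 14 can be made concrete with a minimal one-dimensional sketch (plain Python, not G3D code; the periodic domain and first-order upwind face flux are assumptions chosen purely for illustration):

```python
# One finite-volume update per Eqn. 14: each cell average changes by the sum
# of its face fluxes divided by the cell volume (here dx in 1D).
# Model problem: linear advection q_t + a*q_x = 0, a > 0, periodic domain,
# first-order upwind face fluxes. This is an illustration, not G3D code.
def fv_step(q, a, dx, dt):
    n = len(q)
    # Upwind flux at face i, the face between cells i-1 and i.
    flux = [a * q[(i - 1) % n] for i in range(n)]
    # Right face of cell i is face i+1, left face is face i.
    return [q[i] - (dt / dx) * (flux[(i + 1) % n] - flux[i]) for i in range(n)]

q0 = [0.0, 0.0, 1.0, 0.0, 0.0]
q1 = fv_step(q0, a=1.0, dx=1.0, dt=1.0)  # CFL = 1: an exact one-cell shift
```

Because every interior face flux appears once with each sign, the face-flux sum telescopes and the scheme is conservative by construction, which is the main point of the finite-volume form.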
The total flux, $F_j$, is divided into a convective part and a diffusive part:

$$F_j = \begin{pmatrix} \rho \tilde{u}_j \\ \rho \tilde{u}_i \tilde{u}_j + p\delta_{ij} \\ \rho \tilde{e}_0 \tilde{u}_j + p \tilde{u}_j \end{pmatrix} + \begin{pmatrix} 0 \\ -\sigma_{ij} - \tau_{ij} \\ -C_p\left(\dfrac{\mu}{Pr} + \dfrac{\mu_t}{Pr_t}\right)\dfrac{\partial \tilde{T}}{\partial x_j} - \tilde{u}_i(\sigma_{ij} + \tau_{ij}) \end{pmatrix} \qquad (15)$$
where the first part is the convective and inviscid part and the second part is the diffusive and viscous part. In the code, fluxes are evaluated for each cell face in the computational domain by moving a four-point stencil around the domain cells, i.e. they are evaluated in a "method of lines" manner. Interface faces and boundary faces are treated separately using special versions of the flux routines. In the boundary-face routines, ghost-cell values needed for the evaluation of the four-point-stencil flux expression are extrapolated using the method of characteristics (MOC) or similar.
As mentioned earlier, G3D uses a three-stage second-order low-storage Runge-Kutta time-marching method. When all convective and diffusive fluxes have been calculated (described in detail in Andersson [1]), the temporal derivative of the flow variables for a specific cell is estimated as

$$\frac{\partial Q^n}{\partial t} = F^n \qquad (16)$$

$$Q^{*} = Q^n + \Delta t F^n, \qquad Q^{**} = \frac{1}{2}\left[Q^n + Q^{*} + \Delta t F^{*}\right], \qquad Q^{n+1} = \frac{1}{2}\left[Q^n + Q^{*} + \Delta t F^{**}\right] \qquad (17)$$

where superscript n denotes the present time step and n + 1 the next time step. Superscripts * and ** denote the sub-time steps, and Δt is the solver time step, the time between n and n + 1.
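The three stages of Eqn. 17 translate directly into code; the following sketch (illustrative Python, not the G3D implementation) applies them to a scalar model ODE:

```python
import math

# Three-stage second-order low-storage Runge-Kutta step, following Eqn. 17.
# F is the right-hand side of dQ/dt = F(Q) (Eqn. 16), here for a scalar Q.
def rk3_step(F, qn, dt):
    qs = qn + dt * F(qn)                  # Q*    = Q^n + dt*F^n
    qss = 0.5 * (qn + qs + dt * F(qs))    # Q**   = [Q^n + Q* + dt*F*]/2
    return 0.5 * (qn + qs + dt * F(qss))  # Q^n+1 = [Q^n + Q* + dt*F**]/2

# Integrate dq/dt = -q from q(0) = 1 to t = 1; the exact answer is exp(-1).
q, dt = 1.0, 0.001
for _ in range(1000):
    q = rk3_step(lambda u: -u, q, dt)
```

Three residual evaluations per step, but only the previous state and the latest stage need to be stored, which is what makes the scheme low-storage.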
3 PETSc
3.1 Basic overview
PETSc is short for Portable, Extensible Toolkit for Scientific Computation and is used for advanced numerical computations. PETSc is a set of libraries and data structures for solving linear and nonlinear systems of equations in parallel; parallelization is built in through the Message Passing Interface (MPI). PETSc is developed by scientists at the Mathematics and Computer Science Division at Argonne National Laboratory and is mainly funded by the U.S. Department of Energy.
Each library works with a specific area and is used by higher-level libraries; for example, the time stepping library uses the vector library and the non-linear solver library to solve time stepping problems. The libraries support a multitude of types and algorithms: vectors, Krylov subspace methods, matrices, nonlinear solvers, time steppers, multigrid methods, distributed data types, automatic differentiation and more.
PETSc is by design well suited to plug into existing code; for example, the vector implementation has been made so that it is easy to convert the data structures used by old code to the PETSc framework. All types in PETSc have been designed with parallel operations in mind. This means that it is relatively easy to make software that performs calculations in parallel with little prior knowledge of parallel programming. The code is written to promote code reuse and flexibility.
The PETSc libraries are modern and object-oriented. Functions for handling errors and for managing and displaying data are built in. PETSc is regularly tested with different compilers by the PETSc development team, using the current version of each compiler; the ABSOFT brand of FORTRAN compilers is among those tested.
The version of PETSc used in this work is 2.3.0; when starting a new project it is always recommended to use the latest version. The PETSc website [2] is the main resource for documentation and support. The PETSc team provides limited support for the current version on public mailing lists. These lists, together with the documentation, are very good sources of experience and information, and the people answering are very competent.
Function Operation
Create() Create the object.
Get/SetName() Name the object.
Get/SetType() Set the implementation type.
Get/SetOptionsPrefix() Set the prefix for all options.
SetFromOptions() Use command line options.
SetUp() Perform other initialization.
View() View the object.
Destroy() Cleanup object allocation.
Table 1: Almost every object in PETSc supports a basic interface shown above.
3.2 Compiling The PETSc Library
3.2.1 General compilation
The installation procedure depends on which architecture the library is intended to be used on, detailed information on the installation procedure is available at the PETSc website. The information given here should be seen as a guide for quick installation.
The primary source of information should always be the website. The instructions assume that the PETSc source code has been downloaded and unpacked in an appropriate directory where the person intending to install has the correct permissions, and that a supported environment meeting all the requirements stated for the PETSc library is available.
3.2.2 Volvo Aero compilation
The procedure for compiling PETSc at Volvo Aero differs from the installation procedure described in the PETSc installation manual. This is due to the limitations of the ABSOFT FORTRAN 8.0 compiler, which is the compiler available on the Volvo Aero computers. This compiler is old and has many flag mismatches that are not compatible with newer conventions, forcing the user to manually control the behaviour of the PETSc installation scripts by editing configuration files and appending the appropriate compiler flags.
Before trying to compile any PETSc application in the Volvo Aero environment, the file petscconf has to be edited. It can be found in the directory $PETSC_DIR/bmake/linux-gnu/. All occurrences of -Wl,-rpath have to be changed to -L, otherwise the compiler will complain. For an explanation of the compiler flags, the ABSOFT FORTRAN 8.0 manual [6] is recommended.
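The substitution itself is mechanical; assuming the offending flag takes the -Wl,-rpath, form, it can be sketched as a plain text replacement (the example line below is invented for illustration and is not the real file contents):

```python
# Sketch of the petscconf edit: every '-Wl,-rpath,' style linker flag is
# replaced with '-L', which the ABSOFT 8.0 compiler accepts. The example
# line is illustrative; the real file is $PETSC_DIR/bmake/linux-gnu/petscconf.
def fix_flags(text):
    return text.replace("-Wl,-rpath,", "-L")

line = "CLINKER_FLAGS = -Wl,-rpath,/opt/petsc/lib -lpetsc"
fixed = fix_flags(line)  # -> 'CLINKER_FLAGS = -L/opt/petsc/lib -lpetsc'
```

In practice the same substitution can of course be done directly in petscconf with a text editor or sed.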
The PETSc configuration scripts provide functionality to download and unpack external packages directly from the command line. But since the Volvo Aero computers access the Internet via a proxy, there is no good way to reach the Internet from the command line. If Basic Linear Algebra Subprograms (BLAS) and MPI are wanted, they have to be downloaded separately, unpacked and put in $PETSC_DIR/externalpackages/; then run configure.py with the --download-* options and PETSc will use the downloaded packages.
3.3 Using Matlab with PETSc
There are many ways to view and manipulate data in PETSc. An interesting option is Matlab, which is not used in this work but could potentially be useful when performing calculations. There are a couple of ways to use Matlab with PETSc; you can for example:
• Dump data for use in Matlab
• Use Matlab interactively from PETSc
• Automatically send data back and forth between PETSc and Matlab
A detailed explanation of how to do these operations is given in the PETSc manual [3].
It is, however, trivial to output all vectors or matrices from the command line in Matlab format. For example, -vec_view_matlab will print every vector to standard output (generally the monitor) each time it is assembled.
3.4 Basic Data types in PETSc
PETSc has data types that replace int, real and others, with names like PetscInt and PetscScalar. The PETSc users manual gives instructions on how to use them. Note, however, that by default PETSc uses double precision while FORTRAN by default uses single precision (at least the Volvo Aero ABSOFT FORTRAN installation does). This means that, for instance, a PetscScalar is the same as real*8; if this needs to be remedied, the PETSc library must be recompiled with the proper flags set. One of the simplest PETSc objects is Vec, a standard vector; it can be defined as a sequential or a parallel vector (using MPI) and has the same interface in both cases. The vector object, like all other PETSc objects, has predefined functions for the basic manipulations needed.
There are also objects to handle the management of structured and unstructured grids. The PETSc Distributed Array (DA) object handles distributed arrays for structured grids; IS and VecScatter handle vectors for unstructured grids. There is also support for an arbitrary number of degrees of freedom per vector entry and cell.
3.5 The non-linear solver: SNES
The SNES object contains methods for solving systems of nonlinear equations of the form

$$F(x) = 0 \qquad (18)$$
SNES uses Newton-like methods and includes both line-search and trust-region techniques. The interface to the object is the same as for almost all PETSc objects (see Table 1). SNES works by computing the step

$$p = -J^{-1} F(u_{old}) \qquad (19)$$

(with the inverse of the Jacobian J applied only approximately) and then performing a line search on

$$u_{new} = u_{old} + \lambda p \qquad (20)$$

to get the new u. It first tries the value λ = 1, i.e. it computes $F(u_{old} + p)$. It is possible that $u_{old} + p$ contains "non-physical" values. There are two ways to handle such values:
• Before doing the line search, SNES calls a "pre-check" function that can shorten the step if it decides that the step must be decreased. A custom pre-check function can be defined with SNESLineSearchSetPreCheck().
• In the function that calculates the new values, add a check that exits the function when a non-physical value is detected. Such a solution should make sure that no calculations are done with the non-physical values.
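The update of Eqns. 19-20, together with a guard against non-physical states, can be sketched on a scalar toy problem (illustrative Python, not SNES itself; the step-halving rule and the positivity test merely stand in for SNES's real line-search and pre-check logic):

```python
# Damped Newton iteration in the spirit of Eqns. 19-20:
#   p = -J(u)^{-1} F(u),  u_new = u + lambda*p, starting from lambda = 1.
# The 'physical' test plays the role of a SNES pre-check: steps that leave
# the physical region (here u <= 0) are halved before being accepted.
def newton_ls(F, J, u, physical=lambda u: u > 0.0, tol=1e-12, max_it=50):
    for _ in range(max_it):
        f = F(u)
        if abs(f) < tol:
            break
        p = -f / J(u)                  # Newton direction
        lam = 1.0                      # first try the full step
        while not physical(u + lam * p) or abs(F(u + lam * p)) >= abs(f):
            lam *= 0.5                 # backtrack on bad or non-physical steps
            if lam < 1e-10:
                raise RuntimeError("line search failed")
        u = u + lam * p
    return u

# Solve u^2 - 2 = 0 with the constraint u > 0; the answer is sqrt(2).
root = newton_ls(lambda u: u * u - 2.0, lambda u: 2.0 * u, 5.0)
```

The backtracking loop enforces a strict residual decrease, so accepted steps can never make the solution worse, at the cost of extra residual evaluations.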
The SNES object also supports matrix-free methods, which do not need an explicitly formed Jacobian. Matrix-free methods are used in this work because the Jacobian was hard to construct. The matrix-free methods in PETSc are generally slower than those using a Jacobian.
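The idea behind a matrix-free method is that a Krylov solver only ever needs Jacobian-vector products, and these can be approximated by a one-sided finite difference of the residual; a sketch follows (PETSc chooses the differencing parameter h more carefully than this fixed value):

```python
# Matrix-free Jacobian-vector product: J(u) v is approximated by a one-sided
# finite difference of the residual, J(u) v ~ (F(u + h*v) - F(u)) / h, so the
# Jacobian matrix itself is never formed or stored.
def jac_vec(F, u, v, h=1e-7):
    up = [ui + h * vi for ui, vi in zip(u, v)]
    return [(a - b) / h for a, b in zip(F(up), F(u))]

# Toy residual F(u) = (u0^2 - u1, u0 + u1); its Jacobian at u = (2, 1) is
# [[4, -1], [1, 1]], so for v = (1, 3) the exact product is (1, 4).
def F(u):
    return [u[0] ** 2 - u[1], u[0] + u[1]]

Jv = jac_vec(F, [2.0, 1.0], [1.0, 3.0])
```

Each product costs one extra residual evaluation, which is why matrix-free Newton-Krylov iterations are typically slower per step than ones with an assembled Jacobian.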
Figure 3 shows a basic SNES application, including the parts of the program that the programmer must implement. The SNES object is implemented on top of smaller components, as shown in Figure 1.
Figure 1: The SNES object is built of smaller components: a preconditioner (PC), a linear solver (SLES), a profiling interface and communication kernels.
3.6 The time stepper: TS
The PETSc Time Stepper (TS) object is an ODE integrator. It solves ODEs of the form

$$\frac{du}{dt} = F(u) \qquad (21)$$

where the user provides F and its Jacobian. The time stepper uses the PETSc nonlinear solver to solve the equations and provides an easy way to change the time stepping algorithm. See Table 2 for a list of currently supported time stepping algorithms.
The TS object is also built on top of smaller modules, as shown in Figure 2.

Figure 2: The TS object is built on top of smaller modules: the nonlinear solver (SNES), a preconditioner (PC), a linear solver (SLES), a profiling interface and communication kernels.
Define              Type             Description
TS_EULER            euler            The explicit Forward-Euler method
TS_BEULER           beuler           The implicit Backward-Euler method
TS_PSEUDO           pseudo           Pseudo-time stepping
TS_CRANK_NICHOLSON  crank-nicholson  The implicit Crank-Nicholson method
TS_SUNDIALS         sundials         The LLNL CVODE/SUNDIALS package
TS_RUNGE_KUTTA      runge-kutta      The explicit Runge-Kutta methods

Table 2: ODE solvers to use with PETSc's TS object.
3.7 Basic PETSc program
To fully utilize PETSc's capabilities, much time must be spent reading the documentation and compiling the tutorials and examples. PETSc has a high entry threshold, mainly because it is designed to be scalable and mathematically correct.
Application code that uses the PETSc libraries must be bracketed by an initialize routine and a finalize routine that create and destroy the framework; in effect, all code relating to PETSc must be contained within:
      call PetscInitialize(PETSC_NULL_CHARACTER, ierr)
      call PetscUserCustomRoutines(ierr)
      call PetscFinalize(ierr)

Option         Description
-ts_max_steps  Maximum number of iterations to use
-ts_max_time   Set maximum time
-ts_type       Set which ODE solver to use, see Table 2

Table 3: Command-line options for PETSc's TS object
There are routines to access the command line from within a PETSc-enabled program. For instance, to read the command-line option -dt and transfer it to the dt variable, write:

      call PetscOptionsGetReal(PETSC_NULL_CHARACTER, '-dt', dt, flg, ierr)
4 Adding PETSc solver to G3D
4.1 Basic overview
When implementing an implicit solver we have to realise that even the best implicit solver step will take much more computation time than an explicit step, at least a factor of 10 and often closer to a factor of 100. Thus, to get a faster solver than the explicit one, the time step must be increased by at least a factor of 100 or more; anything less is not worth the effort of making an implicit code. If the time step cannot be increased by this amount it might be better to concentrate on making the explicit code as fast as possible.
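The break-even argument above is plain arithmetic: the net gain is the time-step increase divided by the per-step cost factor, as a quick check shows:

```python
# Break-even arithmetic for an implicit solver: the net gain over the
# explicit solver is the time-step increase divided by the per-step cost.
def net_speedup(dt_gain, cost_factor):
    return dt_gain / cost_factor

break_even = net_speedup(100.0, 100.0)  # 100x bigger steps at 100x the cost
goal = net_speedup(1000.0, 100.0)       # the factor-10 goal needs ~1000x steps
```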
To make sure that no new calculation errors are introduced, as few changes as possible to the original G3D code are desirable. This is best done by keeping all the original calculation code and replacing only the code that performs the step. Since many prospective users of the toolkit have a lot of old code, the toolkit has been designed to be easy to integrate.
For instance, the PETSc Vec object is designed to be an easily used replacement for an old vector representation, and automatically adds parallel properties to the vector.
During the course of this work, two possible ways of solving time-dependent differential equations were explored. The first solution employs the functionality of the PETSc nonlinear solver library SNES with a custom-made time stepper attached. This solution provides direct manipulation of the step size and a way to handle errors.
The second solution uses the functionality of PETSc time stepping library TS, which in turn uses the nonlinear solver library. One of the benefits of using the TS library is the ability to change the time stepping algorithm easily from the command line.
4.2 Handling mesh data
Since G3D uses a block-structured mesh and PETSc does not support that type directly, the mesh data must be handled explicitly. There are two main alternatives:
• Use old code to manage the parallel blocks and communication for them, but use a big SNES around it to solve the implicit systems.
• Use multiple PETSc DA’s to manage the collection of blocks and use a big SNES around it.
The choice of implementation depends on how difficult/flexible the G3D code is. In
theory, not that much communication code has to be written for either approach, it is
PETSc