
2008:069

MASTER'S THESIS

Implementation and evaluation of a parametric model for thin layered materials

Henrik Elmunger

Luleå University of Technology Master Thesis, Continuation Courses

Electrical engineering

Department of Computer Science and Electrical Engineering

Division of EISLAB


Implementation and evaluation of a parametric model for thin layered materials

Henrik Elmunger

September 1, 2008


“Rough diamonds may sometimes be mistaken for worthless pebbles.”

Thomas Browne, Sr


Abstract

This master thesis evaluates a parametric model for ultrasound reflection in a layered medium. This model will be used for determining the thickness of each layer within the total thickness of the medium. With slight alteration one can also use it to find the total number of layers.

Ultrasound is a cheap and fast method for nondestructive testing of material properties.

Using ultrasound is also a non-toxic, clean (no saw or grinding dust) and safe way to test.

Within this thesis the model is tested with both simulations and in situ tests at an industrial laboratory.

All theory for the model was developed by Fredrik Hägglund, PhD student at LTU.


Preface

My master thesis was conceived within the frame of the M.Sc. in electrical engineering program at Luleå Tekniska universitet (LTU). The thesis was commissioned by the Department of Computer Science and Electrical Engineering. The thesis work, including measurements taken in a laboratory environment, was done in Luleå and lasted from September 2006 to January 2007.

I embarked on writing this thesis knowing there would be optimization problems, and with my interest in numerical methods I found a very nice project to make my skills evolve.

I had some knowledge about the physics of ultrasound beforehand, but this field has broadened a lot as well. I now take the opportunity to thank some people. First out, my examiner Johan Carlsson and PhD student Fredrik Hägglund, for their time and for answering any questions.

I also want to thank Inge Söderkvist for all the help regarding the numerical aspects.


Contents

1 Background and outline
1.1 Layered medium
1.2 Ultrasound traversing through a layered medium
1.3 The parametric model
1.4 Optimization, finding an optimum

2 About wave propagation
2.1 Flaw detection
2.2 Acoustic impedance, Z
2.3 The posed Problem
2.4 A Parametric model as the posed Solution

3 Numerical methods
3.1 Quadratic nonlinear least squares problem and the Gauss-Newton method
3.2 Maximum Likelihood Estimation (MLE)
3.3 Condition number, SVD and eigenvalues
3.4 Enhancing the Gauss-Newton
3.5 Line search
3.6 Newton's Method
3.7 Quasi-Newton
3.8 Perturbation methods
3.9 Levenberg-Marquardt
3.10 Discussion about robustness vs efficiency

4 Simulations
4.1 1 layer
4.2 2 layers
4.3 3 layers

5 Experiments using collected data
5.1 1 layer
5.2 3 layers
5.3 Analysis

6 Discussion and Conclusion
6.1 Part I, Pure Gauss-Newton
6.2 Part II, Enhancements
6.3 General conclusion


1 Background and outline

Testing is the watershed that puts some theories to rest and others into good use. In this thesis, a parametric model describing an ultrasonic pulse traveling through a layered medium is tested, both via simulations and actual testing in a laboratory environment.

The model implemented in this thesis simply states that "each new layer in a medium only needs two parameters: a time constant that tells you the thickness of the layer, and a reflection constant that tells you how much of the energy of the wave is reflected back to the ultrasound receiver". What this means will be explained further on.

Now, the questions mounting up are:

• Is this model accurate enough for use in industry, or does it need more parameters?

• What numerical methods are required?

• Will these methods demand too much when it comes to the number of iterations used or the time needed to do the iterations?

In this introduction I will briefly explain the methods and concepts; later sections give a more detailed view of each of the subtopics in this first introductory section. The final chapter contains a summary and conclusion. By then I hope that whoever reads this thesis report will agree with me that the answers have been found!

1.1 Layered medium

The definition of a layered medium is simply that different regions in the medium have different physical attributes, and that these regions have a sharp boundary between them. As simple examples one can mention water on ice, or a metal ruler glued to a wooden desk. As an opposite, one can imagine what happens when a thick steel rod is hardened: it will be brittle at the surface but soft in the center. The physical attributes are not the same for the surface and the center, but there is no sharp boundary for this change.

1.2 Ultrasound traversing through a layered medium

When the sound wave travels through a specific medium its velocity is constant. When entering a new layer, with different density and elastic properties, the velocity changes. In other words, a sharp boundary between two layers causes a discontinuity in the velocity function of the wave. Furthermore, the product of the velocity of sound and the density of the medium in which the wave travels is called the characteristic acoustic impedance, Z. This value is important when determining the reflection R_c of sound at the interface of two layers. R_c plays an important role in the parametric model.


1.3 The parametric model

The model consists of the usual concept of an input signal u, a transfer function h and an output signal y_est.

Figure 1: Transfer function block

The parameters θ lie within the transfer function h. The input and output signals are connected via the convolution integral:

y_est(t) = ∫_{−∞}^{∞} h(θ, τ) u(t − τ) dτ

Comparing y_est with a measured signal y_m, it is possible to change the parameters in the θ vector to minimize the error y_est − y_m. This branch of numerical algorithms is called optimization.

1.4 Optimization, finding an optimum

For readers unfamiliar with the concept, one of the fairest ways of describing the process of optimization goes like this: Imagine a bowl; the bowl doesn't have a flat bottom, but its cross section describes half a sphere. The optimum? Put a pebble in the bowl; it will roll around for a while but stop at the bottom. In other words, the bottom of the bowl is the optimum, and the process of optimization is the pebble. To make it a tad more realistic, use a hammer and give the bottom of the bowl a beating, from both the inside and the outside; it will hold, it is not an imaginary glass bowl! Now, if one drops the pebble back into the bowl, it will rattle around a bit, and sooner or later find a cavity. The thing is, do we know that this cavity is the deepest? No. It might be, or it might be a more shallow cavity, or even a flat area between two cavities. The pebble has found a local minimum if it rests in a cavity, and we can guarantee that it does just that; that is to guarantee global convergence. As already mentioned, we cannot guarantee that it finds the deepest cavity; that is, we cannot guarantee global optimization. To make the pebble find the minimum of the cavity we are interested in, we have to release it in a good spot, and we also need to make sure that


we don't give it too much momentum, or it will slide over the edge into another region. These last two examples are analogies for the initial point and the line search.


2 About wave propagation

Literature on this subject is abundant. The emphasis in this thesis is on the numerical aspects and not on the underlying physics; therefore this section is kept short.

An ultrasonic pulse traveling through a homogeneous material does so with a constant velocity. A factor of attenuation (dependent on the frequency of the wave) determines how far the pulse can travel. If the material is in any way layered, i.e., contains a sharp boundary between different materials with different properties, the pulse will divide its energy into being a) reflected and b) transmitted into the new material; the boundary is thus a discontinuity in the equation that governs the wave propagation. In the second material, the pulse will travel with a different speed and have a different attenuation constant (due to a frequency change). In other words:

wavelength = velocity / frequency, or λ = c/f

Here c is dependent on the density and spring constant of the material.

2.1 Flaw detection

The wavelength, λ, is vital when it comes to the ability to detect flaws in a medium, i.e., for the pulse to be reflected at a boundary. The rule of thumb is that the region causing the discontinuity needs to be larger than at least half the wavelength in order to be detectable. The key words here are sensitivity and resolution: sensitivity being the ability to detect small discontinuities, and resolution the ability to locate two or more discontinuities that lie close to each other. As seen in the equation for the wavelength, increasing the frequency decreases the wavelength proportionally, and thereby increases the resolution and sensitivity.

On the other hand, increasing the frequency has its disadvantages. Some materials, such as steel or cast iron, have a coarse grain structure. A wave of too high frequency will reflect on the grain boundaries and scatter. Another source of scattering is a crack oriented at an angle too far off being perpendicular to the pulse source.

2.2 Acoustic impedance, Z

Sound travels through materials under the influence of sound pressure. Because molecules or atoms of a solid are bound elastically to one another, the excess pressure results in a wave propagating through the solid.

The acoustic impedance Z of a material is defined as the product of its density ρ and acoustic velocity c. This property is important when calculating the reflection R_c and transmission T_c at the boundary. R_c is given through the formula:


R_c = ((Z_2 − Z_1) / (Z_2 + Z_1))^2

which says that the amount of reflection depends on the difference in Z on each side of the boundary. Since Z is easy to calculate, or even find in a table, for different materials, it is equally easy to compute R_c. A parametric model that uses R_ci (for boundary i) as parameters is good; this way, computing an initial guess for the optimization loop is as easy as computing Z.

The other kind of parameter used in the model is the time of flight τ. Knowing the thickness, d, of a layer, and its acoustic velocity, c, τ is also easy to compute as

τ = 2d / c
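As a small numerical illustration of these two formulas (a sketch only; the impedance and velocity values below are approximate textbook figures, not values taken from this thesis):

```python
import numpy as np

# Hypothetical example values, for illustration only (approximate table data).
Z_water = 1.48e6   # acoustic impedance of water [kg/(m^2 s)]
Z_steel = 45.0e6   # acoustic impedance of steel [kg/(m^2 s)]

# Reflection at the water/steel boundary: R_c = ((Z2 - Z1)/(Z2 + Z1))^2
R_c = ((Z_steel - Z_water) / (Z_steel + Z_water)) ** 2

# Time of flight through a 2 mm steel layer: tau = 2d/c
d = 2.0e-3   # layer thickness [m], assumed for the example
c = 5900.0   # approximate longitudinal sound velocity in steel [m/s]
tau = 2.0 * d / c

print(f"R_c = {R_c:.3f}, tau = {tau * 1e6:.3f} microseconds")
```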

2.3 The posed Problem

What if

1. the layer to be detected is thinner than half the wavelength of the ultrasonic pulse used to detect it?

2. a transducer capable of higher frequencies (shorter wavelength) is not available, or just very expensive?

3. a transducer capable of higher frequencies (shorter wavelength) is available, but not usable?

Increasing the frequency of the transducer is not always a feasible route even if transducers are readily accessible; depending on the materials involved, most of the pulse might bounce off the topmost layer if the frequency is too high. This in turn would lead to a non-detectable echo from the layers beneath. Figures 2 and 3 below (images taken from [9]) will be used for reference in this discussion.


Figure 2: First layer of a material

Figure 2 illustrates the train of thought: u_1 is the input signal, A_0 is the response from the first layer boundary, B_1 is where the pulse bounces on the second layer boundary, and A_2 is the response of B_1. If the frequency is too high, most of the signal energy will be found in A_0 due to the reflection. This leads to good knowledge about the [0, 1] boundary, but A_2 will be lost in noise.

The next figure (3) shows an example of a received signal, with voltage levels on the y-axis and number of samples on the x-axis. Between arrows 1 and 2 (a pulse indicates a boundary) is a very thin layer, too thin to be picked up by the ultrasonic receiver (i.e., shown as a pulse). This layer might be some adhesive between two sheets of metal.

Figure 3: Example of a received signal (3 layers)

The fact that there is an echo at arrow no. 2 indicates that the sample is solid right through; an air gap would cause the signal to disperse. On the other hand, if there are limits on how thick or thin the adhesive layer may be, at this moment there is no way of telling


if the joint is OK or not. However, a discontinuity in this solid affects the ultrasonic pulse, hopefully enough to be detected.

2.4 A Parametric model as the posed Solution

The posed solution as of [9] is a parametric model. Using the concepts within this chapter along with fig. 2, it is easy to explain the model for a single-layered (homogeneous) material.

As stated in chapter 1.3, the model consists of an input signal, a transfer function and an output signal, related to each other via the convolution integral, namely:

y(t) = ∫_{−∞}^{∞} h(θ, τ) u(t − τ) dτ + err

Here θ denotes the parameter list. In other words, the input and output signals are kept "as is", while the transfer function h(θ, τ) is optimized to minimize the error term. Using frequency domain notation, this equation is written as

Y(ω) = H(θ, ω) U(ω), where ω = 2πf

2.4.1 Model for a homogeneous solid

Presenting this figure again, this time for the purpose of drawing the parameters directly from it. Layer 0 here represents the water in which the pulser/receiver is immersed; layer 1 represents the solid to be measured (between boundaries [0, 2]).

Figure 4: First layer of a material


We need 4 parameters in θ:

• R_01 = reflection constant at boundary [0, 1]

• R_12 = reflection constant at boundary [1, 2]

• τ_0 = time constant at boundary [0, 1], the time it takes for the pulse to travel the distance 2d_0

• τ_1 = time constant at boundary [1, 2], the time it takes for the pulse to travel the distance 2d_1

The output signal from the top layer U_00 is the sum of all the echoes A_1i and A_01i. These summed-up echoes are just the input echo times the transmission/reflection constants times the time-of-flight factors. In the case of the homogeneous solid this comes down to:

H = R_01 e^{−iωτ_0} + ((1 − R_01^2) R_12 e^{−iω(τ_0+τ_1)}) / (1 + R_01 R_12 e^{−iωτ_1})

Note: T_{q,q+1} = 1 − R_{q,q−1}
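A minimal Python sketch of this single-layer transfer function, evaluated over a grid of angular frequencies; it assumes the reconstruction above, in particular that the exponent τ_01 denotes τ_0 + τ_1:

```python
import numpy as np

def H_single_layer(omega, R01, R12, tau0, tau1):
    """Transfer function for one homogeneous layer, theta = (R01, R12, tau0, tau1)."""
    direct = R01 * np.exp(-1j * omega * tau0)            # echo from boundary [0, 1]
    below = ((1 - R01**2) * R12 * np.exp(-1j * omega * (tau0 + tau1))
             / (1 + R01 * R12 * np.exp(-1j * omega * tau1)))  # echoes involving boundary [1, 2]
    return direct + below
```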

2.4.2 Multiple layers

For multiple layers we also need the contributions from layers below 2, as in fig. 5

Figure 5: Subsequent layers

In [9], the output signal at layer q can be calculated using the following building block equations:

• A_q = ((1 − R_{q−1,q}) R_{q,q+1} (1 − R_{q,q−1}) e^{−iωτ_q}) / (1 − R_{q,q−1} R_{q,q+1} e^{−iωτ_q})

• B_q = ((1 − R_{q,q−1}) e^{−iω(1/2)τ_q}) / (1 − R_{q,q−1} R_{q,q+1} e^{−iωτ_q})

• C_q = ((1 − R_{q−1,q}) e^{−iω(1/2)τ_q}) / (1 − R_{q,q−1} R_{q,q+1} e^{−iωτ_q})

• D_q = (R_{q,q−1} e^{−iωτ_q}) / (1 − R_{q,q−1} R_{q,q+1} e^{−iωτ_q})

These equations are the building blocks of the transfer function H, namely H = A + BC/D.

The principle is the same as in the case of a homogeneous material, summing the echoes from below.

2.4.3 Concluding remarks

Even though a layer might be too thin to show up as an echo on a plot of the output signal, the inhomogeneity will steal energy from the ultrasonic pulse traveling through it: some part of said pulse will be reflected, and some part transmitted into the next layer. Having a good idea of the location and thickness of this inhomogeneity will serve as a starting guess for an iterative optimization loop. In other words, the physical reality has been parametrized.


3 Numerical methods

The model is implemented within the Fourier domain, or frequency domain. Not that the frequency change in the ultrasonic wave is of interest, but due to the fact that convolution in the time domain becomes multiplication in the frequency domain. That is, F{h ∗∗ g} = H · G, where ∗∗ denotes convolution.
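A quick numerical sanity check of this identity (here for circular convolution, which is what the discrete Fourier transform actually computes):

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal(256)
g = rng.standard_normal(256)

# Multiplication in the Fourier domain ...
conv_fourier = np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(g)))

# ... equals circular convolution computed directly in the time domain.
conv_direct = np.array([np.sum(h * np.roll(g[::-1], k + 1)) for k in range(256)])

print(np.allclose(conv_fourier, conv_direct))  # True
```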

3.1 Quadratic nonlinear least squares problem and the Gauss-Newton method

The Gauss-Newton method is a modification of Newton's method. Newton's method uses a truncated Taylor expansion using two derivatives.

The parametric model for the layered medium has been briefly described in previous sections. In this section we will see how the model can be turned into a least squares problem. First we need the theory for quadratic nonlinear least squares. The mathematics is rather simple in theory and the method goes like this:

minimize f(x) = (1/2) Σ_{i=1}^{m} f_i(x)^2 ≡ (1/2) F(x)^T F(x)

The chain rule gives the components of ∇f(x):

∇f(x) = ∇F(x) F(x)

The Hessian B_k = ∇²f(x) is derived by differentiating with respect to x_j:

∇²f(x) = ∇F(x) ∇F(x)^T + Σ_{i=1}^{m} f_i(x) ∇²f_i(x)

The Gauss-Newton method simply lets the Hessian be approximated by discarding the right-hand sum in the above formula, i.e.

∇²f(x) ≈ ∇F(x) ∇F(x)^T

Note that the residuals f_i(x) run over the total sampled signal, and that x is the parameter vector. Updating the parameter vector is done by the following definition of the normal equations:

x_{k+1} = x_k − [∇²f(x_k)]^{−1} ∇f(x_k)

x_{k+1} = x_k − [∇F(x_k) ∇F(x_k)^T]^{−1} ∇F(x_k) F(x_k)

Now x_{k+1} can be solved for using regular least squares methods. In this particular case, QR or SVD is recommended.
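A minimal sketch of such a Gauss-Newton loop, assuming user-supplied residual and Jacobian functions; the names and tolerances below are illustrative, not taken from the thesis implementation:

```python
import numpy as np

def gauss_newton(F, J, x0, tol=1e-4, max_iter=200):
    """F(x) returns the residual vector (length m); J(x) returns the Jacobian
    with J[i, j] = dF_i/dx_j (note: the thesis's nabla-F is this Jacobian transposed)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        Jk, Fk = J(x), F(x)
        # Solve the linearized least squares problem min_p ||Jk p + Fk||_2
        # with an SVD-based solver, avoiding the explicit normal equations.
        p, *_ = np.linalg.lstsq(Jk, -Fk, rcond=None)
        x = x + p
        if np.linalg.norm(p) < tol:   # the termination rule used later in chapter 5
            break
    return x, k
```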


The quantity p = [∇F(x_k) ∇F(x_k)^T]^{−1} ∇F(x_k) F(x_k) is called the search direction, from here on denoted by p.

Note: solving the normal equations is equivalent to solving a linear least squares problem, minimizing over p:

‖∇F(x)^T p + F(x)‖_2^2 = [∇F(x)^T p + F(x)]^T [∇F(x)^T p + F(x)]

This can be done using a numerically stable form of QR factorization, preferably obtained via Householder transformations. Due to a lack of implemented truncated QR methods in the libraries the SciPy package uses, this route is yet untested.

The advantage of solving for x_{k+1} in this way is that a much lower condition number is obtained. If the condition number for the normal equations equals κ_Ne, the QR approach gives κ_QR = √κ_Ne. See [2] page 416 and [5] pages 295-300 for details. More on the subject of the condition number is presented below.
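The squaring of the condition number can be illustrated numerically; the matrix below is a synthetic example, not data from the thesis:

```python
import numpy as np

rng = np.random.default_rng(1)
# Build a deliberately ill-conditioned 100x8 "Jacobian" by scaling the columns.
J = rng.standard_normal((100, 8)) @ np.diag(10.0 ** np.arange(8))

kappa_J = np.linalg.cond(J)          # what a QR/SVD-based solver works with
kappa_Ne = np.linalg.cond(J.T @ J)   # what the normal equations work with

# kappa_Ne is (up to rounding) kappa_J squared, i.e. kappa_QR = sqrt(kappa_Ne).
print(f"kappa_J = {kappa_J:.3e}, kappa_Ne = {kappa_Ne:.3e}, "
      f"ratio = {kappa_Ne / kappa_J**2:.3f}")
```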

3.2 Maximum Likelihood Estimation (MLE)

MLE is a popular statistical method used to calculate the best way of fitting a mathematical model to some data. Modeling real world data by estimating maximum likelihood offers a way of tuning the free parameters of the model (in this case θ) to provide an optimum fit.

A short description might be: in MLE, setting σ_e = 1 yields the method of nonlinear least squares.

3.2.1 The model within the terms of optimization

The model is built on the principle of the maximum likelihood estimator, with the parameter vector θ within the transfer function.

• Y is the output signal

• H(θ) is the transfer function

• U is the input signal

In the frequency domain this comes down to Y = H(θ)U. Both Y and U are corrupted by noise, so the model needs to be scaled proportionally to the noise. This is done in the following manner, using the variance of the measured noise σ_e²(θ). The error of the signal is defined as

ε(θ) = (Y − H(θ)U) / σ_e

and the variance σ_e² is defined as

σ_e² = σ_y² + σ_u² |H(θ)|² − 2ℜ{σ_yu² H(θ)}

For the computation of σ_y², σ_u², σ_yu², see [7], [10].

To return to the nonlinear least squares formulation, we make the following definitions:

• ε(θ) ≡ F(x)

• ε²(θ) ≡ f(x) = F(x)^T F(x)

• ε′(θ) ≡ ∇f(x) = ∇F(x) F(x)

• ε″(θ) ≡ ∇²f(x) ≈ ∇F(x) ∇F(x)^T

θ_{k+1} = θ_k − [∇F ∇F^T]^{−1} ∇F F

3.2.2 Good enough for simulation

When the input signal is well behaved, like an ideal mathematical model of the pulse, the model can take a lot of beating when it comes to making initial guesses or adding extra noise. Another test is also applied, using a real echo from the surface of the topmost layer as input signal. Even though this gives a higher condition number, the model still behaves in a stable manner.

Still, when moving from the well-behaved simulations into the real world, one might encounter stability problems, and there is also the question of performance. Several ways of enhancing the Gauss-Newton algorithm are available; some of them are very easy to implement, others have a higher degree of complexity but may prove necessary.

3.3 Condition number, SVD and eigenvalues

The two types of error sources occurring in numerical analysis are a) errors in the measured data, the input to the algorithm, and b) errors caused by the algorithm itself, due to approximations within it. If a small change in input data causes a large change in the output, for example if |f(x + δx) − f(x)| >> 0 even though δx is small, the problem is said to be ill-conditioned at x.

The condition number of a matrix A is defined by:

κ = ‖A‖ ‖A^{−1}‖

Due to the fact that this problem is ill-conditioned, ordinary least squares methods are prone to error, or the solution may not even exist. There is a need for a singular value decomposition of the problem, that is, in the inversion of the Hessian matrix. The condition number for the Hessian, i.e. the matrix to invert, is high. For the distribution of eigenvalues in Figure 6 below, the condition number is 54593. This means that noise/errors added to this model will be heavily increased in magnitude.


Figure 6: Eigenvalues on a semi-log scale, 8 parameters, Hessian size is 8x8

Keeping track of the eigenvalues during the process of optimization is a good thing. If one or more eigenvalues tend towards 0 while k increases (k being the number of iterations), the problem is ill-posed. In this case a regularization method or a different model should be considered.
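A minimal sketch of such monitoring, assuming J is the Jacobian at iteration k; the function name and the tolerance are illustrative choices:

```python
import numpy as np

def diagnose(J, k, eig_tol=1e-12):
    """Inspect the spectrum of the Gauss-Newton Hessian approximation B = J^T J."""
    B = J.T @ J
    sigma = np.linalg.svd(B, compute_uv=False)  # singular values; equal to the
                                                # eigenvalues since B is symmetric PSD
    if sigma[-1] < eig_tol:
        print(f"iteration {k}: smallest eigenvalue near 0, problem looks ill-posed")
    return sigma[0] / sigma[-1]                 # condition number of B
```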

In a more performance-oriented implementation, a QR factorization should perhaps be used instead; the SVD is a computationally heavy method.

3.3.1 Better approximation of the Hessian?

It may also be of interest to do a better approximation of the Hessian matrix. There is a chance that the residual part discarded from the Newton method (to get Gauss-Newton) is large. This might have a large impact on a) the spread of eigenvalues/condition number and b) the curvature of the second derivatives.

3.4 Enhancing the Gauss-Newton

The following enhancements are examined in this thesis:

• Line search methods

• Newton's method

• Quasi-Newton methods

• Combined Quasi-Newton/line search method

• Levenberg-Marquardt algorithm


A brief explanation and motivation of these methods is given below, along with the simulation results. The performance results will be based on:

1. Parameter sanity

2. Shape of the output signal (curve fitting)

3. Time of completion

• Even though the number of iterations, or the total time of solving the problem, is of great importance in a real-world scenario, trading a tad more machine time to get sane results is more important. Also, with this master thesis being more of a "proof of concept" than an industrial application, the time of completion is of lesser importance. The machine time is measured either by the *NIX command "time" (on my FreeBSD machine: $ time 'script') or, if running scripts in Octave, by tic 'script' toc.

• The shape of the output signal is important, but slight differences between a measured and an estimated graph should be tolerable.

• The parameter sanity is based on the physics. As an example: τ_1 to τ_3 cannot be negative, and the reflection constants R_01 to R_34 are bounded between −1 < R < 1.

3.5 Line search

As stated above, there is something called global convergence, and we might even guarantee it, but there really is no such thing as guaranteed global optimization. A line search is a way to keep your optimization method of choice within your trusted region. These methods do need their safeguards though: a poorly conditioned Hessian and gradient can make these algorithms flip and make our optimization method jump into another region, and the problem that led us into using line search bites its own tail. The two different line searches I have used are the backtracking algorithm and the strong Wolfe conditions, SWC. My main efforts have been with the SWC, since this method is preferred when using quasi-Newton methods. More on Wolfe conditions below.

For both methods a first trial step length of α_init = 1 should be selected. This is due to the fact that often we only want our line search methods to be operative in the first iterations; the closer we get to our local optimum, the more likely we are to get (in theory and under nice, well-behaved conditions) an unthrottled step length. In other words, the closer we get to the target, the faster we may venture on (the line search algorithm starts out fresh with each new iteration of the optimization method).

Before going further it would be convenient to define new terms and notations:

• Φ(α) denotes f(x_k + α_k p_k)

• Φ(0) denotes f(x_k)

• Φ′(α) denotes ∇f(x_k + α_k p_k)^T p_k

• Φ is a function of one variable only, namely α

• α_k is the step length

• p_k = [∇F(x_k) ∇F(x_k)^T]^{−1} ∇F(x_k) F(x_k) is the search direction

• x_k is the parameter vector

3.5.1 Back tracking

This inexact algorithm only makes use of the sufficient decrease condition, or Armijo condition, given by the inequality f(x_k + α_k p_k) ≤ f(x_k) + c_1 α_k ∇f(x_k)^T p_k. It is an inexact algorithm in the sense that α_k does not need to be an absolute minimizer. This property makes it much faster than an exact line search would be.

The backtracking algorithm framework is defined as follows:

Given c_1 ∈ (0, 1/2) and a shrink factor c_2, with 0 < c_1 < c_2 < 1
set α_k = 1 (this is the initial trial step length)
while f(x_k + α_k p_k) > f(x_k) + c_1 α_k ∇f(x_k)^T p_k do: α_k = α_k c_2, end
x_{k+1} = x_k + α_k p_k

where again:

• α_k is the step length

• p_k is the search direction

• x_k is the parameter vector

I have used brute-force backtracking in the sense that I decrease α_k by a small constant factor for each spin in the loop. It wastes a lot of clock cycles but does its job pretty quickly. One could use quadratic and/or cubic interpolation to get a new α_trial at each iteration though; I have not tried that, as I only need the backtracking algorithm to get something stable for comparison to the SWC. On a further note, the term backtracking means that one tries a large α and then keeps reducing it until the conditions are met.
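A minimal sketch of this brute-force backtracking scheme; the shrink factor and the lower safeguard are illustrative choices:

```python
import numpy as np

def backtracking(f, grad_f, x, p, c1=1e-4, shrink=0.5, alpha=1.0):
    """Shrink alpha until the Armijo (sufficient decrease) condition holds."""
    slope = grad_f(x) @ p            # directional derivative, negative for a descent p
    fx = f(x)
    while f(x + alpha * p) > fx + c1 * alpha * slope:
        alpha *= shrink              # back off the trial step
        if alpha < 1e-10:            # safeguard against endless shrinking
            break
    return alpha
```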

3.5.2 Wolfe conditions: Quadratic and Cubic approximation

Quadratic approximation is used to find a minimizer α_min on the interval x_k + α_trial p_k. We already know f(x_k) and f′(x_k), and calculating f(x_k + p_k) (with α_trial = 1) gives a third piece of information. We use these three pieces to interpolate the approximation as follows:

Φ_q(α) = ((Φ(α_0) − Φ(0) − α_0 Φ′(0)) / α_0²) α² + Φ′(0) α + Φ(0)


The new trial value is defined as the minimizer of this quadratic; differentiating, we get:

α_1 = − (Φ′(0) α_0²) / (2[Φ(α_0) − Φ(0) − Φ′(0) α_0])

If α_1, which is the new trial value, fails to accommodate the conditions of the line search, one might want to use this new value to do a cubic approximation:

Φ_c(α) = c_low α³ + c_high α² + Φ′(0) α + Φ(0)

with the formula for c_low, c_high as follows:

[c_low; c_high] = (1 / (α_0² α_1² (α_1 − α_0))) [α_0², −α_1²; −α_0³, α_1³] [Φ(α_1) − Φ(0) − Φ′(0) α_1; Φ(α_0) − Φ(0) − Φ′(0) α_0]

Note: c_low, c_high do not have anything to do with the maximum or minimum values set for α_low, α_high.

3.5.3 Wolfe conditions

The Wolfe conditions are fairly easy mathematics; the implementation isn't. Even the textbook suggests that the reader use an already written program instead of doing an implementation.

The SWC algorithm works with two conditions. One is the sufficient decrease condition used in the backtracking algorithm; the second condition concerns the derivatives of the function, and is thus called the curvature condition.

• Φ(α) ≤ Φ(0) + c_1 α Φ′(0) (sufficient decrease condition)

• Φ′(α) ≥ c_2 Φ′(0) (curvature condition, weak Wolfe conditions)

• |Φ′(α)| ≤ c_2 |Φ′(0)| (curvature condition, strong Wolfe conditions)

The difference between the strong and weak WC is that the strong WC behaves almost like an exact line search. Adjusting the values of c_1 (≈ 1e−4) and c_2 (≈ 0.9) will loosen up the SWC though; it will then require about the same amount of work as the WWC. By making c_2 smaller, one can control the quality of the search, forcing the accepted value of α to lie closer to a local minimum of the interpolated curve between Φ(0) and Φ(α).

The framework for the algorithm is as follows:

choose α_low = 0, α_trial > 0, α_max > α_trial
endless while loop:
  if Φ(α_trial) > Φ(0) + c_1 α_trial Φ′(0):
    let the zoom algorithm compute a new α_trial
  if |Φ′(α_trial)| ≤ −c_2 Φ′(0):
    set the step length α_step = α_trial and break
  if Φ′(α_trial) ≥ 0:
    let zoom choose a new interval and set a new α_trial
end


3.5.4 Bracketing an interval (zoom)

The bracketing algorithm in [5] is called "zoom"; I will adopt this name within this thesis.

The zoom algorithm also utilizes an infinite loop that is only broken if a good enough step length is found, i.e. there is no guarantee that this will happen. One can build some security into the algorithm so that the next step length will not be less than one tenth of the previous one (α_{k+1} ≥ 0.1 α_k).

The framework of the zoom algorithm is much like that of the SWC:

interpolate, using quadratic or cubic interpolation, to find a new α_trial within the interval (α_low, α_max)
if Φ(α_trial) > Φ(0) + c_1 α_trial Φ′(0):
  set α_high = α_trial
else if |Φ′(α_trial)| ≤ −c_2 Φ′(0):
  deliver α_trial to the SWC for testing and break
if Φ′(α_trial)(α_high − α_low) ≥ 0:
  set α_high = α_low; α_low = α_trial
end

3.5.5 Tie them together

The quasi-Newton method asks the Wolfe conditions about the generic step length α = 1; if the SWC is not satisfied, it calls the zoom function, which in turn calls the interpolation scheme until satisfied. Zoom then returns a trial value of α to the Wolfe conditions, which test this new trial value. If the SWC is satisfied, the actual step length is returned to the quasi-Newton method.
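For reference, SciPy ships a strong-Wolfe line search with zoom of this kind, so one does not have to hand-roll the scheme above; a toy usage sketch on a quadratic:

```python
import numpy as np
from scipy.optimize import line_search

# Toy cost function and gradient, for illustration only.
f = lambda x: 0.5 * x @ x
grad = lambda x: x

xk = np.array([2.0, -3.0])
pk = -grad(xk)                 # steepest descent direction

# line_search enforces the strong Wolfe conditions with constants c1, c2.
alpha, *_ = line_search(f, grad, xk, pk, c1=1e-4, c2=0.9)
print(alpha)                   # 1.0 is accepted for this quadratic
```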

3.5.6 Encountered problems and results

Brute force backtracking

This method works, even though for some initial points of the parameter vector the step length becomes very small (giving a small reduction in the cost function). Again, a too small step length might terminate the optimization loop prematurely. To use the image of a ravine, or perhaps a ditch leading off water: the sides are very steep but the bottom has a very slow descent. In order not to climb up the walls of the ditch, the step length must be kept small; this small value, in turn, causes a small change in the norm ‖θ_{k+1} − θ_k‖, thus the premature termination.

As the name suggests, this method is quite slow and should be used only as a debugging tool.

SWC

This method is quite slow for some initial points of the parameter vector. Even though it is more elegant, SWC uses a lot of function calls (for example, computing the derivatives in each new trial version of the step length). On the other hand, the curvature condition protects us from the dangers of premature termination.

3.6 Newton's Method

In Newton's method (which is not really an enhancement of the GN method, rather the other way around), the Hessian is fully computed; i.e., one goes through the trouble of computing the residual part Σ_{i=1}^{m} f_i(x) ∇²f_i(x) of the second derivatives. This is often avoided, due to it being both error prone (i.e., human errors in defining the second derivatives) and computationally heavy. The benefit is a much more rapid convergence: Newton's method gives quadratic convergence rates close to the optimum, compared to the linear or superlinear convergence rates of quasi-Newton methods. The motivation to implement and use the Newton method within this thesis is mainly to benchmark the parameter sanity and curve fitting properties.

Note: I used Matlab to generate ∇²f_i(x) via symbolic differentiation. Matlab probably does the symbolic computations right, but the file containing the formulas for the second order derivatives is huge, ≈ 54000 characters of code. It should also be noted that when not using the Matlab command "simplify", the above mentioned file is much larger, with measurable round-off differences between the two. This raises the question of whether one gains much extra accuracy with clumsy Matlab-generated code. Some trial runs showed that the Newton method is indeed the slowest.

3.7 Quasi-Newton

A quasi-Newton method is almost any method where the Hessian is not explicitly computed; instead one does an approximation. This can be done in a lot of ways, from the naive method of just setting the Hessian B_0 equal to the identity matrix I, to a more elaborate method like the BFGS (Broyden-Fletcher-Goldfarb-Shanno) method. In the middle somewhere (performance-wise) lies the above described Gauss-Newton method, a fairly simple method suited for well defined problems or as a workhorse in the development and simulation phase. In this thesis I have decided to include my experiences implementing the BFGS and Modified BFGS (MBFGS) methods. One of the reasons for this is to see if the Hessian is really worth computing explicitly. Using Newton's method, the Hessian is computed at cost O(n³), vs O(n²) for an efficient quasi-Newton method.

3.7.1 BFGS and Modified BFGS

The BFGS is considered to be the most efficient method of choice for approximating the Hessian B_k. During my pre-study and information collection for this thesis, and in deciding which approximation method for B_k to use, I ran across the Modified BFGS. These two methods will be discussed below in due order.

(25)

3.7 Quasi-Newton 3 NUMERICAL METHODS

3.7.2 BFGS (Broyden-Fletcher-Goldfarb-Shanno)

• s_k = θ_{k+1} − θ_k

• y_k = ∇f(x_{k+1}) − ∇f(x_k)

• B_0 = I

B_{k+1} = B_k − (B_k s_k)(B_k s_k)^T / (s_k^T B_k s_k) + y_k y_k^T / (y_k^T s_k)

As seen from the BFGS formula, this is an iterative method, giving a better approximation of the Hessian for each loop, thereby giving a better estimate of the curvature of the model than the GN, thus increasing the rate of convergence.
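The update written as a small Python function, as a direct transcription of the formula above:

```python
import numpy as np

def bfgs_update(B, s, y):
    """One BFGS update of the Hessian approximation B, given s_k and y_k.
    Requires y @ s > 0 (guaranteed by the Wolfe conditions) to keep B
    positive definite."""
    Bs = B @ s
    return (B
            - np.outer(Bs, Bs) / (s @ Bs)   # remove the old curvature along s
            + np.outer(y, y) / (y @ s))     # add the measured curvature
```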

Outline of the algorithm

I have added a modification to this algorithm as well: instead of letting B_0 = I, I let B_0 = ∇F ∇F^T, that is, the Gauss-Newton approximation. The outline of the algorithm is taken from [2].

• p_k is the search direction

• α_k is the step length

1. For k = 0, 1, ...

2. if x_k is optimal, stop

3. Solve B_k p = −∇f(x_k) for p_k

4. Determine x_{k+1} = x_k + α_k p_k

5. Compute s_k, y_k

6. Compute B_{k+1} to use for the next loop

3.7.3 MBFGS (Modified Broyden-Fletcher-Goldfarb-Shanno)

The reason for the modification is increased performance. Wei, [6], shows that this method has the property of superlinear convergence under some reasonable conditions. In this thesis I have not shown that these conditions hold, but it might be an interesting feature to implement in future work. Furthermore, Wei, [6], uses a fixed formula for the step length α_k, a cheaper way than using backtracking.

• s_k = θ_{k+1} − θ_k

• g_k = ∇f(x_k)

• y_k = ∇f(x_{k+1}) − ∇f(x_k)

• y_k^m = y_k + A_k s_k

• α_k = −δ g_k^T p_k / ‖p_k‖²_{Q_k}, where ‖p_k‖_{Q_k} = √(p_k^T Q_k p_k) for some positive semi-definite matrix Q_k, and δ ∈ (0, ν_min/L)

• L satisfies ‖g_k − g_{k+1}‖ ≤ L ‖x_k − x_{k+1}‖

• ν_min satisfies ν_min p^T p ≤ p^T Q_k p ≤ ν_max p^T p

A_k = 2[f(x_k) − f(x_{k+1}) + (g(x_{k+1}) + g(x_k))^T s_k] / ‖s_k‖²

B_{k+1} = B_k − (B_k s_k)(B_k s_k)^T / (s_k^T B_k s_k) + y_k^m (y_k^m)^T / ((y_k^m)^T s_k)

Outline of the algorithm

I have added my own modification to this algorithm as well: instead of letting B_0 = I, I let B_0 = ∇F ∇F^T, that is, the Gauss-Newton approximation.

1. set a constant ρ ∈ (0, 1/2), set B_0 = ∇F ∇F^T

2. compute p_k by solving p_k = −B_k^{−1} ∇F(x_k) F(x_k)

3. if f(x_k + p_k) ≤ f(x_k) + ρ g_k^T p_k, let α_k = 1; else, use the step length formula

4. set x_{k+1} = x_k + α_k p_k

5. update B_k to get B_{k+1} using the MBFGS formula

6. go to step 2

3.7.4 Encountered Problems and results

BFGS: The BFGS Hessian approximation does not help much in the first few iterations of the optimization loop, as it is an iterative method. This is bad in the context of the line search, especially if using the SWC (see Encountered problems and results under the line search topic). On the other hand, using the GN Hessian approximation as B_0, some improvement should be seen from the second loop onwards.

MBFGS: The main difficulty lies in determining the constant coefficients involved in computing α_k. Wei, [6], also states that during the numerical experiments conducted, the computation of the fixed step length α_k could be troublesome. Within my implementation I could confirm the instability caused by this formula, as the algorithm runs fine when using the backtracking line search instead. This might take its toll on performance though, and, more importantly, one of the main reasons to use this modification is then lost.


3.8 Perturbation methods

Wei, [6], states in Lemma 2.1 that if (α_k, x_{k+1}, g_{k+1}, d_{k+1}) is generated by the MBFGS algorithm, then B_{k+1} will be positive definite for all k, provided that s_k^T y_k^m > 0. This in turn will be guaranteed as long as we use an exact line search, such as the SWC. On the other hand, using the knobs c_1, c_2 to loosen up the rigidity of the SWC, it is no longer an exact line search algorithm, even if it is close to having that property. So what if B_k loses its nice property of being positive semi-definite? There are many perturbation methods available; in this case, I will go for the simplest possible. When computing the SVD one gets the eigenvalues. The simplest and easiest way to keep B_k positive definite is to take the smallest eigenvalue, take the absolute value of it, add a small number (i.e., machine epsilon), multiply the result with the identity matrix I, and add this new result to B_k.
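A sketch of this perturbation, here using a symmetric eigenvalue routine rather than a full SVD (the two coincide for a symmetric positive semi-definite matrix):

```python
import numpy as np

def make_positive_definite(B):
    """Shift B by (|lambda_min| + eps) * I whenever it has lost positive definiteness."""
    eig_min = np.linalg.eigvalsh(B).min()   # smallest eigenvalue (B is symmetric)
    if eig_min <= 0.0:
        B = B + (abs(eig_min) + np.finfo(float).eps) * np.eye(B.shape[0])
    return B
```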

This discussion leads our way into the Levenberg-Marquardt method.

3.9 Levenberg-Marquardt

Levenberg-Marquardt, being the inspirational source of trust region methods, is a method of regularization. The main idea is to add a dampening factor λ to the GN Hessian approximation ∇F(x) ∇F(x)^T. This gives:

x_{k+1} = x_k − [∇F(x_k) ∇F(x_k)^T + λI]^{−1} ∇F(x_k) F(x_k)

I being the identity matrix. In other words, one increases the diagonal dominance of the Hessian (another way to put it is that the residual part of the true Hessian, namely Σ_{i=1}^{m} f_i(x) ∇²f_i(x), is approximated by λI). The strength of this algorithm is that it interpolates between the GN method and the method of steepest descent. If the dampening factor grows large enough, steepest descent is the dominating method. The steepest descent method is very stable numerically, but at the same time very slow. What we gain in stability, we lose in performance; this is the most common trade-off in numerical analysis.

The main motivations for using this method are:

• Avoid problems occurring when the GN Hessian approximation is nearing rank deficiency

• Levenberg-Marquardt is a self-regularization algorithm

• Self-regularization is a nice property when solving ill-posed problems.

• LM being less sensitive to bad starting points (robustness)

• Easy to implement

Some drawbacks include:


• Linear convergence rate if the regularization is a bad approximation

• Algorithm tends to be slower than GN overall

Note: the LM method is not far from GN in performance when dealing with nice smooth functions, but it should be employed when stability issues arise.

Outline of the algorithm, choice of damping parameter

The higher the condition number, the higher one could set λ_0; plotting graphs of the damping factor, step length and condition number might help in deciding a good value.

1. set λ_0 = 10, set v > 1

2. calculate the error ε for λ_0 and λ_0/v

3. if both these values are worse than for the initial point, increase lambda by λ = λv until the error decreases

4. if λ/v gives a smaller ε, set λ = λ/v

5. if λ_0 gave the lowest ε, set λ = λ_0
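A sketch of this damping schedule for one step, assuming a helper err(λ) that evaluates the cost after a step computed with the damped Hessian [∇F∇F^T + λI]; the helper and the cap on tries are assumptions, not part of the thesis code:

```python
def update_damping(err, err0, lmbda=10.0, v=2.0, max_tries=50):
    """err(l) evaluates the cost for damping l; err0 is the cost at the current point."""
    for _ in range(max_tries):
        if err(lmbda / v) < err0:   # less damping still improves: relax it
            return lmbda / v
        if err(lmbda) < err0:       # the current damping improves: keep it
            return lmbda
        lmbda *= v                  # otherwise increase until the error decreases
    return lmbda
```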

3.9.1 Encountered Problems and results

The LM method is very easy to implement; a few extra lines of code give a high level of stability.

At the same time, even well implemented, I saw a disturbing phenomenon: the step length grew very large for some loops in the optimization algorithm. This is not necessarily a problem, but a very long step does not automatically mean we are in a better place in loop k than we were in loop k − 1. The sole reason for a line search method is to prevent taking too long steps, setting a new point that is even further away from the optimum.

3.10 Discussion about robustness vs efficiency

Adding all these bells and whistles to the Octave scripts might improve robustness, or give some information on how the model needs to be developed or improved. If it is shown with more testing that the Hessian really needs to be computed, there are at least two really robust ways to do this. Another approach is to try to constrain the problem, to have a trusted region around the optimal point. Constrained optimization on a convex problem has a lot of nice techniques available, like for example the simplex method. Constraining the problem is way out of scope for this master thesis, at least when measured in workload.


4 Simulations

4.0.1 Framework for simulations

The output is always created by adding noise to the convolution of the input signal and the transfer function. In the Fourier domain this is written as Y_m = HU + W, W being additive noise at 15 dB SNR. The optimization is then to change the parameters of H to yield Y_m; in other words, change the transfer function until the output Y = HU equals the perturbed Y_m = HU + W.

4.0.2 Signals used

Two different input signals are used. The first is computed via the formula:

s(t) = B e^{−a(t−τ)²} cos(2π f_c (t − τ) + φ)

• B = 1, amplitude

• a = 1500 · 10¹², bandwidth factor

• τ = 1 · 10⁻⁶, arrival time

• f_c = 35 · 10⁶, pulse echo center frequency

• φ = 1, phase shift

The second is the first echo of a signal, cut out from the originally sampled signal (u_i).
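A sketch generating this pulse and the 15 dB SNR perturbation from the framework above; the sampling rate is borrowed from the 400 MHz remark in chapter 5, and the time span is an assumption:

```python
import numpy as np

fs = 400e6                                  # assumed sampling rate [Hz]
t = np.arange(0.0, 2e-6, 1.0 / fs)          # assumed 2 microsecond window
B, a, tau, fc, phi = 1.0, 1500e12, 1e-6, 35e6, 1.0

# The analytic pulse s(t) with the parameters listed above.
s = B * np.exp(-a * (t - tau) ** 2) * np.cos(2 * np.pi * fc * (t - tau) + phi)

# Additive noise at 15 dB SNR, as in the simulation framework Y_m = HU + W.
snr_db = 15.0
noise_power = np.mean(s ** 2) / 10 ** (snr_db / 10)
w = np.sqrt(noise_power) * np.random.default_rng(0).standard_normal(t.size)
s_noisy = s + w
```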

1, 2 and 3 layer scenarios are simulated. To avoid flooding the reader with plots for all the different methods (6 in total), all enhancements, except the full Newton, will be shown for the 3-layered model simulation only. For the 2-layered model, the choice is to show the impact of Levenberg-Marquardt. For the 1-layer model, which is very smooth, only GN for the two different input signals will be shown. Another reason to skip graphs of the BFGS in one and two layers is its iterative nature: the algorithm will not really pick up speed in the few loops it takes to do the simulation.


Figure 7: Input signal s

Figure 8: input signal u_i


4.1 1 layer

Figure 9: Condition number over the iterations

Figure 10: The cost function


Figure 11: The simulated output signal

The images to the left are simulations with the input signal s; images in the right column are generated with input signal u_i. The model does its job very nicely here; one can see the thickness of the layer in the simulated output. The main difference is the higher condition number for input signal u_i. The output for input signal s seems to be a bit cleaner. One should not be alarmed that the simulation with input signal s uses more loops; the number of loops is not fixed beforehand for any simulation run of the model.

4.2 2 layers

The choice to show just GN vs LM in this section is also due to the fact that so few loops are needed. The Levenberg-Marquardt method is not dependent on the number of iterations, but will start its regularization, if needed, before any steps are taken: it will reach the full damping parameter needed to decrease the cost function before taking any steps. For LM, λ_0 will be set to 10.

Figure 12: Condition number over the iterations GN


Figure 13: The simulated output signal GN

Figure 14: Condition number and the simulated output signal, input =u_i (LM)

For the LM method, it is interesting to graph the step sizes and damping over the iterations. In this case, both the damping parameter and the step size decrease as the algorithm converges. This is all in order.


Figure 15: Damping and step length (LM)

Performance comparison

Performance in time, measured in seconds, and number of iterations in the main optimization loop. The performance time is an average of 50 runs of the model. The Gauss-Newton method wins.

input | GN             | LM
s     | 1.0352/9.800   | 1.360/9.780
u_i   | 0.68619/6.7600 | 0.91619/6.8400

Table 1: performance in seconds/iterations, per input signal and method

The interesting thing is that the model converges faster using the u_i input signal, even though the condition number is much higher. The model still keeps up; knowing the different values of acoustic impedance in the two layers, one can calculate the thicknesses.

4.3 3 layers

Both BFGS and MBFGS are used in conjunction with a line search method. The Gauss-Newton method is used both with and without a line search. The LM algorithm does not utilize a line search at all. Since it is not a good idea to flood the reader with 30 or more graphs of data coming from the different models, only GN is presented here in full; the performance comparison table below will serve as the main part of this section. As before, the image on the left side shows responses for input signal s, the image on the right for u_i.


Figure 16: Condition number over the iterations (GN)

The condition number has increased compared to the 2-layered model. It is steadily decreasing over the iterations. Worth noticing is the very large difference in the magnitude of the condition number between the two.

Figure 17: Condition number over the iterations BFGS (left) & LM (right) for u_i input

One of the reasons for implementing the BFGS method was to investigate the condition number. It actually generates the highest condition number of the different quasi-Newton methods.


Figure 18: The cost function (GN)

Figure 19: The simulated output signal

This still looks OK. Knowing that one layer is very thin and not visible in a plot like this, one has to look at the parameter vector.

Performance comparison

Performance in time, measured in seconds, and number of iterations in the main optimization loop. The performance time is an average of 50 runs of the model. The BFGS algorithm wins, tightly followed by its modification, MBFGS. The MBFGS has probably lost some edge due to the line search method attached to it. Anyway, with BFGS being faster, it should be the method of choice.


signal | GN         | GN_LS      | LM           | BFGS         | MBFGS        | Fastest/Slowest
s      | 7.566/40.26 | 14.48/40.24 | 9.9421/39.98 | 6.0050/10.92 | 6.4076/12.86 | BFGS/GN_LS
u_i    | 8.789/49.52 | 15.77/46.36 | 11.582/49.16 | 6.5127/12.52 | 6.8333/12.65 | BFGS/GN_LS

Table 2: performance in seconds/iterations, per input signal and method

So why is GN_LS so slow? Well, the number of iterations is fairly equal to the GN method; the answer lies in the step lengths being computed. A quick check shows that α_k ≈ 1 for the entire run of GN_LS, no matter the input signal. Computing a step length that is of no use is just a waste of clock cycles.

Another interesting thing to observe, when adding a layer to the model, is the increased time consumption. For Levenberg-Marquardt, the increase in time is sevenfold, and for GN it is almost 13 times as high.


5 Experiments using collected data

5.0.1 Introductory notes

For the model of 3 layers, the implementation is written in Python instead of Octave. Thus, comparing performance in time between the simulations and the real-life sampled data implementation is perhaps not that meaningful; on the other hand, when comparing simulation and real-life results for three layers, the condition number, the number of iterations and the behavior of the algorithms might be of interest. The main reason for using Python here is the ease of automating things like data collection. Having a stand-alone application also serves its purposes. All signals used here were collected at the ultrasonic lab at LTU.

For true performance results, as for an industrial application, one should rewrite the computationally heavy code in, for example, FORTRAN 95, and compile it with the G95 or IFC compiler (IFC being faster on an Intel processor). Python, slow as it is, is a very dynamic language, and embedding compiled FORTRAN code as modules is very easy. Python, in this case, would serve as the glue between the FORTRAN routines.

One should also note that the code is not written with performance as the prime objective, as mentioned, but writing it in Python gives a nice foundation. Leaving the matter of languages and going back to business, some graphs are presented below.

5.1 1 layer

For 1 layer, there is really no point in showing any results other than those from the GN algorithm. The relatively low complexity of the model does not give any non-trivial differences whether the enhancements are used or not.

Figure 20: Input signal, zoomed


Figure 21: Condition number in each iteration

This is a very nice graph for the condition number; compare this to the simulation.

Figure 22: cost function

The cost function is monotonically decreasing. The number of iterations, k, is very high here, mainly due to very strict termination rules.


Figure 23: The estimated signal

The first two pulses are what matters here; the rest are echoes that have not yet rung out. The x-axis is in number of time steps τ.

5.1.1 2 layers

Since there are no 2-layered samples ready for testing, no results are published here.

5.2 3 layers

5.2.1 Introductory notes, termination rules

The time along the x-axis of plots showing input or output signals is in scaled time τ. Other plots, showing the cost function, condition number, eigenvalues, step length etc., normally have the number of iterations k along the x-axis. Also note that some plots, for example those showing the condition number, are logarithmic along the y-axis.

A word on termination rules: all algorithms normally terminate on ‖θ_k − θ_{k−1}‖ < 0.0001. Abnormal termination rules are:

• k > 200

• α_step < 0.000001

• Hessian approximation no longer positive definite.
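Gathered into one predicate (a sketch; the flag for positive definiteness is assumed to be tracked by the caller):

```python
import numpy as np

def should_terminate(theta_new, theta_old, k, a_step, B_pos_def):
    """True when any of the normal or abnormal termination rules above fires."""
    return (np.linalg.norm(theta_new - theta_old) < 1e-4   # normal termination
            or k > 200                                     # iteration cap
            or a_step < 1e-6                               # step length collapsed
            or not B_pos_def)                              # Hessian approx. indefinite
```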


5.2.2 Gauss-Newton

Using Gauss-Newton, line search included, shows some of the problems one might encounter. The image on the left in figure 24 shows the curve fitting for the first echo, the image on the right for the second echo, and the image on the bottom for echo number 3.

Figure 24: periodicity error, echo 1,2 and 3

Although the curve fit for the first echo is good, one can clearly see that for the other two the curve fit is quite awful. The periodicity of the given and fitted cosine is off. This is due to (a) bad initial point(s) for τ. The next pair of images shows this behavior in the extreme.


Figure 25: Utterly bad start point, curve fit echo 2-3 and eigenvalues

It should be noted that the word "utterly" in the caption should be taken with a grain of salt. The initial value of τ_1 is off by only 15-20 time steps. Sampling at 400 MHz, it is easily realized that even the smallest error in computing the start value of a layer thickness can cause serious trouble. The graph of the eigenvalues on the right shows that only 2 parameters are dominant, controlling the gradient slope. More on the eigenvalue graphs below (section 5.3).

Figure 26: Estimated function curve fit


As a contrast, figure 26 shows a better curve fit. Even if this is not perfect, it is about as good as the GN method gets at this point.

For completeness, the following two pictures in 27 show the corresponding condition number and eigenvalue spread.

Figure 27: Good initial point. Condition number and eigenvalues

The Hessian approximation being almost indefinite (around k = 6) gives a corresponding spike in the condition number.

5.2.3 Using BFGS

Still using the same initial point, let's see what the BFGS extension can do. First out is the method where B_0 is defined by the GN Hessian approximation.


Figure 28: Condition number and eigenvalue distribution

As we see, the condition number is behaving badly. The increasing condition number is forcing down the step length until no steps of any usable size are at hand.

Figure 29: Curve Fit for BFGS

The spatial resolution is pretty bad; the temporal resolution is kind of OK.


Figure 30: Cost function decrease

The cost function decreases; as one can see, only 5 iterations are needed/feasible.

Using B_0 = I

Figure 31: Condition number and eigenvalue distribution

The condition number is steadily increasing. All eigenvalues start at magnitude 1.


Figure 32: Curve Fit for BFGS

The curve fit is really bad; the increasing condition number makes it hard for the algorithm to take good steps.

Figure 33: Cost function decrease

The cost function gives a lower end value using B_0 = I.


5.2.4 Using LM

The Levenberg-Marquardt algorithm, using the same starting point as for the Gauss-Newton and BFGS methods.

Figure 34: Condition number and step length for each iteration

We see a much better behavior in the condition number over the iterations compared to the BFGS. One thing to worry about is the large step length: even though each step means a reduction in the cost function, we might have fallen out of our region of interest.

Figure 35: Curve fit for LM

References
