2008:069
MASTER'S THESIS
Implementation and evaluation of a parametric model for thin layered materials
Henrik Elmunger
Luleå University of Technology Master Thesis, Continuation Courses
Electrical engineering
Department of Computer Science and Electrical Engineering
Division of EISLAB
Implementation and evaluation of a parametric model for thin layered materials
Henrik Elmunger
September 1, 2008
“Rough diamonds may sometimes be mistaken for worthless pebbles.”
Thomas Browne, Sr
Abstract
This master's thesis evaluates a parametric model for ultrasound reflection in a layered medium. The model is used to determine the thickness of each layer within the total thickness of the medium; with a slight alteration it can also be used to find the total number of layers.

Ultrasound is a cheap and fast method for nondestructive testing of material properties. It is also a non-toxic, clean (no saw or grinding dust) and safe way to test.

Within this thesis the model is tested both with simulations and with in situ tests at an industrial laboratory.

All theory for the model was developed by Fredrik Hägglund, PhD student at LTU.
Preface
My master's thesis was conceived within the frame of the M.Sc. program in electrical engineering at Luleå tekniska universitet (LTU). The thesis was commissioned by the Department of Computer Science and Electrical Engineering. The thesis work, including measurements taken in a laboratory environment, was done in Luleå and lasted from September 2006 to January 2007.

I embarked on writing this thesis knowing there would be optimization problems, and with my interest in numerical methods I found a very nice project for evolving my skills. I had some knowledge about the physics of ultrasound, and my knowledge of this field has broadened a lot as well. I now take the opportunity to thank some people: first my examiner Johan Carlsson and PhD student Fredrik Hägglund for their time and for answering any questions. I also want to thank Inge Söderkvist for all the help regarding the numerical aspects.
Contents
1 Background and outline
1.1 Layered medium
1.2 Ultrasound traversing through a layered medium
1.3 The parametric model
1.4 Optimization, finding an optimum
2 About wave propagation
2.1 Flaw detection
2.2 Acoustic impedance, Z
2.3 The posed Problem
2.4 A Parametric model as the posed Solution
3 Numerical methods
3.1 Quadratic nonlinear least squares problem and the Gauss-Newton method
3.2 Maximum Likelihood Estimation (MLE)
3.3 Condition number, SVD and eigenvalues
3.4 Enhancing the Gauss-Newton
3.5 Line search
3.6 Newton's Method
3.7 Quasi-Newton
3.8 Perturbation methods
3.9 Levenberg-Marquardt
3.10 Discussion about robustness vs efficiency
4 Simulations
4.1 1 layer
4.2 2 layers
4.3 3 layers
5 Experiments using collected data
5.1 1 layer
5.2 3 layers
5.3 Analysis
6 Discussion and Conclusion
6.1 Part I, Pure Gauss-Newton
6.2 Part II, Enhancements
6.3 General conclusion
1 Background and outline
Testing is the watershed that puts some theories to rest and others into good use. In this thesis a parametric model describing an ultrasonic pulse traveling through a layered medium is tested, both via simulations and via actual testing in a laboratory environment.

The model implemented in this thesis simply states: "Each new layer in a medium only needs two parameters, that is, a time constant that tells you the thickness of the layer, and a reflection constant that tells you how much of the energy of the wave is reflected back to the ultrasound receiver." What this means will be explained further on.

Now, the questions mounting up are:

• Is this model accurate enough for use in industry, or does it need more parameters?
• What numerical methods are required?
• Will these methods demand too much, in terms of the number of iterations used or the time needed to do the iterations?

In this introduction I briefly explain the methods and concepts; later sections give a more detailed view of each of the subtopics in this first introductory section. The final chapter contains a summary and conclusion. By then I hope that whoever reads this thesis report will agree with me that the answers have been found!
1.1 Layered medium
The definition of a layered medium is simply that different regions in the medium have different physical attributes, and that these regions have sharp boundaries between them. As simple examples one can mention water on ice, or a metal ruler glued to a wooden desk. As a counterexample, imagine what happens when a thick steel rod is hardened: it will be brittle at the surface but soft in the center. The physical attributes are not the same for the surface and the center, but there is no sharp boundary for this change.
1.2 Ultrasound traversing through a layered medium
When the sound wave travels through a specific medium its velocity is constant. When entering a new layer, with different density and elastic properties, the velocity changes. In other words, a sharp boundary between two layers causes a discontinuity in the velocity function of the wave. Furthermore, the product of the velocity of sound and the density of the medium in which the wave travels is called the characteristic acoustic impedance, $Z$. This value is important when determining the reflection coefficient $R_c$ of sound at the interface of two layers. $R_c$ plays an important role in the parametric model.
1.3 The parametric model
The model consists of the usual concept of an input signal $u$, a transfer function $h$ and an output signal $y_{est}$.
Figure 1: Transfer function block
The parameters $\theta$ lie within the transfer function $h$. The input and output signals are connected via the convolution integral:

$$y_{est}(t) = \int_{-\infty}^{\infty} h(\theta, \tau)\, u(t - \tau)\, d\tau$$

Comparing $y_{est}$ with a measured signal $y_m$, it is possible to change the parameters in the $\theta$ vector to minimize the error $y_{est} - y_m$. This branch of numerical algorithms is called optimization.

1.4 Optimization, finding an optimum
For readers unfamiliar with the concept, one of the fairest ways of describing the process of optimization goes like this. Imagine a bowl; the bowl does not have a flat bottom, but its cross section describes half a sphere. The optimum? Put a pebble in the bowl: it will roll around for a while but stop at the bottom. In other words, the bottom of the bowl is the optimum, and the process of optimization is the pebble. To make it a tad more realistic, use a hammer and give the bottom of the bowl a beating, from both the inside and the outside; it will hold, it is not an imaginary glass bowl! Now, if one drops the pebble back into the bowl, it will rattle around a bit and sooner or later find a cavity. The thing is, do we know that this cavity is the deepest? No. It might be, or it might be a more shallow cavity, or even a flat area between two cavities. The pebble has found a local minimum if it rests in a cavity, and being able to guarantee that it does just that is to guarantee global convergence. As already mentioned, we cannot guarantee that it finds the deepest cavity; that is, we cannot guarantee global optimization. To make the pebble find the minimum of the cavity we are interested in, we have to release it in a good spot, and we also need to make sure that we do not give it too much momentum, as it would slide over the edge into another region. These last two requirements are analogies for the initial point and the line search.
2 About wave propagation
Literature on this subject is abundant. The emphasis in this thesis is on the numerical aspects and not on the underlying physics; therefore this section is kept short.
An ultrasonic pulse traveling through a homogeneous material does so with a constant velocity. A factor of attenuation (dependent on the frequency of the wave) determines how far the pulse can travel. If the material is in any way layered, i.e., contains a sharp boundary between different materials with different properties, the pulse divides its energy between a) a reflected part and b) a part transmitted into the new material. The boundary is thus a discontinuity in the equation that governs the wave propagation. In this second material the pulse travels with a different speed and has a different attenuation constant (due to a frequency change). In other words:

$$\text{wavelength} = \frac{\text{velocity}}{\text{frequency}}, \quad \text{or} \quad \lambda = \frac{c}{f}$$

Here $c$ depends on the density and spring constant of the material.
2.1 Flaw detection
The wavelength, $\lambda$, is vital when it comes to the ability to detect flaws in a medium, i.e., for the pulse to be reflected at a boundary. The rule of thumb is that the region causing the discontinuity needs to be larger than at least half the wavelength in order to be detectable. The key words here are sensitivity and resolution: sensitivity is the ability to detect small discontinuities, and resolution is the ability to locate two or more discontinuities that lie close to each other. As seen in the equation for the wavelength, increasing the frequency decreases the wavelength proportionally, and thereby increases the resolution and sensitivity.
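As a quick worked example of this rule of thumb (a sketch; the velocity figure is a typical textbook value for longitudinal waves in steel, not a measurement from this thesis):

```python
# Wavelength and smallest detectable flaw, lambda = c / f.
c = 5900.0   # longitudinal velocity in steel [m/s], textbook value
f = 5e6      # transducer centre frequency [Hz]

lam = c / f            # wavelength [m]
min_flaw = lam / 2     # half-wavelength rule of thumb

print(f"wavelength      = {lam * 1e3:.2f} mm")       # ~1.18 mm
print(f"detectable flaw > {min_flaw * 1e3:.2f} mm")  # ~0.59 mm
```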
On the other hand, increasing the frequency has its disadvantages. Some materials, such as steel or cast iron, have a coarse grain structure. A wave of too high frequency will reflect on the grain boundaries and scatter. Another cause of scattering is a crack oriented at an angle too far from perpendicular to the pulse source.
2.2 Acoustic impedance, Z
Sound travels through materials under the influence of sound pressure. Because molecules or atoms of a solid are bound elastically to one another, the excess pressure results in a wave propagating through the solid.
The acoustic impedance $Z$ of a material is defined as the product of its density $\rho$ and acoustic velocity $c$. This property is important when calculating the reflection coefficient $R_c$ and the transmission coefficient $T_c$ at a boundary. $R_c$ is given by the formula:
$$R_c = \left( \frac{Z_2 - Z_1}{Z_2 + Z_1} \right)^2$$

which says that the amount of reflection depends on the difference in $Z$ on each side of the boundary. Since $Z$ is easy to calculate, or even to find in a table, for different materials, it is equally easy to compute $R_c$. A parametric model that uses $R_{c_i}$ (for boundary $i$) as parameters is therefore attractive: computing an initial guess for the optimization loop is as easy as computing $Z$.

The other kind of parameter used in the model is the time of flight $\tau$. Knowing the thickness $d$ of a layer and its acoustic velocity $c$, $\tau$ is also easy to compute as

$$\tau = \frac{2d}{c}$$
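As a minimal sketch of how these two parameter types could be computed from tabulated material data (the water and steel constants below are common textbook values, used purely as an illustration):

```python
def reflection_coefficient(z1, z2):
    """R_c = ((Z2 - Z1) / (Z2 + Z1))**2 from the acoustic impedances."""
    return ((z2 - z1) / (z2 + z1)) ** 2

def time_of_flight(d, c):
    """tau = 2*d/c: the pulse crosses the layer thickness d twice."""
    return 2.0 * d / c

# Z = rho * c for each material (textbook values).
z_water = 1000.0 * 1480.0   # [kg/m^3] * [m/s]
z_steel = 7850.0 * 5900.0

print(reflection_coefficient(z_water, z_steel))  # ~0.88: strong echo
print(time_of_flight(5e-3, 5900.0))              # ~1.7e-6 s for 5 mm of steel
```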
2.3 The posed Problem
What if:

1. The layer to be detected is thinner than half the wavelength of the ultrasonic pulse used to detect it?
2. A transducer capable of higher frequencies (shorter wavelengths) is not available, or simply very expensive?
3. A transducer capable of higher frequencies (shorter wavelengths) is available, but not usable?
Increasing the frequency of the transducer is not always a feasible route even if transducers are readily accessible: depending on the materials involved, most of the pulse might bounce off the topmost layer if the frequency is too high. This in turn would lead to a non-detectable echo from the layers beneath. Figures 2 and 3 below (images taken from [9]) will be used for reference in this discussion.
Figure 2: First layer of a material
Figure 2 illustrates the train of thought: $u_1$ is the input signal, $A_0$ the response from the first layer boundary, $B_1$ the point where the pulse bounces off the second layer boundary, and $A_2$ the response of $B_1$. If the frequency is too high, most of the signal energy will be found in $A_0$ due to the reflection. This gives a great deal of knowledge about the [0, 1] boundary, but $A_2$ will be lost in the noise.

The next figure (3) shows an example of a received signal, with voltage levels on the Y-axis and number of samples on the X-axis. Between arrows 1 and 2 (a pulse indicates a boundary) is a very thin layer, too thin to be picked up by the ultrasonic receiver (i.e., shown as a pulse). This layer might be some adhesive between two sheets of metal.
Figure 3: Example of a received signal (3 layers)
The fact that there is an echo at arrow no. 2 indicates that the sample is solid right through; an air gap would cause the signal to disperse. On the other hand, if there are limits on how thick or thin the adhesive layer may be, at this moment there is no way of telling if the joint is OK or not. However, a discontinuity in this solid affects the ultrasonic pulse, hopefully enough.
2.4 A Parametric model as the posed Solution
The solution posed in [9] is a parametric model. Using the concepts within this chapter along with fig. 2, it is easy to explain the model for a single-layered (homogeneous) material.
As stated in section 1.3, the model consists of an input signal, a transfer function and an output signal, related to each other via the convolution integral:

$$y(t) = \int_{-\infty}^{\infty} h(\theta, \tau)\, u(t - \tau)\, d\tau + err$$

Here $\theta$ denotes the parameter list. In other words, the input and output signals are kept "as is", while the transfer function $h(\theta, \tau)$ is optimized to minimize the error term. Using frequency-domain notation, this equation is written as

$$Y(\omega) = H(\theta, \omega)\, U(\omega), \quad \text{where } \omega = 2\pi f$$
2.4.1 Model for a homogeneous solid
The figure is presented again, this time for the purpose of reading the parameters directly from it. Layer 0 represents the water in which the pulser/receiver is immersed, and layer 1 the solid to be measured (between boundaries [0, 2]).
Figure 4: First layer of a material
We need 4 parameters in $\theta$:

• $R_{01}$ = reflection constant at boundary [0, 1]
• $R_{12}$ = reflection constant at boundary [1, 2]
• $\tau_0$ = time constant at boundary [0, 1], the time it takes for the pulse to travel the distance $2 d_0$
• $\tau_1$ = time constant at boundary [1, 2], the time it takes for the pulse to travel the distance $2 d_1$

The output signal from the top layer, $U_{00}$, is the sum of all the echoes $A_{1i}$ and $A_{01i}$. These summed-up echoes are just the input echo times the transmission/reflection constants times the time-of-flight factors. In the case of the homogeneous solid this comes down to:

$$H = R_{01}\, e^{-i\omega\tau_0} + \frac{(1 - R_{01}^2)\, R_{12}\, e^{-i\omega(\tau_0 + \tau_1)}}{1 + R_{01} R_{12}\, e^{-i\omega\tau_1}}$$

Note: $T_{q,q+1} = 1 - R_{q,q-1}$
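The formula is straightforward to evaluate numerically. A sketch of the homogeneous-solid transfer function on a frequency grid follows; the parameter values are arbitrary illustrations, not fitted ones:

```python
import numpy as np

def transfer_single_layer(omega, r01, r12, tau0, tau1):
    """H(omega) for one homogeneous layer, as reconstructed above:
    the direct echo from boundary [0,1] plus the reverberating echo
    train from boundary [1,2], summed via the geometric-series
    denominator."""
    direct = r01 * np.exp(-1j * omega * tau0)
    from_below = (1 - r01**2) * r12 * np.exp(-1j * omega * (tau0 + tau1))
    reverb = 1 + r01 * r12 * np.exp(-1j * omega * tau1)
    return direct + from_below / reverb

omega = 2 * np.pi * np.linspace(1e6, 10e6, 512)   # 1-10 MHz grid
H = transfer_single_layer(omega, r01=0.5, r12=-0.3, tau0=2e-6, tau1=5e-7)
# The modelled echo is then Y(omega) = H(theta, omega) * U(omega).
```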
2.4.2 Multiple layers
For multiple layers we also need the contributions from layers below 2, as in fig. 5
Figure 5: Subsequent layers
In [9], the output signal at layer $q$ can be calculated using the following building-block equations:

$$A_q = \frac{(1 - R_{q-1,q})\, R_{q,q+1}\, (1 - R_{q,q-1})\, e^{-i\omega\tau_q}}{1 - R_{q,q-1} R_{q,q+1}\, e^{-i\omega\tau_q}}$$

$$B_q = \frac{(1 - R_{q,q-1})\, e^{-i\omega\frac{1}{2}\tau_q}}{1 - R_{q,q-1} R_{q,q+1}\, e^{-i\omega\tau_q}}$$

$$C_q = \frac{(1 - R_{q-1,q})\, e^{-i\omega\frac{1}{2}\tau_q}}{1 - R_{q,q-1} R_{q,q+1}\, e^{-i\omega\tau_q}}$$

$$D_q = \frac{R_{q,q-1}\, e^{-i\omega\tau_q}}{1 - R_{q,q-1} R_{q,q+1}\, e^{-i\omega\tau_q}}$$

These equations are the building blocks of the transfer function $H$, namely

$$H = A + \frac{BC}{D}$$

The principle is the same as in the case of a homogeneous material: summing the echoes from below.
2.4.3 Concluding remarks
Even though a layer might be too thin to show up as an echo in a plot of an output signal, the inhomogeneity will steal energy from the ultrasonic pulse traveling through it; some part of said pulse is reflected and some part transmitted into the next layer. Having a good idea of the location and thickness of this inhomogeneity serves as a starting guess for an iterative optimization loop. In other words, the physical reality has been parametrized.
3 Numerical methods
The model is implemented in the Fourier domain, or frequency domain. This is not because the frequency content of the ultrasonic wave is of interest in itself, but because convolution in the time domain becomes multiplication in the frequency domain. That is, $\mathcal{F}\{h * g\} = H \cdot G$, where $*$ denotes convolution.
3.1 Quadratic nonlinear least squares problem and the Gauss-Newton method
The Gauss-Newton method is a modification of Newton's method. Newton's method uses a truncated Taylor expansion involving two derivatives.
The parametric model for the layered medium has been briefly described in previous sections. In this section we will see how the model can be turned into a least squares problem. First we need the theory for quadratic nonlinear least squares. The mathematics is rather simple in theory, and the method goes like this:
$$\text{minimize } f(x) = \frac{1}{2} \sum_{i=1}^{m} f_i(x)^2 \equiv \frac{1}{2} F(x)^T F(x)$$

The chain rule gives the components of $\nabla f(x)$:

$$\nabla f(x) = \nabla F(x)\, F(x)$$

The Hessian $B_k = \nabla^2 f(x)$ is derived by differentiating with respect to $x_j$:

$$\nabla^2 f(x) = \nabla F(x) \nabla F(x)^T + \sum_{i=1}^{m} f_i(x)\, \nabla^2 f_i(x)$$
The Gauss-Newton method simply approximates the Hessian by discarding the right-hand sum in the above formula, i.e.

$$\nabla^2 f(x) \approx \nabla F(x) \nabla F(x)^T$$

Note that $f(x)$ is formed from the total sampled signal, and that $x$ is the parameter vector. Updating the parameter vector is done through the normal equations:

$$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k) \approx x_k - [\nabla F(x_k) \nabla F(x_k)^T]^{-1} \nabla F(x_k)\, F(x_k)$$

Now $x_{k+1}$ can be solved for using regular least squares methods. In this particular case, QR or SVD is recommended.
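A minimal Gauss-Newton loop under these definitions might look as follows. Note that gradF follows the text's convention (grad f = ∇F F), and lstsq stands in for the recommended QR/SVD solve; the residual and gradient callables are placeholders the caller must supply:

```python
import numpy as np

def gauss_newton(F, gradF, x0, tol=1e-8, max_iter=50):
    """Minimise f(x) = 0.5 * F(x)^T F(x) by Gauss-Newton.
    F(x) returns the residual vector; gradF(x) returns the matrix
    'nabla F' used in the text, so that grad f(x) = gradF(x) @ F(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx, Gx = F(x), gradF(x)
        # Solve min_p || gradF^T p + F ||_2 instead of forming the
        # normal equations explicitly (numerically kinder).
        p, *_ = np.linalg.lstsq(Gx.T, -Fx, rcond=None)
        x = x + p
        if np.linalg.norm(p) < tol:   # negligible step: stop
            break
    return x
```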
The quantity $-[\nabla F(x_k) \nabla F(x_k)^T]^{-1} \nabla F(x_k)\, F(x_k)$ is called the search direction, from here on denoted by $p$.

Note: solving the normal equations is equivalent to solving a linear least squares problem, minimizing over $p$:

$$\| \nabla F(x)^T p + F(x) \|_2^2 = [\nabla F(x)^T p + F(x)]^T [\nabla F(x)^T p + F(x)]$$
This can be done using a numerically stable form of QR factorization, preferably obtained via Householder transformations. Due to a lack of implemented truncated QR methods in the libraries the SciPy package uses, this route is yet untested.
The advantage of solving for $x_{k+1}$ in this way is that a much lower condition number is obtained. If the condition number for the normal equations equals $\kappa_{Ne}$, the QR approach gives $\kappa_{QR} = \sqrt{\kappa_{Ne}}$. See [2] page 416 and [5] pages 295-300 for details. More on the subject of the condition number is presented below.
3.2 Maximum Likelihood Estimation (MLE)
MLE is a popular statistical method used to calculate the best way of fitting a mathematical model to some data. Modeling real world data by estimating maximum likelihood offers a way of tuning the free parameters of the model (in this case θ) to provide an optimum fit.
A short description might be: in MLE, setting $\sigma_e = 1$ yields the method of nonlinear least squares.
3.2.1 The model within the terms of optimization
The model is built on the principle of the maximum likelihood estimator. The model is built with the parameter vector θ within the transfer function.
• Y is the output signal
• H(θ) is the transfer function
• U is the input signal
In the frequency domain this comes down to $Y = H(\theta)U$. Both $Y$ and $U$ are corrupted by noise, so the model needs to be scaled in proportion to the noise. This is done in the following manner, using the variance of the measured noise, $\sigma_e^2(\theta)$. The error of the signal is defined as

$$\varepsilon(\theta) = \frac{Y - H(\theta)U}{\sigma_e}$$

and the variance $\sigma_e^2$ is defined as
$$\sigma_e^2 = \sigma_y^2 + \sigma_u^2 |H(\theta)|^2 - 2\Re\{\sigma_{yu}^2 H(\theta)\}$$

For the computation of $\sigma_y^2$, $\sigma_u^2$ and $\sigma_{yu}^2$, see [7], [10].
To return to the nonlinear least squares formulation, we make the following definitions:

• $\varepsilon(\theta) \equiv F(x)$
• $\varepsilon^2(\theta) \equiv f(x) = F(x)^T F(x)$
• $\varepsilon'(\theta) \equiv \nabla f(x) = \nabla F(x)\, F(x)$
• $\varepsilon''(\theta) \equiv \nabla^2 f(x) \approx \nabla F(x) \nabla F(x)^T$

$$\theta_{k+1} = \theta_k - [\nabla F \nabla F^T]^{-1} \nabla F\, F$$

3.2.2 Good enough for simulation
When the input signal is well behaved, like an ideal mathematical model of the pulse, the model can take a lot of beating when it comes to making initial guesses or adding extra noise. Another test is also applied, using a real echo from the surface of the topmost layer as input signal. Even though this gives a higher condition number, the model still behaves in a stable manner.
Still, when moving from the well behaved simulations into the real world, one might encounter stability problems, and there is also the question of performance. Several ways of enhancing the Gauss-Newton algorithm are available; some of them are very easy to implement, others have a higher degree of complexity but may prove necessary.
3.3 Condition number, SVD and eigenvalues
The two types of error sources occurring in numerical analysis are a) errors in the measured data, the input to the algorithm, and b) errors caused by the algorithm itself, due to approximations within it. If a small change in the input data causes a large change in the output, for example if $|f(x + \delta x) - f(x)| \gg 0$ even though $\delta x$ is small, the problem is said to be ill-conditioned at $x$.
The condition number of a matrix $A$ is defined by:

$$\kappa_\infty = \| A \|_\infty \| A^{-1} \|_\infty$$

Because this problem is ill-conditioned, ordinary least squares solutions are prone to error, or may not even exist. There is a need for a singular value decomposition of the problem, that is, in the inversion of the Hessian matrix. The condition number for the Hessian, i.e. the matrix to invert, is high. For the distribution of eigenvalues in Figure 6 below, the condition number is 54593. This means that noise/errors added to this model will be heavily amplified.
Figure 6: Eigenvalues on a semi-log scale, 8 parameters, Hessian size is 8x8
Keeping track of the eigenvalues during the optimization process is a good thing. If one or more eigenvalues tend towards 0 as $k$ increases ($k$ being the number of iterations), the problem is ill-posed. In that case a regularization method or a different model should be considered.

In a more performance-oriented implementation, perhaps a QR factorization should be used instead, as the SVD is a computationally heavy method.
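A small diagnostic along these lines (a sketch; the Gauss-Newton Hessian approximation is formed explicitly here only for monitoring purposes):

```python
import numpy as np

def hessian_diagnostics(gradF):
    """Condition number and eigenvalue spread of B = gradF @ gradF.T.
    B is symmetric positive semi-definite, so its singular values
    coincide with its eigenvalues and one SVD yields both."""
    B = gradF @ gradF.T
    sv = np.linalg.svd(B, compute_uv=False)   # sorted, largest first
    cond = sv[0] / sv[-1] if sv[-1] > 0 else np.inf
    return cond, sv

# Log these once per iteration; eigenvalues drifting towards zero as k
# grows signal an ill-posed problem, per the discussion above.
```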
3.3.1 Better approximation of the Hessian?

It may also be of interest to make a better approximation of the Hessian matrix. There is a chance that the residual part discarded from the Newton method (to get Gauss-Newton) is large. This might have a large impact on a) the spread of eigenvalues/condition number and b) the curvature of the second derivatives.
3.4 Enhancing the Gauss-Newton
The following enhancements are examined in this thesis:
• Line search methods
• Newton's method
• Quasi-Newton methods
• Combined Quasi-Newton/line search method
• Levenberg-Marquardt algorithm
A brief explanation and motivation of these methods is given below, together with the simulation results. The performance results will be based on:

1. Parameter sanity
2. Shape of the output signal (curve fitting)
3. Time of completion
• Even though the number of iterations, or the total time to solve the problem, is of great importance in a real-world scenario, trading a bit more machine time for sane results is more important. Also, with this master's thesis being more of a proof of concept than an industrial application, the time of completion is of lesser importance. The machine time is measured either with the *NIX command time (on my FreeBSD machine: $ time script) or, when running scripts in Octave, with tic; script; toc.
• The shape of the output signal is important, but slight differences between a measured and estimated graph should be tolerable.
• The parameter sanity is based on the physics. As an example, $\tau_1, \ldots, \tau_3$ cannot be negative, and the reflection constants $R_{01}, \ldots, R_{34}$ are bounded by $-1 < R < 1$.
3.5 Line search
As stated above, there is something called global convergence, and we might even guarantee it, but there really is no such thing as guaranteed global optimization. A line search is a way to keep your optimization method of choice within your trusted region. These methods do need their safeguards, though: a poorly conditioned Hessian and gradient can make these algorithms flip and make our optimization method jump into another region, and the problem that led us to use a line search bites its own tail. The two line searches I have used are the back tracking algorithm and the strong Wolfe conditions, SWC. My main efforts have been with the SWC, since this method is preferred when using quasi-Newton methods. More on Wolfe conditions below.
For both methods a first trial step length of $\alpha_{init} = 1$ should be selected. This is due to the fact that we often only want our line search methods to be active in the first iterations; the closer we get to our local optimum, the more likely we are (in theory and under nice, well-behaved conditions) to get an unthrottled step length. In other words, the closer we get to the target, the faster we may venture on (the line search algorithm starts out fresh with each new iteration of the optimization method).
Before going further it is convenient to define some terms and notation:

• $\Phi(\alpha)$ denotes $f(x_k + \alpha p_k)$
• $\Phi(0)$ denotes $f(x_k)$
• $\Phi'(\alpha)$ denotes $\nabla f(x_k + \alpha p_k)^T p_k$
• $\Phi$ is a function of one variable only, namely $\alpha$
• $\alpha_k$ is the step length
• $p_k = -[\nabla F(x_k) \nabla F(x_k)^T]^{-1} \nabla F(x_k)\, F(x_k)$ is the search direction
• $x_k$ is the parameter vector

3.5.1 Back tracking
This inexact algorithm only makes use of the sufficient decrease condition, or Armijo condition, given by the inequality $f(x_k + \alpha_k p_k) \leq f(x_k) + c_1 \alpha_k \nabla f(x_k)^T p_k$. It is an inexact algorithm in the sense that $\alpha_k$ does not need to be an absolute minimizer, which makes it much faster than an exact line search would be.

The back tracking algorithm framework is defined as follows:

Given $c_1 \in (0, 1/2)$ and $c_2$ with $0 < c_1 < c_2 < 1$:
set $\alpha_k = 1$ (the initial trial step length)
while $f(x_k + \alpha_k p_k) > f(x_k) + c_1 \alpha_k \nabla f(x_k)^T p_k$: set $\alpha_k = c_2 \alpha_k$
$x_{k+1} = x_k + \alpha_k p_k$

where again:

• $\alpha_k$ is the step length
• $p_k$ is the search direction
• $x_k$ is the parameter vector
I have used brute-force back tracking in the sense that I decrease $\alpha_k$ by a small constant for each pass through the loop. This wastes a lot of clock cycles but does its job rather quickly. One can use quadratic and/or cubic interpolation to get a new $\alpha_{trial}$ at each iteration; I have not tried that, as I only need the back tracking algorithm to produce something stable for comparison with the SWC. On a further note, the term back tracking means that one tries a large $\alpha$ and then keeps reducing it until the conditions are met.
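A sketch of the brute-force variant described above, shrinking α by a fixed factor until the sufficient decrease condition holds (the constants are common choices, not the thesis' exact values; x and p are assumed to be NumPy arrays):

```python
def backtracking(f, gradf, x, p, c1=1e-4, shrink=0.5, alpha=1.0):
    """Accept the first alpha with
    f(x + alpha*p) <= f(x) + c1*alpha*gradf(x)^T p (Armijo)."""
    fx = f(x)
    slope = gradf(x) @ p   # directional derivative, < 0 for descent
    while f(x + alpha * p) > fx + c1 * alpha * slope:
        alpha *= shrink    # back track: reduce the trial step
        if alpha < 1e-12:  # safeguard against endless shrinking
            break
    return alpha
```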
3.5.2 Wolfe conditions: Quadratic and Cubic approximation
Quadratic approximation is used to find a minimizer $\alpha_{min}$ along the ray $x_k + \alpha_{trial} p_k$. We already know $f(x_k)$ and $f'(x_k)$, and calculating $f(x_k + p_k)$ (with $\alpha_{trial} = 1$) gives a third piece of information. We use these three pieces to interpolate the approximation as follows:

$$\Phi_q(\alpha) = \left( \frac{\Phi(\alpha_0) - \Phi(0) - \alpha_0 \Phi'(0)}{\alpha_0^2} \right) \alpha^2 + \Phi'(0)\, \alpha + \Phi(0)$$
The new trial value is defined as the minimizer of this quadratic; differentiating, we get:

$$\alpha_1 = -\frac{\Phi'(0)\, \alpha_0^2}{2 \left[ \Phi(\alpha_0) - \Phi(0) - \Phi'(0)\, \alpha_0 \right]}$$
If $\alpha_1$, the new trial value, fails to satisfy the conditions of the line search, one might want to use it for a cubic approximation:

$$\Phi_c(\alpha) = c_{low}\, \alpha^3 + c_{high}\, \alpha^2 + \Phi'(0)\, \alpha + \Phi(0)$$

with the formula for $c_{low}$, $c_{high}$ as follows:

$$\begin{bmatrix} c_{low} \\ c_{high} \end{bmatrix} = \frac{1}{\alpha_0^2 \alpha_1^2 (\alpha_1 - \alpha_0)} \begin{bmatrix} \alpha_0^2 & -\alpha_1^2 \\ -\alpha_0^3 & \alpha_1^3 \end{bmatrix} \begin{bmatrix} \Phi(\alpha_1) - \Phi(0) - \Phi'(0)\, \alpha_1 \\ \Phi(\alpha_0) - \Phi(0) - \Phi'(0)\, \alpha_0 \end{bmatrix}$$

Note: $c_{low}$ and $c_{high}$ have nothing to do with the maximum or minimum values set for $\alpha_{low}$, $\alpha_{high}$.
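The quadratic stage translates directly into code; a sketch (the cubic stage would evaluate the 2x2 matrix formula above in the same fashion):

```python
def quad_interp_step(phi0, dphi0, phi_a0, a0):
    """Minimiser of the quadratic through Phi(0), Phi'(0) and Phi(a0):
    alpha_1 = -Phi'(0) * a0**2 / (2 * (Phi(a0) - Phi(0) - Phi'(0)*a0))."""
    denom = 2.0 * (phi_a0 - phi0 - dphi0 * a0)
    return -dphi0 * a0 ** 2 / denom
```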
3.5.3 Wolfe conditions
The Wolfe conditions are fairly easy mathematics; the implementation is not. Even the textbook suggests that the reader use an already written program instead of doing an implementation.
The SWC algorithm works with two conditions. One is the sufficient decrease condition used in the back tracking algorithm; the second concerns the derivatives of the function and is therefore called the curvature condition.
• $\Phi(\alpha) \leq \Phi(0) + c_1 \alpha\, \Phi'(0)$ (sufficient decrease condition)
• $\Phi'(\alpha) \geq c_2 \Phi'(0)$ (curvature condition, weak Wolfe conditions)
• $|\Phi'(\alpha)| \leq c_2 |\Phi'(0)|$ (curvature condition, strong Wolfe conditions)

The difference between the strong and the weak WC is that the strong WC amounts to an exact line search. Adjusting the values of $c_1$ ($\approx 10^{-4}$) and $c_2$ ($\approx 0.9$) loosens up the SWC, though; it then requires about the same amount of work as the WWC. By making $c_2$ smaller, one can control the quality of the search, forcing the accepted value of $\alpha$ to lie closer to a local minimum of the interpolated curve between $\Phi(0)$ and $\Phi(\alpha)$.
The framework for the algorithm is as follows:

choose $\alpha_{low} = 0$, $\alpha_{trial} > 0$, $\alpha_{max} > \alpha_{trial}$
endless while loop:
  if $\Phi(\alpha) > \Phi(0) + c_1 \alpha_{trial} \Phi'(0)$: let the zoom algorithm compute a new $\alpha_{trial}$
  if $|\Phi'(\alpha)| \leq -c_2 \Phi'(0)$: set the step length $\alpha_{step} = \alpha_{trial}$ and break
  if $\Phi'(\alpha) \geq 0$: let zoom choose a new interval and set a new $\alpha_{trial}$
end
3.5.4 Bracketing an interval (zoom)
The bracketing algorithm in [5] is called "zoom"; I adopt this name within this thesis.

The zoom algorithm also uses an infinite loop that is only broken when a good enough step length is found, i.e. there is no guarantee that this will happen. One can build some safety into the algorithm so that the next step length is never less than one tenth of the previous one ($\alpha_{k+1} \geq 0.1\, \alpha_k$).
The framework of the zoom algorithm is much like that of the SWC:

interpolate, using quadratic or cubic interpolation, to find a new $\alpha_{trial}$ within the interval $(\alpha_{low}, \alpha_{max})$
if $\Phi(\alpha_{trial}) > \Phi(0) + c_1 \alpha_{trial} \Phi'(0)$: set $\alpha_{high} = \alpha_{trial}$
else:
  if $|\Phi'(\alpha_{trial})| \leq -c_2 \Phi'(0)$: deliver $\alpha_{trial}$ to the SWC for testing and break
  if $\Phi'(\alpha_{trial})(\alpha_{high} - \alpha_{low}) \geq 0$: set $\alpha_{high} = \alpha_{low}$
  set $\alpha_{low} = \alpha_{trial}$
end
3.5.5 Tie them together
The quasi-Newton method asks the Wolfe condition about the generic step length $\alpha = 1$; if the SWC is not satisfied it calls on the zoom function, which in turn calls the interpolation scheme until satisfied. Zoom then returns a trial value of $\alpha$ to the Wolfe condition, which tests this new trial value. If the SWC is satisfied, the actual step length is returned to the quasi-Newton method.
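To make the interplay concrete, here is a compact sketch of the SWC driver and a simplified zoom. Plain bisection stands in for the quadratic/cubic interpolation above, and the constants and iteration caps are illustrative rather than the thesis' implementation:

```python
def strong_wolfe(phi, dphi, c1=1e-4, c2=0.9, alpha_max=10.0, iters=20):
    """phi(a) = f(x + a*p), dphi(a) = grad f(x + a*p)^T p.
    Returns a step length satisfying the strong Wolfe conditions."""
    phi0, dphi0 = phi(0.0), dphi(0.0)
    a_prev, a = 0.0, 1.0    # generic first trial alpha
    for _ in range(iters):
        if phi(a) > phi0 + c1 * a * dphi0 or (a_prev > 0 and phi(a) >= phi(a_prev)):
            return zoom(phi, dphi, a_prev, a, phi0, dphi0, c1, c2)
        d = dphi(a)
        if abs(d) <= -c2 * dphi0:   # strong curvature condition
            return a                # both conditions hold
        if d >= 0:
            return zoom(phi, dphi, a, a_prev, phi0, dphi0, c1, c2)
        a_prev, a = a, min(2 * a, alpha_max)   # extend the bracket
    return a

def zoom(phi, dphi, lo, hi, phi0, dphi0, c1, c2, iters=20):
    """Bracketing stage: shrink [lo, hi] until a point satisfies SWC."""
    for _ in range(iters):
        a = 0.5 * (lo + hi)         # interpolate (here: bisect)
        if phi(a) > phi0 + c1 * a * dphi0 or phi(a) >= phi(lo):
            hi = a
        else:
            d = dphi(a)
            if abs(d) <= -c2 * dphi0:
                return a
            if d * (hi - lo) >= 0:
                hi = lo
            lo = a
    return 0.5 * (lo + hi)
```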
3.5.6 Encountered problems and results

Brute force back tracking
This method works, even though for some initial points of the parameter vector the step length becomes very small (giving a small reduction in the cost function). A too small step length might terminate the optimization loop prematurely. To use an image: picture a ravine, or perhaps a ditch leading off water, the sides being very steep but the bottom having a very slow descent. In order not to climb up the walls of the ditch, the step length must be kept small; this small value, in turn, causes a small change in the norm $\|\theta_{k+1} - \theta_k\|$, and thus the premature termination.

As the name suggests, this method is quite slow and should be used only as a debugging tool.
SWC
This method is quite slow for some initial points of the parameter vector. Even though it is more elegant, SWC uses a lot of function calls (for example, computing the derivatives for each new trial version of the step length). On the other hand, the curvature condition protects us from the dangers of premature termination.
3.6 Newton's Method
Newton's method (which is not really an enhancement of the GN method, but rather the other way around) computes the Hessian fully; i.e., one goes through the trouble of computing the residual term $\sum_{i=1}^{m} f_i(x)\, \nabla^2 f_i(x)$ of the second derivatives. This is often avoided because it is both error prone (i.e., human errors in defining the second derivatives) and computationally heavy. The benefit is a much more rapid convergence: Newton's method gives quadratic convergence rates close to the optimum, compared to the linear or super-linear convergence rates of quasi-Newton methods. The motivation to implement and use the Newton method within this thesis is mainly to benchmark the parameter sanity and curve fitting properties.
Note: I used Matlab to generate $\nabla^2 f_i(x)$ via symbolic differentiation. Matlab probably does the symbolic computations right, but the file containing the formulas for the second order derivatives is huge, roughly 54000 characters of code. It should also be noted that when not using the Matlab command "simplify", the above-mentioned file is much larger, with measurable round-off differences between the two. One can therefore argue how much of the extra accuracy survives the clumsy Matlab-generated code. Some trial runs showed that the Newton method is indeed the slowest.
3.7 Quasi-Newton
A quasi-Newton method is almost any method where the Hessian is not explicitly computed; instead one makes an approximation. This can be done in many ways, from the naive method of just setting the Hessian $B_0$ equal to the identity matrix $I$, to a more elaborate method like the BFGS (Broyden-Fletcher-Goldfarb-Shanno) method. Somewhere in the middle (performance-wise) lies the Gauss-Newton method described above, a fairly simple method suited for well defined problems or as a workhorse in the development and simulation phase. In this thesis I have decided to include my experiences implementing the BFGS and Modified BFGS (MBFGS) methods. One of the reasons for this is to see if the Hessian is really worth computing explicitly: using Newton's method, the Hessian is computed at cost $O(n^3)$, versus $O(n^2)$ for an efficient quasi-Newton method.
3.7.1 BFGS and Modified BFGS
The BFGS is considered to be the most efficient method of choice for approximating the Hessian $B_k$. During my pre-study and information collection for this thesis, while deciding which approximation method for $B_k$ to use, I ran across the Modified BFGS. These two methods are discussed below in due order.
3.7.2 BFGS (Broyden-Fletcher-Goldfarb-Shanno)
• $s_k = \theta_{k+1} - \theta_k$
• $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$
• $B_0 = I$

$$B_{k+1} = B_k - \frac{(B_k s_k)(B_k s_k)^T}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}$$

As seen from the BFGS formula, this is an iterative method, giving a better approximation of the Hessian with each loop. It thereby gives a better estimate of the curvature of the model than the GN, thus increasing the rate of convergence.
Outline of the algorithm

I have added a modification to this algorithm: instead of letting $B_0 = I$, I let $B_0 = \nabla F \nabla F^T$, that is, the Gauss-Newton approximation. The outline of the algorithm is taken from [2]; $p_k$ is the search direction and $\alpha_k$ is the step length.

1. For $k = 0, 1, \ldots$
2. If $x_k$ is optimal, stop.
3. Solve $B_k p = -\nabla f(x_k)$ for $p_k$.
4. Determine $x_{k+1} = x_k + \alpha_k p_k$.
5. Compute $s_k$ and $y_k$.
6. Compute $B_{k+1}$ to use in the next loop.
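The update itself is a one-liner in matrix form; a sketch, seeded with the Gauss-Newton approximation as suggested above:

```python
import numpy as np

def bfgs_update(B, s, y):
    """B+ = B - (B s)(B s)^T / (s^T B s) + y y^T / (y^T s).
    A Wolfe line search keeps y^T s > 0, so the update stays
    positive definite."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# Seeding: B0 = gradF @ gradF.T (the Gauss-Newton approximation)
# rather than the identity, as proposed in the text.
```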
3.7.3 MBFGS (Modified Broyden-Fletcher-Goldfarb-Shanno)
The reason for the modification is increased performance. Wei [6] shows that this method has the property of super-linear convergence under some reasonable conditions. In this thesis I have not shown that these conditions hold, but it might be an interesting feature to verify in future work. Furthermore, Wei [6] uses a fixed formula for the step length $\alpha_k$, a cheaper approach than back tracking.
• $s_k = \theta_{k+1} - \theta_k$
• $g_k = \nabla f(x_k)$
• $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$
• $y_k^m = y_k + A_k s_k$
• $\alpha_k = -\dfrac{\delta\, g_k^T p_k}{\|p_k\|_{Q_k}^2}$, where $\|p_k\|_{Q_k} = \sqrt{p_k^T Q_k p_k}$ for some positive semi-definite matrix $Q_k$, and $\delta \in (0, \nu_{min}/L)$
• $L$ satisfies $\|g_k - g_{k+1}\| \leq L \|x_k - x_{k+1}\|$
• $\nu_{min}$ satisfies $\nu_{min}\, p^T p \leq p^T Q_k p \leq \nu_{max}\, p^T p$

$$A_k = \frac{2\left[ f(x_k) - f(x_{k+1}) \right] + (g(x_{k+1}) + g(x_k))^T s_k}{\|s_k\|^2}$$

$$B_{k+1} = B_k - \frac{(B_k s_k)(B_k s_k)^T}{s_k^T B_k s_k} + \frac{y_k^m (y_k^m)^T}{(y_k^m)^T s_k}$$

Outline of the algorithm

I have added my own modification to this algorithm as well: instead of letting $B_0 = I$, I let $B_0 = \nabla F \nabla F^T$, that is, the Gauss-Newton approximation.

1. Set a constant $\rho \in (0, 1/2)$; set $B_0 = \nabla F \nabla F^T$.
2. Compute $p_k$ by solving $p_k = -B_k^{-1} \nabla F(x_k)\, F(x_k)$.
3. If $f(x_k + p_k) \leq f(x_k) + \rho\, g_k^T p_k$, let $\alpha_k = 1$; otherwise, use the step length formula.
4. Set $x_{k+1} = x_k + \alpha_k p_k$.
5. Update $B_k$ to get $B_{k+1}$ using the MBFGS formula.
6. Go to step 2.
3.7.4 Encountered Problems and results
BFGS: The BFGS Hessian approximation does not help much in the first few iterations of the optimization loop, as it is an iterative method. This is bad in the context of the line search, especially when using the SWC (see Encountered problems and results under the line search topic). On the other hand, using the GN Hessian approximation as $B_0$, some improvement should be seen from the second loop onwards.
MBFGS: The main difficulty lies in determining the constant coefficients involved in computing $\alpha_k$. Wei [6] also states that during the numerical experiments conducted, the computation of the fixed step length $\alpha_k$ could be troublesome. In my implementation I could confirm the instability caused by this formula, as the algorithm runs fine when using the back tracking line search instead. This might take its toll on performance, though, and more importantly, one of the main reasons to use this modification is then lost.
3.8 Perturbation methods
Wei [6] states in Lemma 2.1 that if $(\alpha_k, x_{k+1}, g_{k+1}, d_{k+1})$ is generated by the MBFGS3 algorithm, then $B_{k+1}$ will be positive definite for all $k$ provided that $s_k^T y_k^m > 0$. This in turn is guaranteed as long as we use an exact line search, such as the SWC. On the other hand, when using the knobs $c_1$, $c_2$ to loosen up the rigidity of the SWC, it is no longer an exact line search algorithm, even if it is close to having that property. So what if $B_k$ loses its nice property of being positive semi-definite? There are many perturbation methods available; in this case I go for the simplest possible. When computing the SVD one gets the eigenvalues. The simplest and easiest way to keep $B_k$ positive definite is to take the smallest eigenvalue, take the absolute value of it, add a small number (e.g. machine epsilon, eps), multiply the result by the identity matrix $I$, and add this to $B_k$.
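As a sketch (eigvalsh is used here for the symmetric $B_k$, since the signed smallest eigenvalue is what matters; the text obtains the eigenvalues from the SVD it computes anyway):

```python
import numpy as np

def make_positive_definite(B):
    """Shift B by (|smallest eigenvalue| + eps) * I whenever it is
    not positive definite, as described above."""
    lam_min = np.linalg.eigvalsh(B).min()   # smallest signed eigenvalue
    if lam_min <= 0:
        B = B + (abs(lam_min) + np.finfo(float).eps) * np.eye(B.shape[0])
    return B
```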
This discussion leads the way into the Levenberg-Marquardt method.

3.9 Levenberg-Marquardt
Levenberg-Marquardt, the inspirational source of trust region methods, is a method of regularization. The main idea is to add a dampening factor $\lambda$ to the GN Hessian approximation $\nabla F(x) \nabla F(x)^T$. This gives:

$$x_{k+1} = x_k - [\nabla F(x_k) \nabla F(x_k)^T + \lambda I]^{-1} \nabla F(x_k)\, F(x_k)$$

Here $I$ is the identity matrix. In other words, one increases the diagonal dominance of the Hessian (another way to put it is that the residual part of the true Hessian, namely $\sum_{i=1}^{m} f_i(x)\, \nabla^2 f_i(x)$
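A sketch of a single LM parameter update as given by the formula above; the schedule for λ (decrease on success, increase on rejection) is the usual practice and is noted here as an assumption, since the section breaks off in the source:

```python
import numpy as np

def lm_step(gradF, F, lam):
    """x_{k+1} - x_k = -[gradF gradF^T + lam*I]^{-1} gradF F."""
    n = gradF.shape[0]
    B = gradF @ gradF.T + lam * np.eye(n)   # dampened GN Hessian
    return -np.linalg.solve(B, gradF @ F)

# Usage: accept the step and shrink lam if the cost decreases;
# otherwise reject it and grow lam.
```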