
EXAMENSARBETEN I MATEMATIK

MATEMATISKA INSTITUTIONEN, STOCKHOLMS UNIVERSITET

Splines: A theoretical and computational study

by

Magnus Johansson and Kristoffer Sahlin

2008 - No 10


Splines: A theoretical and computational study

Magnus Johansson and Kristoffer Sahlin

Examensarbete i matematik, 15 högskolepoäng, påbyggnadskurs (degree project in mathematics, 15 credits, continuation course). Supervisor: Hans Rullgård

2008


Abstract

The purpose of this paper is to fit a curve f(x) to a set of points (x1, y1), . . . , (xn, yn).

We want this function to be such that the errors f(xi) − yi, i = 1, . . . , n, are small, but at the same time we want f(x) to be reasonably smooth. We will do this by considering smoothing splines, which are minimizers of a particular functional. A parameter λ included in the functional, called the interpolation coefficient, captures the trade-off between smoothness and interpolation (the deviation of f(x) from the points).

We will use simple theory of optimization in vector spaces to derive this function f (x).

We will also show an example of how the behaviour of f(x) varies depending on the choice of λ.


Contents

1 Introduction
2 Necessary definitions and theorems
3 Calculating the derivative and checking for convexity
  3.1 Choosing a proper norm for f
  3.2 Calculating the derivative of E(f)
   3.2.1 Calculating the derivative of E1
   3.2.2 Calculating the derivative of E2
  3.3 Check for convexity of E(f)
  3.4 Summary
4 Finding the minimizer f0 of E(f)
  4.1 Behaviour of f0 on the interval between xk and xk+1
  4.2 Behaviour of f0 on the first and last interval
  4.3 Behaviour of f0 at the points xi
  4.4 Validating the minimizer f0
  4.5 Solving the system of equations
5 Investigating the interpolation coefficient λ
6 Appendix


1 Introduction

The purpose of this paper is to fit a curve f(x) to some points (x1, α1), . . . , (xn, αn), which can for example be interpreted as experimental measurements, where the xi are points in an interval (a, b) with x1 < x2 < . . . < xn, and the αi are real numbers. We want the function f(x) to be relatively smooth while the error f(xi) − αi is small in some sense. One way to fit a curve to some points is to consider smoothing splines, which are minimizers of the functional:

E(f) = ∫_a^b (f''(x))² dx + λ Σ_{i=1}^n (f(xi) − αi)²,   λ > 0

This is a problem of optimization in function spaces. All definitions and theorems that will be used are stated in section 2. We have chosen to omit the proof of these theorems since they can be quite complicated and will not really increase one’s understanding in the actual matter. In section 3 we apply these theorems to the problem. In section 4 we derive certain properties of the minimizer using elementary calculus and basic linear algebra. In the references we have given two examples of books concerning these matters.

In the equation above, λ can be interpreted as a parameter that captures the trade-off between smoothness and interpolation (the deviation of f(x) from the points (x1, α1), . . . , (xn, αn)). We will refer to λ as the interpolation coefficient. The two criteria of smoothness and interpolation pull in opposite directions. A smoother curve will be flatter, so the most deviant measurement values play a smaller part in "constructing" the function. The higher the value of λ, the more the function will focus on passing through the points (x1, α1), . . . , (xn, αn).

You can see this by looking at the second part of the functional, which will be large unless the gaps between f(xi) and αi are small. In section 5 we will consider an example of the impact of choosing different values of λ.

Another application, besides fitting a curve to experimental measurements, arises when you have a function g(x) whose function values are hard to compute. In this case you could settle for calculating the values of g(x) at some points and then use smoothing splines to approximate the function. If the smoothing spline approximates the complicated function well, it can be used to calculate other values of g(x) in a much simpler way. In this case you would obviously choose a large interpolation coefficient λ.
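As a concrete illustration of the trade-off the paper studies, the functional can be minimized numerically by discretizing f on a grid: the bending energy becomes a quadratic form in second differences and the data term a quadratic in the grid values, so the minimizer solves a linear system. This is a minimal sketch under that discretization; the function name, grid size and sample data are our own illustrative choices, not from the paper:

```python
import numpy as np

def smoothing_spline_grid(x_data, alpha, lam, a, b, m=200):
    """Minimize a discretized version of
        E(f) = int_a^b (f''(x))^2 dx + lam * sum_i (f(x_i) - alpha_i)^2
    over the values of f on a uniform grid of m points in [a, b]."""
    t = np.linspace(a, b, m)
    dt = t[1] - t[0]
    # Second-difference matrix approximating f'' at the interior grid points.
    D = (np.eye(m, k=1) - 2.0 * np.eye(m) + np.eye(m, k=-1))[1:-1] / dt**2
    # Selection matrix picking the grid point nearest each data site x_i.
    idx = np.clip(np.round((np.asarray(x_data) - a) / dt).astype(int), 0, m - 1)
    S = np.zeros((len(x_data), m))
    S[np.arange(len(x_data)), idx] = 1.0
    # Setting the gradient of the discretized E to zero gives a linear system.
    A = dt * D.T @ D + lam * S.T @ S
    rhs = lam * S.T @ np.asarray(alpha, dtype=float)
    return t, np.linalg.solve(A, rhs), idx

x_data = [0.0, 0.25, 0.5, 0.75, 1.0]
alpha = [0.0, 0.8, 0.9, 0.3, -0.2]
t, f_vals, idx = smoothing_spline_grid(x_data, alpha, lam=100.0, a=0.0, b=1.0)
```

A larger λ pulls the values f(xi) closer to the αi, while a smaller λ yields a flatter, smoother curve, matching the discussion above.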

2 Necessary definitions and theorems

In this section we will define all notions and state all theorems needed in order to solve the problem of minimizing E(f). All definitions and theorems that we have used, as well as the proofs of the theorems, can be found in [1] in the references. We will however state the necessary theorems and definitions here as well. First, have a look at the following theorem:

Theorem 1. Let X be a normed space and I : X −→ R differentiable. Suppose that I is convex. If x0 ∈ X is such that (DI)(x0) = 0, then I has a global minimum at x0. Moreover, if I is strictly convex, the global minimum is unique.

This is, as you may have noticed, a generalization of the well-known theorem concerning optimization of functions from R to R. This powerful theorem is pretty much everything we need in order to solve the problem of minimizing E(f ). Before applying this theorem to our problem, let us first have a closer look at all the notions involved so that we fully understand


their meaning. In the text below we will give the definition of a normed space, and of continuity, convexity and differentiability for functions from X −→ Y, where X and Y are vector spaces.

Definition 1. A vector space over R is a set X together with two functions, + : X × X −→ X, called vector addition, and · : R × X −→ X, called scalar multiplication, that satisfy the following:

V1 For all x1, x2, x3 ∈ X, x1 + (x2 + x3) = (x1 + x2) + x3

V2 There exists an element, denoted by 0 (called the zero vector), such that for all x ∈ X, x + 0 = 0 + x = x

V3 For every x ∈ X, there exists an element, denoted by −x, such that x + (−x) = (−x) + x = 0

V4 For all x1, x2 ∈ X, x1 + x2 = x2 + x1

V5 For all x ∈ X, 1 · x = x

V6 For all x ∈ X and all α, β ∈ R, α · (β · x) = (αβ) · x

V7 For all x ∈ X and all α, β ∈ R, (α + β) · x = α · x + β · x

V8 For all x1, x2 ∈ X and all α ∈ R, α · (x1 + x2) = α · x1 + α · x2

Definition 2. Let X be a vector space over R. A norm on X is a function ‖·‖ : X −→ [0, +∞) such that:

N1 (Positive definiteness) For all x ∈ X, ‖x‖ ≥ 0. If x ∈ X, then ‖x‖ = 0 iff x = 0

N2 For all α ∈ R and all x ∈ X, ‖αx‖ = |α|‖x‖

N3 (Triangle inequality) For all x, y ∈ X, ‖x + y‖ ≤ ‖x‖ + ‖y‖.

A vector space equipped with a norm is called a normed space. The concept of a normed space is an important one, so let us explain it in a more intuitive way. Consider two elements x and y in some vector space X. The norm ‖x − y‖ is a measure of the distance between these two elements, where the elements can be for example vectors or functions. Obviously ‖x‖ = ‖x − 0‖ is a measure of the distance between x and the 0-element. What we mean by the distance between two functions is not very straightforward: you can measure the distance between two functions in many different ways (use different kinds of norms), depending on what you are interested in. But you must then keep in mind that the normed space consisting of a vector space V with a norm n1 is different from the normed space consisting of the same vector space V with another norm n2. We look at some examples to clear things up even more.

In R, which is a one-dimensional space, the standard norm is the absolute value |x2 − x1| = √((x2 − x1)²), where x1, x2 ∈ R. This is the distance between the two numbers x2 and x1. In R², where we have a two-dimensional vector space with real-valued vectors, the standard norm for any two vectors (x1, y1), (x2, y2) ∈ R² is defined as ‖(x1, y1) − (x2, y2)‖ = √((x1 − x2)² + (y1 − y2)²), which is a measure of the distance between two vectors in a two-dimensional vector space. The standard norm on higher-dimensional vector spaces over R is defined in the same way.

But it gets more complicated when we want to measure the distance between functions. For example, when we deal with a function x(t) ∈ C[0, 1], where C[0, 1] is the set of all continuous functions on the interval [0, 1], you can define the norm ‖x‖ as max |x(t)| for t ∈ [0, 1] if you are interested in how far, at most, the function strays from the t-axis on the interval [0, 1]. So if your function is x(t) = e^t, the norm ‖x‖ will be e, since e^t assumes its greatest value at t = 1.
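The sup-norm example above is easy to check numerically; the grid below is our own illustrative choice:

```python
import numpy as np

# Approximate ||x|| = max_{t in [0,1]} |x(t)| for x(t) = e^t by sampling
# on a fine grid; e^t is increasing, so the maximum sits at t = 1.
t = np.linspace(0.0, 1.0, 100001)
sup_norm = np.max(np.abs(np.exp(t)))
print(sup_norm)  # e ≈ 2.71828...
```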

Let us now give the definition of what is meant by a continuous function between normed spaces as well as a continuous linear transformation.

Definition 3. Let X and Y be normed spaces over R and x0 ∈ X. A map f : X −→ Y is said to be continuous at x0 if

∀ε > 0, ∃δ > 0 such that for all x ∈ X satisfying ‖x − x0‖ < δ, one has ‖f(x) − f(x0)‖ < ε.

The map f : X −→ Y is called continuous if for all x0 ∈ X, f is continuous at x0.

Definition 4. Let X, Y be vector spaces over R. A map T : X −→ Y is called a linear transformation if it satisfies the following conditions:

L1. For all x1, x2 ∈ X, T(x1 + x2) = T(x1) + T(x2).

L2. For all x ∈ X and all α ∈ R, T(α · x) = α · T(x).

A map that is both continuous and linear is called a continuous linear transformation.

The definition of a continuous function is rather abstract and not very efficient to use when investigating whether a function is continuous or not. The following theorem will however give us an easy method to use when checking for continuity.

Theorem 2. Let X and Y be normed spaces over R. Let T : X −→ Y be a linear transformation. Then the following properties of T are equivalent:

1. T is continuous.

2. T is continuous at 0.

3. There exists a number M such that for all x ∈ X, ‖Tx‖ ≤ M‖x‖.

Now we have everything needed to be able to define what is meant by the derivative of a function between two normed spaces.

Definition 5. Let X and Y be normed spaces. Let F : X −→ Y be a map and x0 ∈ X. Then F is said to be differentiable at x0 if there exists a continuous linear transformation L such that:

∀ε > 0, ∃δ > 0 such that ∀x ∈ X \ {x0} satisfying ‖x − x0‖ < δ,

  ‖F(x) − F(x0) − L(x − x0)‖ / ‖x − x0‖ < ε.

The operator L is called the derivative of F at x0. If F is differentiable at every point x ∈ X, then it is simply said to be differentiable.


Now all that is left is to define what is meant by a convex function, then we have every- thing needed to be able to use Theorem 1.

Definition 6. Let X be a normed space. A function I : X −→ R is said to be convex if for all x1, x2 ∈ X and α ∈ [0, 1],

I(αx1 + (1 − α)x2) ≤ αI(x1) + (1 − α)I(x2).

Moreover, a function is said to be strictly convex if for all x1, x2 ∈ X such that x1 ≠ x2 and α ∈ (0, 1),

I(αx1 + (1 − α)x2) < αI(x1) + (1 − α)I(x2).
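For the simple function I(y) = y² on R (used again in section 3.3), the strict-convexity inequality can be verified directly: the gap between the two sides is α(1 − α)(y1 − y2)², which is positive whenever y1 ≠ y2 and α ∈ (0, 1). A quick numeric spot check, with arbitrarily chosen values:

```python
# Strict convexity of I(y) = y^2: for y1 != y2 and alpha in (0, 1),
# (alpha*y1 + (1-alpha)*y2)^2 < alpha*y1^2 + (1-alpha)*y2^2, and the gap
# equals alpha*(1-alpha)*(y1 - y2)^2.
y1, y2 = 1.0, 4.0
for alpha in (0.1, 0.5, 0.9):
    lhs = (alpha * y1 + (1 - alpha) * y2) ** 2
    rhs = alpha * y1 ** 2 + (1 - alpha) * y2 ** 2
    assert lhs < rhs  # the strict inequality of Definition 6
    assert abs(rhs - lhs - alpha * (1 - alpha) * (y1 - y2) ** 2) < 1e-12
```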

Now we are ready to apply Theorem 1 to E(f) and start solving the problem of finding a minimizer.

3 Calculating the derivative and checking for convexity

3.1 Choosing a proper norm for f

Before we start the calculation of our derivative we need to decide what normed space to use. We know that E is a functional defined from C²[a, b] −→ R. For x ∈ C²[a, b] we define the norm of x as the following one:

‖x‖ := max_{t∈[a,b]} |x(t)| + max_{t∈[a,b]} |x'(t)| + max_{t∈[a,b]} |x''(t)| = ‖x‖_∞ + ‖x'‖_∞ + ‖x''‖_∞

Let us quickly check that this choice of norm is valid. Recall from the definition of the norm the following three conditions:

N1. This condition holds since each of the three terms in

‖x‖ = max_{t∈[a,b]} |x(t)| + max_{t∈[a,b]} |x'(t)| + max_{t∈[a,b]} |x''(t)|

is ≥ 0, so ‖x‖ ≥ 0. Moreover,

‖x‖ = 0 ⇐⇒ max_{t∈[a,b]} |x(t)| + max_{t∈[a,b]} |x'(t)| + max_{t∈[a,b]} |x''(t)| = 0 ⇐⇒ x(t) = 0,

since |x(t)|, |x'(t)| and |x''(t)| are ≥ 0 on the interval [a, b].

N2.

‖αx‖ = ‖αx‖_∞ + ‖αx'‖_∞ + ‖αx''‖_∞
= |α|‖x‖_∞ + |α|‖x'‖_∞ + |α|‖x''‖_∞
= |α|(‖x‖_∞ + ‖x'‖_∞ + ‖x''‖_∞)
= |α|‖x‖

N3.

‖x + y‖ = ‖x + y‖_∞ + ‖x' + y'‖_∞ + ‖x'' + y''‖_∞
≤ ‖x‖_∞ + ‖y‖_∞ + ‖x'‖_∞ + ‖y'‖_∞ + ‖x''‖_∞ + ‖y''‖_∞
= ‖x‖ + ‖y‖

Thus our choice of norm is valid. Now we may go on and calculate the derivative of E(f).
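The axioms N2 and N3 for this C² norm can also be checked numerically on sample functions. The grid and test functions below are our own choices, and the derivatives are supplied analytically to avoid numerical differentiation:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 10001)

def c2_norm(x, dx, d2x):
    """||x|| = max|x| + max|x'| + max|x''| on [0, 1], approximated on a grid."""
    return (np.max(np.abs(x(t))) + np.max(np.abs(dx(t)))
            + np.max(np.abs(d2x(t))))

# Sample functions with their first and second derivatives.
x, dx, d2x = np.sin, np.cos, lambda s: -np.sin(s)
y, dy, d2y = (lambda s: s ** 2), (lambda s: 2.0 * s), (lambda s: 2.0 + 0.0 * s)

alpha = -3.0
# N2: ||alpha * x|| = |alpha| * ||x||
n_ax = c2_norm(lambda s: alpha * x(s), lambda s: alpha * dx(s),
               lambda s: alpha * d2x(s))
assert abs(n_ax - abs(alpha) * c2_norm(x, dx, d2x)) < 1e-10
# N3: ||x + y|| <= ||x|| + ||y||
n_sum = c2_norm(lambda s: x(s) + y(s), lambda s: dx(s) + dy(s),
                lambda s: d2x(s) + d2y(s))
assert n_sum <= c2_norm(x, dx, d2x) + c2_norm(y, dy, d2y) + 1e-10
```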

3.2 Calculating the derivative of E(f )

We want to differentiate the functional

E(f) = ∫_a^b (f''(x))² dx + λ Σ_{i=1}^n (f(xi) − αi)²

with respect to f, where f is a function of a variable x. First of all, let us split this expression into two parts which will be treated separately; this will give us a better overview of the calculations. Let

E1(f) = ∫_a^b (f''(x))² dx,

E2(f) = λ Σ_{i=1}^n (f(xi) − αi)².

After we have differentiated the functionals E1(f) and E2(f) respectively, we simply use the summation rule for derivatives to obtain the derivative of E(f).

3.2.1 Calculating the derivative of E1

Calculating the derivative of a function of the type we are considering is far more complicated than for an ordinary function from R −→ R. It is not very clear from the definition of the derivative how one proceeds in order to calculate it. We will give a brief explanation of every step in the calculations to make things clearer.

First, recall the definition of the derivative. It basically means that for each ε > 0 we should be able to produce a δ such that if f satisfies ‖f − f0‖ < δ, then

‖E1(f) − E1(f0) − L1(f − f0)‖ / ‖f − f0‖ < ε.

Thus our L1 must be such that the numerator of the above expression is small in some sense compared to the denominator. Another way of putting it is to let

E1(f) − E1(f0) − L1(f − f0) = error ⇐⇒ E1(f) − E1(f0) = L1(f − f0) + error   (1)

where the error is some small bounded term that will be small enough for the derivative expression to hold. Our goal is to express L1 as a function of f − f0 by studying the left hand side of (1), and then simply insert this L1 into the derivative expression and see if it holds up.

Let us now have a closer look at the left hand side of (1):

E1(f) − E1(f0) = ∫_a^b (f''(x))² dx − ∫_a^b (f0''(x))² dx
= ∫_a^b ((f''(x))² − (f0''(x))²) dx
= ∫_a^b (f''(x) − f0''(x))(f''(x) + f0''(x)) dx
= ∫_a^b (f − f0)''(x)(f + f0)''(x) dx.

We now have one factor that involves f − f0, which is what we wanted. The other factor however involves f + f0, which is not what we wanted. Thus by letting f −→ f0 we get:

E1(f) − E1(f0) ≈ ∫_a^b (f − f0)''(x) · 2f0''(x) dx = L1(f − f0)

We now have a transformation L1 which is our candidate for the derivative. Notice that we have not yet in any way shown that this actually is the derivative, so let us do that now. The first step is to check whether L1 is linear or not. It will be easier to follow these calculations if we express L1 as a function of h, rather than f − f0:

L1(h) = ∫_a^b h''(x) · 2f0''(x) dx.
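Since E1 is quadratic in f, the remainder E1(f0 + εh) − E1(f0) − L1(εh) equals ε² ∫(h'')² dx exactly, so it shrinks like ε². This can be checked numerically; f0(x) = x³ and h(x) = sin(x) below are arbitrary sample functions of our own choosing, and the quadrature is a simple trapezoidal rule:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 20001)

def trap(y):
    """Trapezoidal rule on the uniform grid x."""
    return float(np.sum((y[1:] + y[:-1]) * 0.5 * (x[1] - x[0])))

f0pp = 6.0 * x        # f0(x) = x^3   =>  f0''(x) = 6x
hpp = -np.sin(x)      # h(x) = sin(x) =>  h''(x) = -sin(x)

E1 = lambda fpp: trap(fpp ** 2)          # E1 in terms of the second derivative
L1 = lambda g: trap(g * 2.0 * f0pp)      # L1(h) = int h'' * 2 f0'' dx

for eps in (1e-1, 1e-2, 1e-3):
    remainder = E1(f0pp + eps * hpp) - E1(f0pp) - L1(eps * hpp)
    # For this quadratic functional the remainder is exactly eps^2 * int (h'')^2.
    assert abs(remainder - eps ** 2 * trap(hpp ** 2)) < 1e-9
```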

Recall from the previous chapter that the following conditions must be satisfied in order for L1 to be linear:

L1.

L1(h1 + h2) = ∫_a^b (h1 + h2)''(x) · 2f0''(x) dx
= ∫_a^b (h1''(x) · 2f0''(x) + h2''(x) · 2f0''(x)) dx
= ∫_a^b h1''(x) · 2f0''(x) dx + ∫_a^b h2''(x) · 2f0''(x) dx
= L1(h1) + L1(h2)

L2.

L1(αh) = ∫_a^b (αh)''(x) · 2f0''(x) dx
= ∫_a^b αh''(x) · 2f0''(x) dx
= α ∫_a^b h''(x) · 2f0''(x) dx
= αL1(h)


We see that both L1 and L2 are satisfied, thus L1 is a linear transformation.

Next we check whether L1 is a continuous transformation. Since L1 is a linear transformation we may use Theorem 2 to show that it is continuous:

|L1h| = |∫_a^b h''(x) · 2f0''(x) dx|
≤ ∫_a^b |h''(x)| · |2f0''(x)| dx
≤ ∫_a^b ‖h‖ · |2f0''(x)| dx   (since |h''(x)| ≤ ‖h‖ by our choice of norm)
= ‖h‖ ∫_a^b |2f0''(x)| dx = M‖h‖

Notice that f0 is some fixed function, thus ∫_a^b |2f0''(x)| dx is some fixed number. By letting M be equal to this number, all conditions for Theorem 2 are satisfied, thus L1 is a continuous transformation.

We have shown that L1 is indeed a continuous linear transformation. Now we need to insert this L1 into the definition of the derivative and see if it satisfies all conditions required. First, let us estimate the numerator in the definition of the derivative:

|E1(f) − E1(f0) − L1(f − f0)|
= |∫_a^b (f''(x))² dx − ∫_a^b (f0''(x))² dx − ∫_a^b (f − f0)''(x) · 2f0''(x) dx|
= |∫_a^b ((f''(x))² − (f0''(x))² − (f − f0)''(x) · 2f0''(x)) dx|
= |∫_a^b ((f''(x) − f0''(x))(f''(x) + f0''(x)) − (f − f0)''(x) · 2f0''(x)) dx|
= |∫_a^b (f − f0)''(x)(f''(x) + f0''(x) − 2f0''(x)) dx|
= |∫_a^b ((f − f0)''(x))² dx|
= ∫_a^b |(f − f0)''(x)|² dx
≤ ∫_a^b ‖f − f0‖² dx
= ‖f − f0‖²(b − a)

By inserting the calculations from above into the definition of the derivative we get the following expression:

|E1(f) − E1(f0) − L1(f − f0)| / ‖f − f0‖ ≤ ‖f − f0‖²(b − a) / ‖f − f0‖ = ‖f − f0‖(b − a) < δ(b − a)

The last inequality follows from the fact that ‖f − f0‖ < δ. By choosing δ = ε/(b − a) we see that our L1 is indeed the derivative of E1 at f0.

3.2.2 Calculating the derivative of E2

Let us now move on with E2 and proceed in the same way as we did with E1. We start by looking at E2(f) − E2(f0) to see if we can conclude what the derivative of E2 ought to be.

E2(f) − E2(f0) = λ Σ_{i=1}^n (f(xi) − αi)² − λ Σ_{i=1}^n (f0(xi) − αi)²
= λ Σ_{i=1}^n ((f(xi))² − 2f(xi)αi + (αi)²) − λ Σ_{i=1}^n ((f0(xi))² − 2f0(xi)αi + (αi)²)
= λ Σ_{i=1}^n ((f(xi))² − 2f(xi)αi + (αi)² − (f0(xi))² + 2f0(xi)αi − (αi)²)
= λ Σ_{i=1}^n ((f(xi))² − (f0(xi))² + 2f0(xi)αi − 2f(xi)αi)
= λ Σ_{i=1}^n ((f(xi) − f0(xi))(f(xi) + f0(xi)) + 2f0(xi)αi − 2f(xi)αi)
= λ Σ_{i=1}^n ((f − f0)(xi)(f + f0)(xi) − 2(f − f0)(xi) · αi)

We now see that f − f0 occurs twice in E2(f) − E2(f0), which is the variable we are looking for. We also see that f + f0 occurs, which is not what we wanted. Thus by letting f −→ f0 we end up with

E2(f) − E2(f0) ≈ λ Σ_{i=1}^n ((f − f0)(xi) · 2f0(xi) − 2(f − f0)(xi) · αi)
= 2λ Σ_{i=1}^n (f − f0)(xi)(f0(xi) − αi)
= L2(f − f0)

Let us first rewrite L2 as a transformation of h rather than f − f0; this will make the later calculations easier to follow. Thus we get

L2(h) = 2λ Σ_{i=1}^n h(xi)(f0(xi) − αi)
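Because E2 involves only finitely many point evaluations, the corresponding check is exact: E2(f0 + h) − E2(f0) − L2(h) = λ Σ h(xi)². A small numeric verification, with arbitrary sample values of our own choosing:

```python
import numpy as np

lam = 2.0
alpha = np.array([1.0, -1.0, 0.5])     # data values alpha_i
f0_vals = np.array([0.2, -0.4, 0.1])   # values f0(x_i) of a candidate f0
h_vals = np.array([1.0, 2.0, -1.0])    # values h(x_i) of an increment h

E2 = lambda v: lam * np.sum((v - alpha) ** 2)
L2 = lambda h: 2.0 * lam * np.sum(h * (f0_vals - alpha))

eps = 1e-3
remainder = E2(f0_vals + eps * h_vals) - E2(f0_vals) - L2(eps * h_vals)
# For this quadratic functional the remainder is exactly lam * sum (eps*h_i)^2,
# i.e. of order eps^2, as the derivative definition requires.
assert abs(remainder - lam * np.sum((eps * h_vals) ** 2)) < 1e-12
```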

The next step is to check whether L2 is linear. Recall from the definition of linearity that the following conditions must be satisfied:

L1.

L2(h1 + h2) = 2λ Σ_{i=1}^n ((h1 + h2)(xi)f0(xi) − (h1 + h2)(xi)αi)
= 2λ Σ_{i=1}^n (h1(xi)f0(xi) + h2(xi)f0(xi) − h1(xi)αi − h2(xi)αi)
= 2λ Σ_{i=1}^n (h1(xi)f0(xi) − h1(xi)αi + h2(xi)f0(xi) − h2(xi)αi)
= 2λ (Σ_{i=1}^n (h1(xi)f0(xi) − h1(xi)αi) + Σ_{i=1}^n (h2(xi)f0(xi) − h2(xi)αi))
= 2λ Σ_{i=1}^n (h1(xi)f0(xi) − h1(xi)αi) + 2λ Σ_{i=1}^n (h2(xi)f0(xi) − h2(xi)αi)
= L2(h1) + L2(h2)

L2.

L2(β · h) = 2λ Σ_{i=1}^n ((β · h)(xi)f0(xi) − (β · h)(xi)αi)
= 2λ Σ_{i=1}^n (β · h(xi)f0(xi) − β · h(xi)αi)
= 2λ Σ_{i=1}^n β · (h(xi)f0(xi) − h(xi)αi)
= β · 2λ Σ_{i=1}^n (h(xi)f0(xi) − h(xi)αi)
= βL2(h)

Both conditions are satisfied, thus L2 is a linear transformation.

We now investigate the continuity of the transformation. Because of the linearity of L2 we are once again able to use Theorem 2 for this matter.

|L2h| = |2λ Σ_{i=1}^n (h(xi)f0(xi) − h(xi)αi)|
= |2λ| · |Σ_{i=1}^n h(xi) · (f0(xi) − αi)|
≤ |2λ| Σ_{i=1}^n |h(xi) · (f0(xi) − αi)|
= |2λ| Σ_{i=1}^n |h(xi)| · |f0(xi) − αi|
≤ |2λ| Σ_{i=1}^n ‖h‖ · |f0(xi) − αi|
= ‖h‖ · |2λ| Σ_{i=1}^n |f0(xi) − αi| = M‖h‖

Notice that f0 is some fixed function, thus |2λ| · Σ_{i=1}^n |f0(xi) − αi| is some fixed number. By letting this number be equal to M, all conditions for Theorem 2 are satisfied, meaning that L2 is continuous. We have shown that L2 is a continuous linear transformation. Now it only remains to see if L2 really is the derivative of E2(f). We start off by estimating the numerator of the derivative expression:

|E2(f) − E2(f0) − L2(f − f0)|
= |λ Σ_{i=1}^n (f(xi) − αi)² − λ Σ_{i=1}^n (f0(xi) − αi)² − 2λ Σ_{i=1}^n (f − f0)(xi)(f0(xi) − αi)|
= |λ| · |Σ_{i=1}^n (f(xi) − αi)² − Σ_{i=1}^n (f0(xi) − αi)² − 2 Σ_{i=1}^n (f − f0)(xi)(f0(xi) − αi)|

After expanding the quadratics and canceling the (αi)² terms we get

= |λ| · |Σ_{i=1}^n ((f(xi))² − (f0(xi))² + 2f0(xi)αi − 2f(xi)αi) − 2 Σ_{i=1}^n (f(xi)f0(xi) − (f0(xi))² − f(xi)αi + f0(xi)αi)|
= |λ| · |Σ_{i=1}^n ((f(xi))² − (f0(xi))² + 2f0(xi)αi − 2f(xi)αi − 2f(xi)f0(xi) + 2(f0(xi))² + 2f(xi)αi − 2f0(xi)αi)|
= |λ| · |Σ_{i=1}^n ((f(xi))² + (f0(xi))² − 2f(xi)f0(xi))|
= |λ| · |Σ_{i=1}^n (f(xi) − f0(xi))²|
= |λ| · Σ_{i=1}^n |((f − f0)(xi))²|
≤ |λ| · Σ_{i=1}^n ‖f − f0‖²
= |λ|n · ‖f − f0‖²

When we insert this into the definition of the derivative we get:

|E2(f) − E2(f0) − L2(f − f0)| / ‖f − f0‖ ≤ nλ · ‖f − f0‖² / ‖f − f0‖ = nλ · ‖f − f0‖ < nλδ

The last inequality follows from the fact that 0 < ‖f − f0‖ < δ. If we now choose δ = ε/(nλ), we have shown that the derivative of E2 at a point f0 is given by the continuous linear transformation:

L2(h) = 2λ Σ_{i=1}^n h(xi)(f0(xi) − αi)

3.3 Check for convexity of E(f )

Let us now make sure that E(f) is convex. Notice first that if two functions f(x) and g(x) are convex then so is their sum f(x) + g(x); furthermore, if one of the functions f(x) or g(x) is strictly convex, then their sum is strictly convex. This follows directly from the definition of strict convexity. In our case we are interested in showing that the functional E(f) is strictly convex, since then, if we can find a function f0 such that (DE)(f0) = 0, the minimizer f0 will be unique. Let E1 and E2 be the same functionals as in the previous subsection. We will now show that their sum is strictly convex.

First, let us look at the simple function g(y) = y², y ∈ R. Since g''(y) = 2 > 0 we know from ordinary calculus that this function is strictly convex. According to the definition of strict convexity stated in the previous section we also know that ∀α ∈ (0, 1) and y1 ≠ y2 we have g(αy1 + (1 − α)y2) < αg(y1) + (1 − α)g(y2). Let us use this knowledge when we look at E1. Let α ∈ (0, 1) and f1, f2 ∈ C²[a, b] be such that f1 ≠ f2.

E1(αf1 + (1 − α)f2) = ∫_a^b ((αf1(x) + (1 − α)f2(x))'')² dx
= ∫_a^b (αf1''(x) + (1 − α)f2''(x))² dx
≤ ∫_a^b (α(f1''(x))² + (1 − α)(f2''(x))²) dx
= α ∫_a^b (f1''(x))² dx + (1 − α) ∫_a^b (f2''(x))² dx
= αE1(f1) + (1 − α)E1(f2)   (2)

If f1''(x) = f2''(x) for all x ∈ [a, b] then the inequality in the calculations above is an equality. Otherwise it is a strict inequality.

Let us now have a look at E2. Consider the following simple function: h(y) = (y − b)². Obviously this is strictly convex since h''(y) = 2 > 0. Thus ∀β ∈ (0, 1) and y1 ≠ y2 we have h(βy1 + (1 − β)y2) < βh(y1) + (1 − β)h(y2). Let us apply this to E2. Let β ∈ (0, 1) and f1, f2 ∈ C²[a, b] be such that f1 ≠ f2.

E2(βf1 + (1 − β)f2) = λ Σ_{i=1}^n ((βf1 + (1 − β)f2)(xi) − αi)²
= λ Σ_{i=1}^n (βf1(xi) + (1 − β)f2(xi) − αi)²
≤ λ Σ_{i=1}^n (β(f1(xi) − αi)² + (1 − β)(f2(xi) − αi)²)
= λ Σ_{i=1}^n β(f1(xi) − αi)² + λ Σ_{i=1}^n (1 − β)(f2(xi) − αi)²
= βλ Σ_{i=1}^n (f1(xi) − αi)² + (1 − β)λ Σ_{i=1}^n (f2(xi) − αi)²
= βE2(f1) + (1 − β)E2(f2)   (3)

If f1(xi) = f2(xi) for i = 1, . . . , n then the inequality in the calculations above is an equality. Otherwise it is a strict inequality.

We will now show that the two equalities f1''(x) = f2''(x) for all x ∈ [a, b] and f1(xi) = f2(xi) for i = 1, . . . , n cannot both hold at the same time if f1 ≠ f2. This means that we will have strict inequality in at least one of the two equations (2) and (3), and thus the sum E1(f) + E2(f) = E(f) is a strictly convex functional.

Suppose f1''(x) = f2''(x) for all x ∈ [a, b] and f1(xi) = f2(xi) for i = 1, . . . , n both hold at the same time. Then

f1''(x) = f2''(x) ⇐⇒ f1'(x) = f2'(x) + c,

where c is some constant. We now look at two cases.

c = 0: Then f1(x) = f2(x) + d for some constant d. Since we have the condition f1(xi) = f2(xi), it follows that d = 0. This implies that f1(x) = f2(x), which contradicts our choice of f1 and f2. This means that c ≠ 0 must hold.

c ≠ 0: Suppose c > 0 (c < 0 is handled in a similar way). We have f1'(x) = f2'(x) + c, i.e. f1'(x) > f2'(x) ∀x ∈ [a, b]. Let us use this condition together with f1(xi) = f2(xi) for i = 1, . . . , n. Suppose f1(x1) = f2(x1). Since f1'(x) > f2'(x), for a point x2 > x1 the value f1(x2) cannot be equal to f2(x2), which contradicts the condition that f1(xi) = f2(xi) for i = 1, . . . , n. So the two equalities cannot both hold at the same time. Thus we have shown that at least one of the inequalities in (2) and (3) must be strict, and it follows that the sum of E1 and E2 is strictly convex.

However, you may have noticed that we assumed that we had at least two points that we were trying to approximate with a function. The argument above does not hold if there is just one point. Consider for example the following case involving only one point x1: suppose f1(x) = (x − x1) and f2(x) = (1 − d) · (x − x1). Then f1(x1) = f2(x1) and f1'(x) = f2'(x) + d ⇐⇒ f1''(x) = f2''(x), and thus the two equalities are satisfied at the same time, meaning that E is not strictly convex. One can also see this by noticing that any straight line passing through a single point is a minimizer of E (since E = 0), so there are infinitely many minimizers and E is not strictly convex. This is however a somewhat contrived case, since there would be little point in trying to fit a function to a single point.
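The one-point degeneracy is easy to illustrate numerically: every straight line through the single data point has zero bending energy and zero data error, so E has infinitely many minimizers. A small sketch, with the point, λ and the slopes chosen arbitrarily:

```python
x1, a1 = 0.5, 2.0   # the single data point (x1, alpha1)
lam = 10.0

def E_of_line(slope):
    """E(f) for the line f(x) = a1 + slope * (x - x1) through (x1, a1)."""
    bending = 0.0                      # f'' = 0, so the integral term vanishes
    f_at_x1 = a1 + slope * (x1 - x1)   # the line passes through the data point
    data = lam * (f_at_x1 - a1) ** 2   # ... so the data term vanishes too
    return bending + data

# Two distinct lines, both with E(f) = 0: the minimizer is not unique,
# hence E is not strictly convex when there is only one data point.
assert E_of_line(1.0) == 0.0 and E_of_line(-3.0) == 0.0
```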

3.4 Summary

In the previous two subsections we calculated the derivatives of E1(f) and E2(f). Adding these together we find the derivative of E(f) at f0 to be

2 ∫_a^b f''(x) · f0''(x) dx + 2λ Σ_{i=1}^n f(xi)(f0(xi) − αi).

We have also shown that E(f ) is strictly convex. We now have everything needed to be able to use Theorem 1 and solve the problem of finding a minimizer f0 of E(f ). This will be done in the next section.

4 Finding the minimizer f0 of E(f)

By inserting the derivative of E into Theorem 1 we get the following expression:

∫_a^b h''(x) · f0''(x) dx + λ Σ_{i=1}^n h(xi)(f0(xi) − αi) = 0   ∀h ∈ C²[a, b]

where f0 is the minimizer of E(f). Notice that we have changed the name of the general function from f to h in order to avoid confusion later on. We have also made the quite obvious assumption that f0 must be at least a C¹ function; one can see that by looking at E(f), which contains the second derivative of f, and this derivative would not exist if f0' were not continuous.

The above expression must hold for every choice of h ∈ C²[a, b]. By choosing a different function h depending on the situation, we will be able to discover several different properties of our minimizer f0, which will eventually lead us to an exact function that must be the minimizer. We start by looking at an arbitrary interval between some xk and xk+1, 1 ≤ k ≤ n − 1, and try to derive some properties of f0 on this particular interval. This is a natural way of studying the function f0, since it may consist of different adjoining functions on the interval [a, b].

4.1 Behaviour of f0 on the interval between xk and xk+1

Since the equation should hold for any choice of h ∈ C²[a, b], we can in particular choose an h(x) with the following properties: h(xi) = 0 and h'(xi) = 0 for i = 1, . . . , n, and h(x) = 0 outside the interval [xk, xk+1].

The sum λ Σ_{i=1}^n h(xi)(f0(xi) − αi) is now 0 with this choice of h(x). If we use integration by parts on the remaining part of the derivative expression we get:

∫_{xk}^{xk+1} h''(x)f0''(x) dx = [h'(x)f0''(x)]_{xk}^{xk+1} − ∫_{xk}^{xk+1} h'(x)f0^(3)(x) dx
= [h'(x)f0''(x)]_{xk}^{xk+1} − [h(x)f0^(3)(x)]_{xk}^{xk+1} + ∫_{xk}^{xk+1} h(x)f0^(4)(x) dx
= ∫_{xk}^{xk+1} h(x)f0^(4)(x) dx

where the two bracketed terms both vanish.

The first two terms in the second-to-last step are equal to zero since we have chosen a function h(x) that is zero at each xi, and whose derivative h'(x) is also zero at each xi. Also notice that we have assumed that f0 is four times differentiable on the interval (xk, xk+1). Later on, in section 4.4, we will show that this assumption is valid, but for now we just assume that it is in fact true.

Hence

∫_{xk}^{xk+1} h(x)f0^(4)(x) dx = 0.

We now have three options:

1) h(x) = 0 ∀x ∈ [xk, xk+1]

2) h(x)f0^(4)(x) is a function that has the same amount of area above and below the x-axis, so that its integral over [xk, xk+1] is 0

3) f0^(4)(x) = 0 ∀x ∈ [xk, xk+1]

We can conclude that option 3) is the correct one, since ∫_{xk}^{xk+1} h(x)f0^(4)(x) dx = 0 must hold for all functions h subject only to the restrictions that h(xi) = 0, h'(xi) = 0 for i = 1, . . . , n and h(x) = 0 outside the interval [xk, xk+1] (as we assumed). In particular, option 1) is incorrect since we are allowed to choose h in any way we want in (xk, xk+1), and especially such that it does not meet the requirement that h(x) = 0 ∀x ∈ [xk, xk+1]. For option 2), consider for example h(x) = f0^(4)(x) on the interval (xk, xk+1) (along with the previous assumptions on h(x)); the integrand (f0^(4)(x))² then has no area below the x-axis, which means that the integral can vanish only when f0^(4)(x) = 0. So if f0^(4)(x) ≠ 0, option 2) does not hold.

References
