
The Singular Value Decomposition Theorem

Saga Samuelsson

Bachelor Thesis, 15 hp
Bachelor in Mathematics, 180 hp


Abstract

This essay will present a self-contained exposition of the singular value decomposition theorem for linear transformations. An immediate consequence is the singular value decomposition for complex matrices.

Sammanfattning

This essay will present a self-contained exposition of the singular value decomposition theorem for linear transformations. An immediate consequence is the singular value decomposition for complex matrices.


Contents

1. Introduction
2. Preliminaries
2.1. Vector Spaces
2.2. Linear Independence and Basis Vectors
2.3. Inner Product Spaces
2.4. Linear Transformations, Operators, and Functionals
2.5. Range and Kernel
2.6. Eigenvalues, Eigenvectors, and Eigendecomposition
3. Dual Spaces and Adjoint Transformations
3.1. Linear Functionals and Dual Spaces
3.2. Adjoint Transformations
4. Transformations between Dual Spaces and Properties of Operations
4.1. Transpose Transformations
4.2. Properties of Operators
5. Singular Value Decomposition
6. An Application to Image Compression
7. Acknowledgements
8. References


1. Introduction

With the ever-growing usage of online services, from online grocery shopping to social media, information about the user increases accordingly. This vast amount of data is analysed by the social media platforms and other interested parties. To analyse data of such great quantities, often represented by matrices, singular value decomposition is commonly used. In the paper [8], published in 2013, the authors analyse how people “like” posts on the social media platform Facebook, and from this they were able to determine, with great accuracy, religious views, sexual orientation, political views, and other private information. More recently, a company hired by the Republican Party in the United States of America used user information to sway how people voted through targeted advertisement. The methods used for analysing the data were reportedly similar to singular value decomposition, [5].

Singular value decomposition can be used not only for analysing data but also, for example, to compress images, [7]. A practical example of the latter will be provided at the end of this essay. The aim of this essay is to prove the singular value decomposition theorem.

The discovery of singular value decomposition is attributed to two 19th-century mathematicians who independently discovered this decomposition. Beltrami published his paper in 1873, [1]. In his derivation Beltrami prematurely used properties of singular values that he had yet to show. Jordan, publishing only a year later, provided a complete proof, [6]. Both Jordan's and Beltrami's motivation for developing this decomposition was to facilitate work with bilinear forms. The same motivation drove Sylvester to develop the theory further in 1889, [12]. He provided an iterative algorithm for reducing a quadratic form to a diagonal form, and reflected that a corresponding iteration may be used to diagonalize a bilinear form, [11].

Schmidt developed an infinite-dimensional singular value decomposition. He also studied, in his paper [10], how to approximate a matrix by one of some specified rank.

However, unlike the previous developers, Schmidt came at the problem from an integral equation point of view rather than from linear algebra. Weyl approached it like Schmidt, and developed the theory regarding singular values of a perturbed matrix in [13].

Like the founders of singular value decomposition, this essay approaches the subject from linear algebra. The aim of the essay is to prove the following theorem.

Theorem 5.2. Let (V, ⟨ , ⟩V) and (W, ⟨ , ⟩W) be finite dimensional inner product spaces and T : V → W a linear transformation. Then there exist orthonormal bases BV = {v1, v2, . . . , vn} and BW = {w1, w2, . . . , wm}, and unique positive scalars s1 ≥ . . . ≥ sr, for r = rank(T), such that T(vj) = sjwj if j ≤ r and T(vj) = 0W if j > r.

It states that there exist two orthonormal bases for the finite dimensional vector spaces, such that the linear transformation can be described in terms of these basis vectors and some unique singular values.
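As a rough numerical illustration of the theorem (a sketch in Python with numpy, not part of the essay's formal development), take V = R3 and W = R2 with their standard inner products and let T be represented by a matrix A. The routine numpy.linalg.svd then returns exactly such data: the columns of V below play the role of BV, the columns of U the role of BW, and the entries of s are the singular values s1 ≥ s2 ≥ . . . ≥ sr.

import numpy as np

# T : R^3 -> R^2 is represented by a 2 x 3 matrix with respect to the standard bases.
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 3.0, 0.0]])

U, s, Vh = np.linalg.svd(A)        # A = U @ diag(s) @ Vh, singular values in s
V = Vh.conj().T                    # columns v_1, ..., v_n form the orthonormal basis B_V
r = np.linalg.matrix_rank(A)       # here r = 2

for j in range(A.shape[1]):
    image = A @ V[:, j]            # T(v_j)
    if j < r:
        assert np.allclose(image, s[j] * U[:, j])    # T(v_j) = s_j w_j for j <= r
    else:
        assert np.allclose(image, 0.0)               # T(v_j) = 0_W for j > r

print("singular values:", s)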


The overview of this essay is as follows. Some general and basic linear algebra is reviewed in Section 2. In Section 3 we introduce some new concepts such as dual spaces and adjoint transformations. These concepts come back in Section 4 when we discuss transformations between dual spaces. Finally, in Section 5, we prove the theorem above, and talk briefly about an application of singular value decomposition of matrices in Section 6.

This essay follows [2]. Suggested further reading is [4] and [3], the former more theoretically oriented and the latter focusing on numerical applications.


2. Preliminaries

Herein some basic concepts from linear algebra shall be reviewed, though the reader is assumed to already be familiar with linear algebra. Vector spaces and linear independence are fundamental concepts of linear algebra, and thus they shall be reviewed first. Thereafter we shall review inner product spaces, since they are of great use in the main theorem of this essay. Lastly, a review of eigenvalues and eigenvectors shall conclude the preliminaries.

2.1. Vector Spaces. We begin with some basic definitions.

Definition 2.1. A field F is a set that contains at least two distinct elements, e.g. 0 and 1, along with two binary operations: addition α : F × F → F, and multiplication β : F × F → F. Addition α(a, b) shall be denoted as a + b and be referred to as the sum of a and b. Multiplication β(a, b) shall be denoted as a · b and shall be referred to as the product of a and b. The two operations must satisfy the following axioms, for a, b, c ∈ F:

(i) a + b = b + a;

(ii) a + (b + c) = (a + b) + c;

(iii) there exists an element 0 such that a + 0 = a;

(iv) for every element a, there is an element b such that a + b = 0, sometimes denoted as b = −a;

(v) a · b = b · a;

(vi) (a · b) · c = a · (b · c);

(vii) for every a ∈ F, a · 1 = a;

(viii) for a ≠ 0, there exists an element c such that a · c = 1;

(ix) a · (b + c) = a · b + a · c.

For our purpose, we shall mostly consider either R or C as our field. Some theorems in this essay, though, hold for general fields; in these cases we shall denote the field by F to emphasize the arbitrariness of the field.

Definition 2.2. Let F be any field. Let V be a non-empty set equipped with two binary operations, α : V × V → V called addition and β : F × V → V called scalar multiplication. Addition α(u, v) shall be denoted by u + v. Furthermore, the element u + v shall be referred to as the sum of u and v. Multiplication β(c, u) shall be denoted by cu, and is referred to as the scalar multiplication of u by c. For u, v, w ∈ V and a, b ∈ F, the set V is said to be a vector space over the field F if the following axioms are satisfied:

(i) if u, v ∈ V , then u + v ∈ V ;

(ii) u + v = v + u;

(iii) u + (v + w) = (u + v) + w;


(iv) there exists an element 0 ∈ V such that u + 0 = u;

(v) for every element u there exists an element, denoted −u, such that u+(−u) = 0;

(vi) if a ∈ F, then au ∈ V ;

(vii) a(u + v) = au + av;

(viii) (a + b)u = au + bu;

(ix) (ab)u = a(bu);

(x) 1u = u.

We shall also make use of subspaces.

Definition 2.3. Let V be a vector space over a fieldF, and let W ⊆ V be a subset.

If W is a vector space under the same addition and scalar multiplication as in V , then W is called a subspace of V .

Theorem 2.4. Let V be a vector space over a fieldF, and let W be a set of vectors from V . The set W is a subspace of V if, and only if, the following conditions are satisfied:

(i) if u, v ∈ W , then u + v ∈ W ;

(ii) if k ∈ F and u ∈ W , then ku ∈ W .

Proof. Assume that W is a subspace of V . Then all the vector space axioms are fulfilled for W , in particular closure under addition and scalar multiplication.

Conversely, assume that addition and scalar multiplication are closed. Then W satisfies at least axioms (i) and (vi) of Definition 2.2. Axioms (ii), (iii), (vii), (viii), (ix) and (x) are inherited from V . It remains to show that axioms (iv) and (v) are satisfied. The set W is assumed to be closed under scalar multiplication for any scalar k, so if we let w ∈ W , then kw ∈ W . In particular 0w = 0 ∈ W and (−1)w = −w ∈ W , and with that w + (−w) = 0. Thereupon W satisfies axioms (iv) and (v). 

Until now, we have properly defined vector spaces. Though, in order to obtain a proper intuition of what they actually are, or how to visualize them, we will study a few examples.

Example 2.5. For V = F the vector space axioms are just a subset of the field axioms, i.e. V = F is a vector space over the field F. 

A less trivial example follows below.

Example 2.6. Let V = Fn be the set of all n-tuples of numbers, and let us define the operations on V to be the usual addition and scalar multiplication. Let u, v ∈ V , and let k be some scalar in F. Then

u + v = (u1, u2, . . . , un) + (v1, v2, . . . , vn) = (u1 + v1, u2 + v2, . . . , un + vn),

ku = (ku1, ku2, . . . , kun).


The set V along with these operations is then a vector space over the field F. 

In the vector space R2, vectors are geometrically interpreted as points or arrows. Initially, it is often useful to think of a vector as an element of Fn. Observe, though, that a vector space may contain other types of objects, as the following example illustrates.

Example 2.7. Let Mm×n(F) denote the collection of all m × n matrices with entries in F. With the usual definitions of matrix addition and scalar multiplication, Mm×n(F) is a vector space. 

2.2. Linear Independence and Basis Vectors. We shall review the concepts of a basis of a vector space and its corresponding dimension.

Example 2.6 illustrated that Fn, with standard vector addition and scalar multiplication, is a vector space. In the following example, we let Fn = R2.

Example 2.8. Let R2 be our vector space with the normal definition of vector addition and multiplication by a scalar. Let u = (2, 4), v = (1, 2), and w = (1, 0).

Note how u can be written in terms of v, i.e., u = 2v. By way of contrast, u cannot be expressed in terms of w. 

As a result of the observation in Example 2.8, a formal definition of the concept of a vector being expressible in terms of other vectors is required, and thus follows the formal definition of linear dependence.

Definition 2.9. A finite set of vectors {v1, v2, . . . , vk} from some vector space V is said to be linearly dependent if there are scalars c1, c2, . . . , ck, not all zero, such that c1v1 + c2v2 + . . . + ckvk = 0. The set {v1, v2, . . . , vk} is said to be linearly independent if c1v1 + c2v2 + . . . + ckvk = 0 only has the trivial solution c1 = c2 = . . . = ck = 0.

Returning to Example 2.8, note that the reason u could be written in terms of v is that they are linearly dependent, while v and w are linearly independent.
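Linear dependence is also easy to test numerically: a finite set of vectors in Rn is linearly independent exactly when the matrix having them as columns has rank equal to the number of vectors. The following sketch (Python with numpy; the helper name linearly_independent is ours, not from the essay) checks the vectors from Example 2.8.

import numpy as np

def linearly_independent(*vectors):
    # The vectors are independent exactly when the matrix they form has full column rank.
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == len(vectors)

u = np.array([2.0, 4.0])
v = np.array([1.0, 2.0])
w = np.array([1.0, 0.0])

print(linearly_independent(u, v))   # False: u = 2v, so {u, v} is linearly dependent
print(linearly_independent(v, w))   # True:  {v, w} is linearly independent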

This essay also makes use of the closely related concept of span.

Definition 2.10. Let S = {v1, v2, . . . , vk} be a set of vectors in V . The set of all linear combinations of these vectors {v1, v2, . . . , vk} is called the span of the set S, and denoted as span(v1, v2, . . . , vk). If V = span(v1, v2, . . . , vk), then we say that {v1, v2, . . . , vk} spans V .

From Definitions 2.9 and 2.10 it follows that we can define a basis.

Definition 2.11. A basis, B, of a non-zero vector space V , is a subset of the given vector space, such that the vectors in the basis are linearly independent and span(B) = V .

The dimension of a vector space is defined as the minimum number of basis vectors required to span the whole vector space. We will denote the dimension of a vector space V by dim(V ).

In this essay finite dimensional vector spaces are of particular interest. However, unless explicitly stated, definitions and theorems still hold for a vector space of


infinite dimension. To illustrate further, an example of an infinite dimensional vector space is as follows.

Example 2.12. Let V consist of all elements of the form v = (v1, v2, . . . , vn, . . . ), for which vi ∈ R, i = 1, 2, . . . , n, . . . . Two vectors are said to be equal if their corresponding components are equal. Addition and scalar multiplication for vectors u, v ∈ V are as follows:

u + v = (u1, u2, . . . , un, . . . ) + (v1, v2, . . . , vn, . . . ) = (u1 + v1, u2 + v2, . . . , un + vn, . . . ),

ku = k(u1, u2, . . . , un, . . . ) = (ku1, ku2, . . . , kun, . . . ).

This vector space is often referred to as V = R∞. 

It is important for the main theorem to be able to construct a basis given a linearly independent set.

Theorem 2.13. Let S = {v1, v2, . . . , vk} be a linearly independent set of vectors in a vector space V . Then, the following statements hold:

i) if v ∉ span(S), then a linearly independent set is obtained by adjoining v to S, i.e., {v1, v2, . . . , vk, v} is a linearly independent set;

ii) any vector v ∈ span(S) can be expressed in a unique way as a linear combination of v1, v2, . . . , vk.

Proof. i) Assume the contrary, that is, that v1, v2, . . . , vk, v are linearly dependent.

Then there are scalars c1, c2, . . . , ck, c, not all zero, such that c1v1 + c2v2 + . . . + ckvk + cv = 0.

Suppose that c = 0; then some cj ≠ 0. By Definition 2.9 there is then a dependence between v1, v2, . . . , vk, which contradicts the first assumption. Thus, c ≠ 0. We have that cv may be expressed as

cv = (−c1)v1 + (−c2)v2 + . . . + (−ck)vk,

i.e.

v = (−c1/c)v1 + (−c2/c)v2 + . . . + (−ck/c)vk,

and therefore v ∈ span(S), which is also contrary to the assumption. Hence, {v1, v2, . . . , vk, v} is linearly independent.

ii) Suppose that v ∈ span(S) and that,

v = a1v1+ a2v2+ . . . + akvk = b1v1+ b2v2+ . . . + bkvk,

where ai, bi are constants. Subtracting the second expression from the first and then rearranging, the following expression is obtained

(a1− b1)v1+ (a2− b2)v2+ . . . + (ak− bk)vk = 0.

Since {v1, v2, . . . , vk} is linearly independent, we have that a1 − b1 = a2 − b2 = . . . = ak − bk = 0,

from which we can conclude that a1= b1, a2= b2, . . . , ak= bk. 


Corollary 2.14 shall be useful to us in our future endeavours, as it is integral in several of the proofs in this essay.

Corollary 2.14. Let V be a non-empty, finite dimensional vector space over a field F. If S = {v1, v2, . . . , vk}, for k ≤ n where n = dim(V ), is linearly independent, then S can be extended to a basis BV = {v1, v2, . . . , vk, . . . , vn}.

Proof. If S spans V then we are done; otherwise there is some v ∉ span(S). By Theorem 2.13 we can adjoin v to S, and the resulting set will still be linearly independent. If this set spans V we are done; otherwise repeat the argument above to add elements. Since dim(V ) is finite, this iteration will terminate, and it will result in a linearly independent set which spans V . 

Further development of the concept is at the core of Theorem 2.15.

Theorem 2.15. A set B = {v1, v2, . . . , vk} in a vector space V is a basis of V if, and only if, for each vector v ∈ V , there are unique scalars c1, c2, . . . , ck such that v = c1v1 + c2v2 + . . . + ckvk.

Proof. Suppose that B is a basis for V and v ∈ V ; we wish to show that there are unique scalars. Since span(B) = V , there are scalars c1, c2, . . . , ck such that v = c1v1 + c2v2 + . . . + ckvk. By Theorem 2.13 ii), on linear independence of vectors, these scalars are unique.

Assume then that for every v there are unique scalars c1, c2, . . . , ck such that c1v1+ c2v2+ . . . + ckvk = v. We wish to show that this implies that B is a basis.

Since every vector in V is assumed to be expressible as a linear combination of {v1, v2, . . . , vk}, it follows by definition that span(B) = V . Now it only remains to show that span(B\{vi}) ≠ V for any i = 1, 2, . . . , k. There are unique scalars c1, c2, . . . , ck such that

c1v1+ c2v2+ . . . + ckvk = v.

Take v = 0, then there should be unique scalars c1, c2, . . . , ck such that c1v1+ c2v2+ . . . + ckvk = 0.

However, 0 = 0v1+ 0v2+ . . . + 0vk, and by the assumption of uniqueness of the scalars, ci = 0 for all i = 1, 2, . . . , k. Thus, the trivial solution is the only solution and therefore B is linearly independent. Since B is linearly independent we may not take away any vector from B, for then B will no longer span V . It follows that B is

a basis of V . 

2.3. Inner Product Spaces. In this section different types of structures for vector spaces are reviewed.

Definition 2.16. Let V be a vector space over a field F ∈ {R, C}. An inner product on V is a function ⟨ , ⟩ : V × V → F which satisfies:

(i) ⟨u, u⟩ ≥ 0, with equality if, and only if, u = 0;

(ii) ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩;

(iii) ⟨γu, v⟩ = γ⟨u, v⟩;

(iv) ⟨u, v⟩ = \overline{⟨v, u⟩},

where the bar denotes complex conjugation.

By an inner product space, we mean the pair (V, ⟨ , ⟩) consisting of the real or complex vector space V and an inner product ⟨ , ⟩ on V . If ⟨u, v⟩ = 0, then we say that u and v are orthogonal, and the norm of a vector v ∈ V is ‖v‖ = √⟨v, v⟩. If the basis of an inner product space consists of orthogonal vectors, then it is called an orthogonal basis. Furthermore, if all basis vectors are of unit length, i.e., ‖v‖ = 1, then it is referred to as an orthonormal basis.

A classic example of an inner product is the dot product.

Example 2.17. Let R3 be our vector space with the inner product defined as ⟨u, v⟩ = u · v = (u1, u2, u3) · (v1, v2, v3) = u1v1 + u2v2 + u3v3, for u, v ∈ R3. Then R3, along with this definition of inner product, is an inner product space. 

The following example uses the same vector space as the previous example, but we will equip it with a different inner product.

Example 2.18. Let R3 be our vector space with the inner product defined as ⟨u, v⟩ = ⟨(u1, u2, u3), (v1, v2, v3)⟩ = 2u1v1 + u2v2 + πu3v3, for u, v ∈ R3. Then R3, along with this inner product, is an inner product space. 

In the two examples above we started with the same vector space, but equipped it with different inner products, which resulted in different inner product spaces.
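Both inner products are straightforward to implement, and their defining properties from Definition 2.16 can be spot-checked on random vectors. The sketch below (Python with numpy; the function names are ours) checks symmetry, linearity in the first argument, and positivity for the two examples above.

import numpy as np

def dot_ip(u, v):
    # The dot product of Example 2.17.
    return u @ v

def weighted_ip(u, v):
    # The inner product of Example 2.18: <u, v> = 2*u1*v1 + u2*v2 + pi*u3*v3.
    weights = np.array([2.0, 1.0, np.pi])
    return np.sum(weights * u * v)

rng = np.random.default_rng(0)
u, v, x = rng.standard_normal((3, 3))
a, b = 1.5, -0.5

for ip in (dot_ip, weighted_ip):
    assert np.isclose(ip(u, v), ip(v, u))                                # (iv), real case
    assert np.isclose(ip(a * u + b * v, x), a * ip(u, x) + b * ip(v, x)) # (ii) and (iii)
    assert ip(u, u) > 0                                                  # (i), since u is non-zero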

2.4. Linear Transformations, Operators, and Functionals. The main object of interest in this section is a special class of functions between vector spaces, the so called linear transformations.

Definition 2.19. Let V and W be vector spaces over a field F. A linear transformation T : V → W is a function that satisfies the following:

(i) for every v1, v2∈ V , T (v1+ v2) = T (v1) + T (v2);

(ii) for every v ∈ V and every scalar c ∈F, T (cv) = cT (v).

Note that the defining properties of vector spaces, addition and multiplication by scalars, are preserved by linear transformations. In this sense linear transformations are the functions that respect the structure of vector spaces.

For some special cases of the co-domain W , linear transformations go under other names. A linear transformation of the type T : V → V is called a linear operator.

Recall that a fieldF is a vector space. A linear transformation T : V → F is called a linear functional.

Two geometrically important linear operators are the contraction and the dilation.

Example 2.20. Let V be a vector space over a field F, and let the linear operator T : V → V be given by T (v) = kv for all v ∈ V , for some fixed scalar k. If 0 < k < 1, then T is a contraction. If k > 1, then it is a dilation. 


Recall that in Example 2.7 we saw that Mm×n(F) is a vector space. In Example 2.21 we shall see that there are linear transformations on matrices.

Example 2.21. Let Mnn be the vector space of all n × n matrices, and let the linear transformation T : Mnn → Mnn be defined as T (A) = Atr, i.e. the transformation maps A to its transpose. Recall that the transpose of a matrix is the matrix obtained by interchanging the rows and columns. This is a linear transformation since (A + B)tr = Atr + Btr and (kA)tr = kAtr, for any matrices A, B and any scalar k ∈ F. 

Continuing to study the relation between linear transformations and matrices, suppose that BV = {v1, v2, . . . , vm} and BW = {w1, w2, . . . , wn} are bases for the vector spaces V and W respectively. Let T : V → W be a linear transformation. One may view the linear transformation T as a set of numbers aij such that

T(v_j) = \sum_{i=1}^{n} a_{ij} w_i, \quad 1 ≤ j ≤ m.

Moreover, if v = c1v1 + c2v2 + . . . + cmvm, then the summation representation of its mapping into W is as follows,

T(v) = \sum_{i=1}^{n} \Big( \sum_{j=1}^{m} a_{ij} c_j \Big) w_i.

Familiarly, the matrix representation of the linear transformation is then

[T] = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix}.

Remark 2.22. Given two bases, linear transformations can always be represented by matrices, see [9].
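For V = Rm and W = Rn with their standard bases, this construction is easy to carry out in code: the j-th column of [T ] consists of the coordinates of T (vj). The sketch below (the helper matrix_of and the sample transformation are ours, chosen for illustration) builds the matrix of the map T (x, y, z) = (x + y, 2z).

import numpy as np

def matrix_of(T, m):
    # The matrix of a linear map T : R^m -> R^n with respect to the standard bases;
    # column j holds the coordinates of T(e_j).
    return np.column_stack([T(e) for e in np.eye(m)])

# A sample linear transformation T(x, y, z) = (x + y, 2z).
T = lambda v: np.array([v[0] + v[1], 2.0 * v[2]])

M = matrix_of(T, m=3)
print(M)                           # [[1. 1. 0.], [0. 0. 2.]]

v = np.array([1.0, 2.0, 3.0])
assert np.allclose(M @ v, T(v))    # [T] acts on coordinate vectors exactly as T acts on vectors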

The following theorem gives a useful characterization of linear transformations.

Theorem 2.23. Let T : V → W be a function. Then T is a linear transformation if, and only if, for every pair of vectors v1, v2∈ V and scalars c1, c2∈ F,

T (c1v1 + c2v2) = c1T (v1) + c2T (v2). (2.1)

Proof. Assume that T is a linear transformation and let v1, v2 ∈ V , c1, c2 ∈ F.

Then

T (c1v1+ c2v2) = T (c1v1) + T (c2v2)

by the first property of Definition 2.19. From the second property it follows that T (c1v1) = c1T (v1), T (c2v2) = c2T (v2), and thus

T (c1v1+ c2v2) = c1T (v1) + c2T (v2).

Suppose that T satisfies the given criteria, (2.1). Then let v1, v2 ∈ V , and let c1 = c2 = 1, then we have T (v1+ v2) = T (v1) + T (v2), which satisfies the first property of the definition for a linear transformation. Let v1 = v, v2 = 0, c1 = c


and c2 = 0, which gives T (cv) = cT (v), i.e. the second property. It follows that T

is a linear transformation. 

Note that we may define addition and multiplication by a scalar for linear transformations. Let S, T be linear transformations from the vector space V to W , and let k ∈ F. Define the sum of two linear transformations and the scalar multiple of a linear transformation by

(T + S)(v) = T (v) + S(v), (kT )(v) = kT (v).

Thus, linear transformations are closed under addition and multiplication by a scalar.

In Definition 2.24 we will discuss another important concept, namely the zero map.

Definition 2.24. Let V and W be vector spaces. For all v ∈ V , define T (v) = 0W. This is the zero map from V to W , where 0W is the zero vector in W . The zero map is denoted 0V →W.

From Theorem 2.23 we shall, in Corollary 2.25, prove some elementary properties of linear transformations.

Corollary 2.25. Let V and W be vector spaces over a fieldF, and let T : V → W be a linear transformation. Then the following holds:

(i) T (u − v) = T (u) − T (v);

(ii) T (0V) = 0W.

Proof. i) By Theorem 2.23 we have,

T (u − v) = T (u + (−1)v) = T (u) + T ((−1)v) = T (u) + (−1)T (v) = T (u) − T (v).

ii) Using the result proved in i), and 0V = 0V − 0V, we get that, T (0V) = T (0V − 0V) = T (0V) − T (0V) = 0W.

 With Corollary 2.25 in mind, we formally define the set of all linear transformations.

Definition 2.26. The set L(V, W ) consists of all linear transformations T : V → W . This set L(V, W ), together with the following definitions of addition and scalar multiplication is a vector space. For S, T ∈ L(V, W ), define (S + T ) : V → W , and for T ∈ L(V, W ) and c ∈F, the transformation cT : V → W . That is,

(S + T )(v) = S(v) + T (v), (cT )(v) = cT (v),

and note that from Definition 2.24 we have that the zero map is an element of L(V, W ), so L(V, W ) is non-empty. Therefore L(V, W ) is a vector space.


2.5. Range and Kernel. Every linear transformation T : V → W , for V and W vector spaces over a field F, yields two new vector spaces, the so-called range and kernel of the transformation; these are the concepts in focus in this section.

Definition 2.27. Let V, W be vector spaces over a fieldF, and let T : V → W be a linear transformation. The range of T is defined as the image of the vectors from V into W . Formally, the range is denoted,

range(T ) = {w ∈ W : w = T (v) for some v ∈ V }.

The dimension of the range of T is called the rank of T , rank(T ) = dim(range(T )).

Definition 2.28. Let V and W be vector spaces over the field F, and let T : V → W be a linear transformation. The kernel of T is defined as the set of vectors from V that are mapped to the zero vector in W . Formally, the kernel is denoted

ker(T ) = {v ∈ V : T (v) = 0W}.

The dimension of the kernel of T is called the nullity of T , and is denoted nullity(T ) = dim(ker(T )).
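For a transformation given by a matrix over the standard bases, both numbers are easy to compute. In the sketch below (Python with numpy, our example matrix) the rank is obtained directly, and a basis for the kernel is read off from the rows of Vh in the singular value decomposition of the matrix, which also previews Section 5.

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])    # the second row is twice the first

rank = np.linalg.matrix_rank(A)    # dim(range(A)) = 1

U, s, Vh = np.linalg.svd(A)
kernel_basis = Vh[rank:]           # the remaining right singular vectors span ker(A)

print(rank, len(kernel_basis))     # 1 2, and 1 + 2 = 3, the number of columns of A
assert np.allclose(A @ kernel_basis.T, 0.0)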

Recall from Theorem 2.4 that a subset of vectors from a vector space is a subspace if, and only if, it is closed under addition and scalar multiplication. This shall be the idea of the proofs of Theorem 2.29 and Theorem 2.30.

Theorem 2.29. Let V, W be vector spaces over a fieldF, and let T : V → W be a linear transformation. Then range(T ) is a subspace of W .

Proof. Let the assumption be as stated in the theorem. By Theorem 2.4, it suffices to show that range(T ) is closed under addition and scalar multiplication.

Suppose that w1, w2∈ range(T ), and c1, c2∈ F. Reflect on what it means to be a vector in range(T ), that is, a vector w ∈ range(T ) if there is a v ∈ V such that T (v) = w. Since w1, w2 are assumed to be in range(T ) there are v1, v2∈ V such that T (v1) = w1, T (v2) = w2. Since V is a vector space and v1, v2∈ V and c1, c2 are still scalars, it follows then that c1v1+ c2v2∈ V . Now,

T (c1v1 + c2v2) = c1T (v1) + c2T (v2) = c1w1 + c2w2,

by Theorem 2.23. So c1w1+ c2w2 is the image of c1v1+ c2v2, and thus belongs to range(T ). The range of T is closed under scalar multiplication and vector addition,

therefore it is a subspace of W . 

Similarly, the kernel of a linear transformation is a subspace.

Theorem 2.30. Let V, W be vector spaces overF, and let T : V → W be a linear transformation. Then ker(T ) is a subspace of V .

Proof. Let the assumption be as stated in the theorem. By Theorem 2.4, it suffices to show that ker(T ) is closed under addition and scalar multiplication.

Suppose that v1, v2 ∈ ker(T ), and c1, c2 are scalars. Since v1, v2 ∈ ker(T ), we have T (v1) = T (v2) = 0W. We wish to show that ker(T ) is a subspace, and since it consists of vectors from another vector space, V , we only need to show that ker(T ) is


closed under vector addition and scalar multiplication. Applying T to c1v1+ c2v2, i.e.

T (c1v1+ c2v2) = T (c1v1) + T (c2v2) = c10W + c20W = 0W,

so c1v1 + c2v2 ∈ ker(T ) as required, and thus ker(T ) is a subspace of V . 

The following theorem establishes a relation between the nullity and rank of a linear transformation.

Theorem 2.31. Let V be an n-dimensional vector space and W a finite dimensional vector space. Let T : V → W be a linear transformation. Then n = rank(T ) + nullity(T ).

Proof. Let k = nullity(T ). Choose a basis {v1, v2, . . . , vk} for ker(T ). Extend this basis to {v1, v2, . . . , vn} for V .

If (i) {T (vk+1), . . . , T (vn)} is linearly independent and (ii) {T (vk+1), . . . , T (vn)}

spans range(T ), then the result will follow, since {T (vk+1), T (vk+2), . . . , T (vn)}

will then be a basis of range(T ). Its dimension is rank(T ) = n − k as desired, since k = nullity(T ). So we will proceed to show these two statements.

(i) Firstly we will show that

span(v1, v2, . . . , vk) ∩ span(vk+1, vk+2, . . . , vn) = {0V}.

Since {v1, v2, . . . , vn} is a basis, it is linearly independent. Suppose that c1v1+ c2v2+ . . . + ckvk = ck+1vk+1+ ck+2vk+2+ . . . + cnvn

is a vector in span(v1, v2, . . . , vk) ∩ span(vk+1, vk+2, . . . , vn). It follows that

c1v1 + c2v2 + . . . + ckvk − ck+1vk+1 − ck+2vk+2 − . . . − cnvn = 0V.

Since {v1, v2, . . . , vn} is a basis, this equation only has the trivial solution, that is, c1 = c2 = . . . = ck = . . . = cn = 0, and thus c1v1 + . . . + ckvk = 0V, as claimed.

Suppose now that

ck+1T (vk+1) + . . . + cnT (vn) = 0W.

Since ck+1T (vk+1)+. . .+cnT (vn) is the image of u = ck+1vk+1+. . .+cnvn, the vec- tor u is in the kernel of T . Then ck+1vk+1+. . .+cnvnis in the span(v1, v2, . . . , vk), and so in the intersection span(v1, v2, . . . , vk) ∩ span(vk+1, vk+2, . . . , vn), which we have just shown to be the trivial subspace {0V}. Therefore,

ck+1vk+1+ ck+2vk+2+ . . . + cnvn= 0V,

and since {vk+1, vk+2, . . . , vn} is linearly independent it follows that ck+1 = ck+2 = . . . = cn = 0. Thus, the set {T (vk+1), T (vk+2), . . . , T (vn)} is linearly independent, just as we wished to show.

(ii) Since every vector in V is a linear combination of v1, v2, . . . , vn it follows that any vector in range(T ) is

T (c1v1+ . . . + cnvn) = c1T (v1) + . . . + ckT (vk) + ck+1T (vk+1) + . . . + cnT (vn).

However, since v1, v2, . . . , vk∈ ker(T ), this means that

T (c1v1+ . . . + cnvn) = ck+1T (vk+1) + ck+2T (vk+2) + . . . + cnT (vn),


which is just an element of span(T (vk+1), T (vk+2), . . . , T (vn)) as required.  While on the topic of linear transformations, we shall look closer at an important linear operator, namely the identity operator.

Definition 2.32. Let V be a vector space. Define IV : V → V by IV(v) = v for all v ∈ V . This is the identity map on V .

2.6. Eigenvalues, Eigenvectors, and Eigendecomposition. Eigenvalues and eigenvectors will be introduced in this section, along with how they can be used to decompose a matrix into a diagonal matrix and matrices consisting of eigenvectors.

Definition 2.33. Let T be a linear operator on a vector space V over F, i.e. a linear transformation T : V → V . A vector v ∈ V is said to be an eigenvector of T with eigenvalue λ ∈F if T (v) = λv.

For the matrix representation of a transformation, there is the following corre- sponding definition.

Definition 2.34. Let A be an n × n matrix with entries in the field F. Then an eigenvector of A is a vector x such that Ax = λx, for some scalar λ ∈F. The scalar λ is the eigenvalue of the matrix A.

Remark 2.35. One can prove that, for a linear operator T on a vector space V , the eigenvectors of T corresponding to non-zero eigenvalues span the range of T .

For example, let the matrix A be defined as follows,

A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}.

The eigenvalues of A are λ1 = 1 and λ2 = 0, and the corresponding eigenvectors are x1 = (1, 0) and x2 = (−1, 1). Normalizing the two eigenvectors we obtain u1 = (1, 0) and u2 = (−1/√2, 1/√2).

These eigenvalues and eigenvectors can now be used to express A. Traditionally we combine the eigenvectors of A into a matrix, call it U , and collect the eigenvalues of A into a diagonal matrix, call it Λ. We may now rewrite Ax = λx as AU = U Λ, or equivalently A = U ΛU−1. Here, that would be

A = \begin{pmatrix} 1 & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & \sqrt{2} \end{pmatrix}.

To describe the matrix A in terms of a diagonal matrix and matrices consisting of its eigenvectors (and its inverse) is called eigendecomposition, sometimes spectral decomposition. We summarize this in the following definition.

Definition 2.36. Let A be a symmetric n × n matrix that is diagonalized by U = [u1 u2 . . . un], where the ui are the normalized eigenvectors and the λi the corresponding eigenvalues. Then Λ = U−1AU is the diagonal matrix with the eigenvalues on the diagonal. Equivalently, we may express A as A = U ΛU−1. This is called the eigendecomposition or spectral decomposition of A.
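For the matrix A above, the decomposition can be verified numerically; note that numpy.linalg.eig returns unit-length eigenvectors, possibly differing in sign or order from the ones chosen above (a sketch, assuming the standard matrix representation):

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 0.0]])

eigenvalues, U = np.linalg.eig(A)          # columns of U are normalized eigenvectors
Lam = np.diag(eigenvalues)

# A = U Lam U^{-1}, the eigendecomposition (spectral decomposition) of A.
assert np.allclose(A, U @ Lam @ np.linalg.inv(U))
print(eigenvalues)                         # 1 and 0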

How Singular Value Decomposition and Eigendecomposition are related, yet different, will be discussed in Section 5. This concludes the preliminaries, and with these concepts now revisited we shall move on to more sophisticated theory.


3. Dual Spaces and Adjoint Transformations

In this section we shall begin by studying linear functionals and dual spaces, and then proceed to discuss adjoint transformations.

3.1. Linear Functionals and Dual Spaces. Here, we shall focus on a type of linear transformation of special interest: those from a vector space V to its underlying field F. Recall that if V and W are vector spaces, then L(V, W ) is the vector space of all the linear transformations from V to W .

Definition 3.1. Let V be a finite dimensional vector space over a field F. Then L(V, F) is called the dual space of V , denoted V′. The elements of V′ are called linear functionals.

Now follows an illustrative example of a linear functional given by an inner product.

Example 3.2. Let V be an inner product space and let u ∈ V . Define f : V → F by

f (v) = ⟨u, v⟩;

then f is a linear functional. The linearity follows from the properties of inner products. 

Given a basis for a vector space V , we may construct an associated basis for the dual space. Constructing this basis is the purpose of the following lemma.

Lemma 3.3. Let V be a vector space over a field F with basis B = {v1, v2, . . . , vn}. Then there exist linear functionals f1, f2, . . . , fn ∈ V′ such that

fj(vi) = 1 if i = j, and fj(vi) = 0 otherwise. (3.1)

Furthermore, B′ = {f1, f2, . . . , fn} is a basis for V′.

Proof. Let the assumptions be as stated in the lemma. Let v = a1v1 + a2v2 + . . . + anvn ∈ V , and let

fj(v) = fj(a1v1+ a2v2+ . . . + anvn) = aj

for each j = 1, 2, . . . , n. Then fj(vj) = 1 and fj(vi) = 0 if i ≠ j, so these mappings exist.

We now show that the mapping fj is linear, beginning with additivity. Let u = a1v1 + a2v2 + . . . + anvn, and

v = b1v1+ b2v2+ . . . + bnvn, and u, v ∈ V then,

fj(u + v) = fj(a1v1+ a2v2+ . . . + anvn+ b1v1+ b2v2+ . . . + bnvn)

= aj+ bj = fj(u) + fj(v).

To show that it respects scalar multiplication, let v = a1v1 + a2v2 + . . . + anvn be in V ; we then have

fj(kv) = fj(k(a1v1 + a2v2 + . . . + anvn)) = kaj = kfj(v),


for any k. Consequently, fj is linear. The next step is to show that the set {f1, f2, . . . , fn} is linearly independent.

Assume that f = c1f1 + c2f2 + . . . + cnfn = 0V→F. Then f (v) = 0 for every v ∈ V ; in particular f (vj) = cj = 0 for all j, and thus only the trivial solution exists, i.e. the set {f1, f2, . . . , fn} is linearly independent.

All that remains to show is that the functionals {f1, f2, . . . , fn} span V′. Let v = a1v1 + a2v2 + . . . + anvn ∈ V , and assume that g ∈ V′. Let cj = g(vj) and let f = c1f1 + c2f2 + . . . + cnfn. Then f (vj) = cjfj(vj) = cj = g(vj) for all vj. Since both f and g are linear we have,

f (v) = f (a1v1+ a2v2+ . . . + anvn) = a1f (v1) + a2f (v2) + . . . + anf (vn)

= a1g(v1) + a2g(v2) + . . . + ang(vn) = g(a1v1+ a2v2+ . . . + anvn) = g(v).

So f (v) = g(v) for all v ∈ V , therefore f = g. Since g was arbitrarily chosen, and it was shown that it could be written as a linear combination of the fj, we have that span(B′) = span(f1, f2, . . . , fn) = V′. 

From Lemma 3.3 we can formulate the following definition.

Definition 3.4. Let V be a vector space with basis B = {v1, v2, . . . , vn}. The basis B′ = {f1, f2, . . . , fn} of V′ such that (3.1) holds, i.e.

fj(vi) = 1 if i = j, and fj(vi) = 0 otherwise,

is called the dual basis to B.
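In coordinates the dual basis has a concrete description. If V = Rn and the basis vectors v1, . . . , vn are stored as the columns of an invertible matrix B, then fj is given by the j-th row of B−1, since B−1vi is the i-th standard basis vector. A small sketch under that assumption:

import numpy as np

# A basis of R^3, stored as the columns of B.
B = np.column_stack([[1.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0],
                     [1.0, 1.0, 1.0]])

F = np.linalg.inv(B)                    # row j of F represents the functional f_j

assert np.allclose(F @ B, np.eye(3))    # f_j(v_i) = 1 if i = j and 0 otherwise, as in (3.1)

# f_j picks out the j-th coordinate of a vector with respect to the basis B.
v = 2.0 * B[:, 0] - 3.0 * B[:, 1] + 5.0 * B[:, 2]
print(F @ v)                            # [ 2. -3.  5.]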

As we saw in Example 3.2, if V is an inner product space, every element of V gives rise to a linear functional in V′. As we will see in Theorem 3.5, the converse is also true if V is finite dimensional.

Theorem 3.5. Let (V, ⟨ , ⟩) be a finite dimensional inner product space over F ∈ {R, C}, and assume that f ∈ V′. Then there exists a unique vector v ∈ V such that f (u) = ⟨u, v⟩ for all u ∈ V .

Proof. Let S = {v1, v2, . . . , vn} be an orthonormal basis for V , and assume that f (vi) = ai, i = 1, 2, . . . , n. Set v = a1v1 + a2v2 + . . . + anvn. We claim that f (u) = ⟨u, v⟩ for all u ∈ V .

Suppose that u = b1v1 + b2v2 + . . . + bnvn ∈ V . The left hand side of f (u) = ⟨u, v⟩ is

f (u) = f (b1v1+ b2v2+ . . . + bnvn) = b1f (v1) + b2f (v2) + . . . + bnf (vn)

= b1a1+ b2a2+ . . . + bnan.


The last step follows since by definition f (vi) = ai. The right hand side is

⟨u, v⟩ = ⟨b1v1 + b2v2 + . . . + bnvn, a1v1 + a2v2 + . . . + anvn⟩ = \sum_{i=1}^{n} \sum_{j=1}^{n} ⟨bivi, ajvj⟩ = \sum_{i=1}^{n} \sum_{j=1}^{n} biaj⟨vi, vj⟩ = b1a1 + b2a2 + . . . + bnan,

where the last equality uses that the basis is orthonormal, so ⟨vi, vj⟩ equals 1 if i = j and 0 otherwise. This proves the existence of v. It remains to show that v is unique.

Suppose that ⟨u, v⟩ = f (u) for all u ∈ V . Also, assume that ⟨u, x⟩ = f (u) for some x ∈ V . We wish to show that x = v, i.e. ⟨u, x⟩ = ⟨u, v⟩. In other words, ⟨u, x⟩ − ⟨u, v⟩ = 0, which gives ⟨u, x − v⟩ = 0. Let u = x − v; then ⟨x − v, x − v⟩ = 0, which is equivalent to x − v = 0. From this, x = v, and with that uniqueness has been shown. 

3.2. Adjoint Transformations. Herein, we will use the theory of linear functionals to define the adjoint of a linear transformation.

Let V, W be two finite dimensional inner product spaces and T ∈ L(V, W ). We want to define an associated linear transformation T∗ ∈ L(W, V ). For this purpose let w ∈ W and define a linear functional f ∈ V′ by f (v) = ⟨T (v), w⟩W. By Theorem 3.5 there is a unique element v′ ∈ V such that f (v) = ⟨v, v′⟩V. For every w we obtain some v′ ∈ V . Let T∗(w) = v′; then this defines a linear transformation T∗ : W → V such that for all v ∈ V , w ∈ W we have

⟨T (v), w⟩W = ⟨v, T∗(w)⟩V. (3.2)

This will be referred to as the fundamental equation defining the adjoint transformation.

Definition 3.6. Let (V, ⟨ , ⟩V) and (W, ⟨ , ⟩W) be finite dimensional inner product spaces and let T ∈ L(V, W ). The map T∗ ∈ L(W, V ) is called the adjoint transformation of T . It is the unique linear map from W to V satisfying equation (3.2).
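For the standard inner products on Cn and Cm, the adjoint of a transformation given by a matrix is its conjugate transpose; the fundamental equation (3.2) can then be checked numerically, as in the following sketch (random data, our naming, not part of the essay's formal development):

import numpy as np

rng = np.random.default_rng(1)

# T : C^3 -> C^2 represented by a random complex matrix; T* is its conjugate transpose.
T = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
T_adj = T.conj().T

v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
w = rng.standard_normal(2) + 1j * rng.standard_normal(2)

# <x, y> = sum_i x_i * conj(y_i): linear in the first argument, as in Definition 2.16.
inner = lambda x, y: np.vdot(y, x)

# The fundamental equation (3.2): <T(v), w>_W = <v, T*(w)>_V.
assert np.isclose(inner(T @ v, w), inner(v, T_adj @ w))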

To conclude this section, the following theorem establishes some properties of the map T ↦ T∗, from L(V, W ) to L(W, V ).

Theorem 3.7. Let (V, ⟨ , ⟩V), (W, ⟨ , ⟩W) and (X, ⟨ , ⟩X) be finite dimensional inner product spaces over the field F ∈ {R, C}. Then the following holds:

(i) if S, T ∈ L(V, W ), then (S + T )∗ = S∗ + T∗;

(ii) if T ∈ L(V, W ) and γ ∈ F, then (γT )∗ = \overline{γ}T∗;

(iii) if S ∈ L(V, W ) and T ∈ L(W, X), then (T S)∗ = S∗T∗;

(iv) if T ∈ L(V, W ), then (T∗)∗ = T ;

(v) (IV)∗ = IV.

Proof. (i) Let v ∈ V , w ∈ W . By Definition 2.26 we obtain

⟨(S + T )(v), w⟩W = ⟨S(v) + T (v), w⟩W,

which, by Definition 2.16, is

⟨S(v), w⟩W + ⟨T (v), w⟩W = ⟨v, S∗(w)⟩V + ⟨v, T∗(w)⟩V = ⟨v, S∗(w) + T∗(w)⟩V = ⟨v, (S∗ + T∗)(w)⟩V.

Conversely, by Definition 3.6 we have

⟨(S + T )(v), w⟩W = ⟨v, (S + T )∗(w)⟩V.

So (S + T )∗(w) = S∗(w) + T∗(w). Since w was chosen arbitrarily, it follows that (S + T )∗ = S∗ + T∗.

(ii) Let v ∈ V , w ∈ W , and γ ∈ F. It follows from Definition 2.16 that

⟨(γT )(v), w⟩W = ⟨γT (v), w⟩W = γ⟨T (v), w⟩W.

Yet again, using the fundamental equation and the properties of an inner product, the following is obtained:

γ⟨T (v), w⟩W = γ⟨v, T∗(w)⟩V = ⟨v, \overline{γ}T∗(w)⟩V.

However,

⟨(γT )(v), w⟩W = ⟨v, (γT )∗(w)⟩V,

and thus, since this holds for all v ∈ V , (γT )∗(w) = \overline{γ}T∗(w). As w was arbitrary, (γT )∗ = \overline{γ}T∗.

(iii) Let v ∈ V and x ∈ X; then S(v) ∈ W . Let (T S)∗ be defined by (3.2), i.e.

⟨(T S)(v), x⟩X = ⟨v, (T S)∗(x)⟩V.

Whereas we also have that

⟨(T S)(v), x⟩X = ⟨T (S(v)), x⟩X = ⟨S(v), T∗(x)⟩W = ⟨v, S∗(T∗(x))⟩V = ⟨v, S∗T∗(x)⟩V.

It follows that (T S)∗ = S∗T∗.

(iv) Let v ∈ V , so that T (v) ∈ W , and define S = T∗. Then S(w) ∈ V for every w ∈ W . From (3.2),

⟨T (v), w⟩W = ⟨v, T∗(w)⟩V = ⟨v, S(w)⟩V = \overline{⟨S(w), v⟩V} = \overline{⟨w, S∗(v)⟩W} = ⟨S∗(v), w⟩W.

Hence, T (v) = S∗(v) for all v ∈ V . By the construction of S we have that S∗ = (T∗)∗. Thus, (T∗)∗ = T .

(v) Let u, v ∈ V , and let IV be the identity transformation IV : V → V . Definition 2.32 grants that

⟨IV(u), v⟩V = ⟨u, v⟩V = ⟨u, IV(v)⟩V.

Using (3.2) we also have that

⟨IV(u), v⟩V = ⟨u, (IV)∗(v)⟩V,

and hence (IV)∗ = IV. 

References
