Examensarbete
Tensor Rank
Elias Erdtman, Carl Jönsson
Applied Mathematics, Linköpings Universitet
LiTH - MAT - EX - - 2012/06 - - SE
Examensarbete: 30 hp
Level: A
Supervisor: Göran Bergqvist, Applied Mathematics, Linköpings Universitet
Examiner: Milagros Izquierdo Barrios, Applied Mathematics, Linköpings Universitet
Linköping, June 2012
Abstract
This master’s thesis addresses numerical methods of computing the typical ranks of tensors over the real numbers and explores some properties of tensors over finite fields.
We present three numerical methods to compute typical tensor rank. Two of these have already been published and can be used to calculate the lowest typical ranks of tensors and an approximate percentage of how many tensors have the lowest typical ranks (for some tensor formats), respectively. The third method was developed by the authors with the intent to be able to discern if there is more than one typical rank. Some results from the method are presented but are inconclusive.
In the area of tensors over finite fields some new results are shown, namely that there are eight GLq(2) × GLq(2) × GLq(2)-orbits of 2 × 2 × 2 tensors over any finite field and that some tensors over F_q have lower rank when considered as tensors over F_{q^2}. Furthermore, it is shown that some symmetric tensors over F_2 do not have a symmetric rank and that there are tensors over some other finite fields which have a larger symmetric rank than rank.
Keywords: generic rank, symmetric tensor, tensor rank, tensors over finite fields, typical rank.
URL for electronic version:
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-78449
Preface
“Tensors? Richard had no idea what a tensor was, but he had noticed that when math geeks started throwing the word around, it meant that they were headed in the general direction of actually getting something done.” - Neal Stephenson, Reamde (2011).
This text is a master’s thesis, written by Elias Erdtman and Carl Jönsson at Linköpings universitet, with Göran Bergqvist as supervisor and Milagros Izquierdo Barrios as examiner, in 2012.
Background
The study of tensors of order greater than two has recently had an upswing, both from a theoretical point of view and in applications, and there are many unanswered questions in both areas. Questions of interest are, for example: what does a generic tensor look like, what are useful tensor decompositions and how can one calculate them, and how can one find equations for sets of tensors? Basically, one wants a theory of tensors as well-developed and easy to use as the theory of matrices.
Purpose
In this thesis we aim to show some basic results on tensor rank and investigate methods for discerning generic and typical ranks of tensors, i.e., searching for an answer to the question: which ranks are the most "common"?
Chapter outline
Chapter 1. Introduction
In the first chapter we present theory relevant to tensors. It is divided into four major parts: the first part is about multilinear algebra, the second part is a short introduction to the CP decomposition, and the third part gives the reader the background in algebraic geometry necessary to understand the results in chapter 2. The fourth and last part of the chapter gives an example of an application of tensor decomposition, more specifically the multiplication tensor for 2 × 2 matrices and Strassen's algorithm for matrix multiplication.
Chapter 2. Tensor rank
In the second chapter we introduce different notions of rank: tensor rank, multilinear rank, Kruskal rank, etc. We show some basic results on tensors using algebraic geometry, among them some results on generic ranks over C and typical ranks over R.
Chapter 3. Numerical methods and results
Numerical results for determining typical ranks are presented in chapter three. We present an algorithm which can calculate the generic rank for any format of tensor spaces and another algorithm from which one can infer if there is more than one typical rank over R for some tensor space formats. A method developed by the authors is also presented, along with results indicating that the method does not seem to work.
Chapter 4. Tensors over finite fields
This chapter contains some results on tensors over finite fields. We present a classification and the sizes of the eight GLq(2) × GLq(2) × GLq(2)-orbits of F_q^2 ⊗ F_q^2 ⊗ F_q^2 and show that the elements of one of the orbits have lower rank when considered as tensors over F_{q^2}. Finally we show that there are symmetric tensors over F_2 which do not have a symmetric rank, and that over some other finite fields a symmetric tensor can have a symmetric rank which is greater than its rank.
Chapter 5. Summary and future work
The results of the thesis are summarized and some directions of future work are indicated.
Appendix A. Programs
Program code for Mathematica or MATLAB used to produce the results in the thesis is given in this appendix.
Distribution of work
Since this is a master's thesis we give an account of who has done what in the table below.

Section   Author
1.1       CJ/EE
1.2       EE
1.3       CJ
1.4       CJ/EE
2.1       CJ/EE
2.2-2.4   CJ
3.1-3.2   EE/CJ
3.3       EE
4         CJ
5         CJ & EE
Nomenclature
Most of the recurring abbreviations and symbols are described here.
Symbols
• F is a field.
• Fq is the finite field of q elements.
• I(V) is the ideal of an algebraic set V.
• V(I) is the algebraic set of zeros of an ideal I.
• Seg is the Segre mapping.
• σ_r(X) is the rth secant variety of X.
• S_d is the symmetric group on d elements.
• ⊗ is the tensor product.
• ~ is the matrix Kronecker product.
• X̂ is the affine cone of a set X ⊂ PV.
• ⌈x⌉ is the number x rounded up to the nearest integer.
Contents
1 Introduction 1
1.1 Multilinear algebra . . . 2
1.1.1 Tensor products and multilinear maps . . . 2
1.1.2 Symmetric and skew-symmetric tensors . . . 5
1.1.3 GL(V_1) × · · · × GL(V_k) acts on V_1 ⊗ · · · ⊗ V_k . . . 7
1.2 Tensor decomposition . . . 7
1.3 Algebraic geometry . . . 9
1.3.1 Basic definitions . . . 9
1.3.2 Varieties and ideals . . . 10
1.3.3 Projective spaces and varieties . . . 11
1.3.4 Dimension of an algebraic set . . . 12
1.3.5 Cones, joins, and secant varieties . . . 14
1.3.6 Real algebraic geometry . . . 15
1.4 Application to matrix multiplication . . . 16
2 Tensor rank 19
2.1 Different notions of rank . . . 19
2.1.1 Results on tensor rank . . . 21
2.1.2 Symmetric tensor rank . . . 21
2.1.3 Kruskal rank . . . 22
2.1.4 Multilinear rank . . . 23
2.2 Varieties of matrices over C . . . . 23
2.3 Varieties of tensors over C . . . . 24
2.3.1 Equations for the variety of tensors of rank one . . . 24
2.3.2 Varieties of higher ranks . . . 25
2.4 Real tensors . . . 27
3 Numerical methods and results 29
3.1 Comon, Ten Berge, Lathauwer and Castaing's method . . . 29
3.1.1 Numerical results . . . 32
3.1.2 Discussion . . . 32
3.2 Choulakian's method . . . 33
3.2.1 Numerical results . . . 35
3.2.2 Discussion . . . 37
3.3 Surjectivity check . . . 37
3.3.1 Results . . . 39
3.3.2 Discussion . . . 40
4 Tensors over finite fields 43
4.1 Finite fields and linear algebra . . . 43
4.2 GLq(2) × GLq(2) × GLq(2)-orbits of F_q^2 ⊗ F_q^2 ⊗ F_q^2 . . . 44
4.2.1 Rank zero and rank one orbits . . . 48
4.2.2 Rank two orbits . . . 48
4.2.3 Rank three orbits . . . 50
4.2.4 Main result . . . 51
4.3 Lower rank over field extensions . . . 52
4.4 Symmetric rank . . . 52
5 Summary and future work 57
5.1 Summary . . . 57
5.2 Future work . . . 57
A Programs 59
A.1 Numerical methods . . . 59
A.1.1 Comon, Ten Berge, Lathauwer and Castaing’s method . . 59
A.1.2 Choulakian’s method . . . 60
A.1.3 Surjectivity check . . . 61
A.2 Tensors over finite fields . . . 64
A.2.1 Rank partitioning . . . 64
A.2.2 Orbit partitioning . . . 66
List of Tables
3.1 Known typical ranks for 2 × N2 × N3 arrays over R. . . . 33
3.2 Known typical ranks for 3 × N2 × N3 arrays over R. . . . 33
3.3 Known typical ranks for 4 × N2 × N3 arrays over R. . . . 34
3.4 Known typical ranks for 5 × N2 × N3 arrays over R. . . . 34
3.5 Known typical ranks for N^{×d} arrays over R. . . . 34
3.6 Number of real solutions to (3.7) for 10 000 random 5 × 3 × 3 tensors. . . 36
3.7 Number of real solutions to (3.7) for 10 000 random 7 × 4 × 3 tensors. . . 36
3.8 Number of real solutions to (3.7) for 10 000 random 9 × 5 × 3 tensors. . . 36
3.9 Number of real solutions to (3.7) for 10 000 random 10 × 4 × 4 tensors. . . 36
3.10 Number of real solutions to (3.7) for 10 000 random 11 × 6 × 3 tensors. . . 36
3.11 Approximate probability that a random I × J × K tensor has rank I. . . 36
3.12 Euclidean distances depending on the fraction of the area on the n-sphere. . . 40
3.13 Number of points from φ2 close to some control points for the 2 × 2 × 2 tensor. . . 40
3.14 Number of points from φ3 close to some control points for the 2 × 2 × 3 tensor. . . 41
3.15 Number of points from φ3 close to some control points for the 2 × 3 × 3 tensor. . . 41
3.16 Number of points from φ5 close to some control points for the 3 × 3 × 4 tensor. . . 41
4.1 Orbits of F_q^2 ⊗ F_q^2 ⊗ F_q^2 under the action of GLq(2) × GLq(2) × GLq(2) for q = 2, 3. . . . 46
4.2 Orbits of F_q^2 ⊗ F_q^2 ⊗ F_q^2 under the action of GLq(2) × GLq(2) × GLq(2). . . . 52
4.3 Number of symmetric 2 × 2 × 2 tensors generated by symmetric rank one tensors over some small finite fields. . . . 53
4.4 Number of N × N × N symmetric tensors generated by symmetric rank one tensors over F_2. . . . 55
List of Figures
1.1 The image of t ↦ (t, t^2, t^3) for −1 ≤ t ≤ 1. . . . 10
1.2 The intersection of the surfaces defined by y − x^2 = 0 and z − x^3 = 0, namely the twisted cubic, for (−1, 0, −1) ≤ (x, y, z) ≤ (1, 1, 1). . . . 11
1.3 The cuspidal cubic. . . . 13
1.4 An example of a semi-algebraic set. . . . 16
3.1 Connection between Euclidean distance and an angle on a 2-dimensional intersection of a sphere. . . . 39
Chapter 1
Introduction
This first chapter will introduce basic notions, definitions and results concerning multilinear algebra, tensor decomposition, tensor rank and algebraic geometry. A general reference for this chapter is [25].
The simplest way to look at tensors is as a generalization of matrices; they are objects in which one can arrange multidimensional data in a natural way. For instance, if one wants to analyze a sequence of images with small differences in some property, e.g. lighting or facial expression, one can use matrix decomposition algorithms, but then one has to vectorize the images and lose their natural structure. If one could use tensors, one could keep the natural structure of the pictures, which would be a significant advantage. However, the problem then becomes that one needs new results and algorithms for tensor decomposition.
The study of decomposition of higher order tensors has its origins in articles by Hitchcock from 1927 [19, 20]. Tensor decomposition was introduced in psychometrics by Tucker in the 1960's [41], and in chemometrics by Appellof and Davidson in the 1980's [2]. Strassen published his algorithm for matrix multiplication in 1969 [37], and since then tensor decomposition has received attention in the area of algebraic complexity theory. An overview of the subject, its literature and applications can be found in [1, 24].
Tensor rank, as introduced later in this chapter, is a natural generalization of matrix rank. Kruskal [23] states that it is so natural that it was introduced independently at least three times before he introduced it himself in 1976.
Tensors have recently been studied from the viewpoint of algebraic geometry, yielding results on typical ranks, which are the ranks a random tensor takes with non-zero probability. The recent book [25] summarizes the results in the field.
Results often concern the typical ranks of certain formats of tensors, methods for discerning the rank of a tensor, or algorithms for computing tensor decompositions. Algorithms for tensor decompositions are often of interest in application areas, where one wants to find structures and patterns in data. In some cases, just finding a decomposition is not enough: one wants the decomposition to be essentially unique. In these cases one wants an algorithm to find a decomposition of a tensor and some way of determining if it is unique. In other fields of application, one wants to find decompositions of important tensors, since this will yield better performing algorithms in the field, e.g. Strassen's algorithm. Of course, an algorithm for finding a decomposition would be of high interest also in this case, but uniqueness is not important. However, in this case, just knowing that a tensor has a certain rank tells one that there is a better algorithm; if the decomposition is the important part, just knowing the rank is of little help.
We take a look at efficient matrix multiplication and Strassen’s algorithm as an example application in the end of the chapter. There are other examples of applications of tensor decomposition and rank, e.g. face recognition in the area of pattern recognition, modeling fluorescence excitation-emission data in chemistry, blind deconvolution of DS-CDMA signals in wireless communications, Bayesian networks in algebraic statistics, tensor network states in quantum information theory [25] and in neuroscience tensors are used in the study of effects of new drugs on brain activity [1, 24]. Efficient matrix multiplication is a special case of efficient evaluation of bilinear forms, see [22, 21, section 4.6.4 pp. 506-524], which, among other things, is studied in algebraic complexity theory [9, 25, chapter 13].
Historically, tensors over R and C have been investigated. In chapter 4, we investigate tensors over finite fields and show some new results.
1.1 Multilinear algebra
In this section we introduce the basics of multilinear algebra, which is an extension of linear algebra obtained by expanding the domain from one vector space to several. For an easy introduction to tensor products of vector spaces, see [42].
1.1.1 Tensor products and multilinear maps
Definition 1.1.1 (Dual space, dual basis). For a vector space V over the field F, the dual space V∗ of V is the vector space of all linear maps V → F.
If {v_1, v_2, ..., v_n} is a basis for V, the dual basis {α_1, α_2, ..., α_n} in V^* is defined by
α_i(v_j) = 1 if i = j,  α_i(v_j) = 0 if i ≠ j,
and extending linearly.
Theorem 1.1.2. If V is of finite dimension, the dual basis is a basis of V∗. Furthermore, V∗ is isomorphic to V . The dual of the dual, (V∗)∗ is naturally isomorphic to V .
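In coordinates the dual basis is easy to compute: if the basis vectors are the columns of an invertible matrix B, the dual basis functionals are the rows of B^{-1}. A small numpy sketch of ours (not part of the thesis code) verifying α_i(v_j) = δ_ij:

```python
import numpy as np

# Columns of B form a (non-standard) basis {v1, v2, v3} of R^3.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# The dual basis functionals are the rows of B^{-1}: alpha_i(v) = (B^{-1} v)_i.
A = np.linalg.inv(B)

# Evaluating every alpha_i on every v_j gives alpha_i(v_j) = delta_ij.
assert np.allclose(A @ B, np.eye(3))
```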
Definition 1.1.3 (Tensor product). For vector spaces V, W we define the tensor product V ⊗ W to be the vector space of all expressions of the form
v_1 ⊗ w_1 + · · · + v_k ⊗ w_k
where v_i ∈ V, w_i ∈ W and the following equalities hold for the operator ⊗:
• λ(v ⊗ w) = (λv) ⊗ w = v ⊗ (λw).
• (v_1 + v_2) ⊗ w = v_1 ⊗ w + v_2 ⊗ w.
• v ⊗ (w_1 + w_2) = v ⊗ w_1 + v ⊗ w_2.
Since V ⊗ W is a vector space, we can iteratively form tensor products V_1 ⊗ V_2 ⊗ · · · ⊗ V_k of an arbitrary number of vector spaces V_1, V_2, ..., V_k. An element of V_1 ⊗ V_2 ⊗ · · · ⊗ V_k is said to be a tensor of order k.

Theorem 1.1.4. If {v_i}_{i=1}^{n_V} and {w_j}_{j=1}^{n_W} are bases for V and W respectively, then {v_i ⊗ w_j}_{i=1,j=1}^{n_V,n_W} is a basis for V ⊗ W and dim(V ⊗ W) = dim(V) dim(W).
Proof. Any T ∈ V ⊗ W can be written
T = Σ_{k=1}^{n} a_k ⊗ b_k
for a_k ∈ V, b_k ∈ W. Since the v_i and w_j are bases, we can write
a_k = Σ_{i=1}^{n_V} a_{ki} v_i,  b_k = Σ_{j=1}^{n_W} b_{kj} w_j,
and thus
T = Σ_{k=1}^{n} (Σ_{i=1}^{n_V} a_{ki} v_i) ⊗ (Σ_{j=1}^{n_W} b_{kj} w_j)
  = Σ_{k=1}^{n} Σ_{i=1}^{n_V} Σ_{j=1}^{n_W} a_{ki} b_{kj} v_i ⊗ w_j
  = Σ_{i=1}^{n_V} Σ_{j=1}^{n_W} (Σ_{k=1}^{n} a_{ki} b_{kj}) v_i ⊗ w_j,
so it follows that {v_i ⊗ w_j}_{i=1,j=1}^{n_V,n_W} is a basis, and this in turn implies dim(V ⊗ W) = dim(V) dim(W).

If {v_j^{(i)}}_{j=1}^{n_i} is a basis for V_i, this implies that {v_{j_1}^{(1)} ⊗ v_{j_2}^{(2)} ⊗ · · · ⊗ v_{j_k}^{(k)}}_{j_1=1,...,j_k=1}^{n_1,...,n_k} is a basis for V_1 ⊗ V_2 ⊗ · · · ⊗ V_k. Furthermore, if we have chosen a basis for each V_i, we can identify a tensor T ∈ V_1 ⊗ V_2 ⊗ · · · ⊗ V_k with a k-dimensional array of size dim V_1 × dim V_2 × · · · × dim V_k, where the element in position (j_1, j_2, ..., j_k) is the coefficient of v_{j_1}^{(1)} ⊗ v_{j_2}^{(2)} ⊗ · · · ⊗ v_{j_k}^{(k)} in the expansion of T in the induced basis for V_1 ⊗ V_2 ⊗ · · · ⊗ V_k. If k = 2, one gets matrices.
If one describes a third order tensor as a three-dimensional array, one can describe the tensor as a tuple of matrices. For example, say the I × J × K tensor T has the entries t_{ijk} in its array. Then T can be described as the tuple (T_1, T_2, ..., T_I) where T_i = (t_{ijk})_{j=1,k=1}^{J,K}, but it can also be described as the tuples (T'_1, T'_2, ..., T'_J) or (T''_1, T''_2, ..., T''_K), where T'_j = (t_{ijk})_{i=1,k=1}^{I,K} and T''_k = (t_{ijk})_{i=1,j=1}^{I,J}. The matrices in the tuples are called the slices of the array. Sometimes the adjectives frontal, horizontal and lateral are used to distinguish the different kinds of slices.
Example 1.1.5 (Arrays). Let {e_1, e_2} be a basis for R^2. Then e_1 ⊗ e_1 + 2 e_1 ⊗ e_2 + 3 e_2 ⊗ e_1 ∈ R^2 ⊗ R^2 can be expressed as the matrix

[1 2]
[3 0].

The third order tensor e_1 ⊗ e_1 ⊗ e_1 + 2 e_1 ⊗ e_2 ⊗ e_2 + 3 e_2 ⊗ e_1 ⊗ e_2 + 4 e_2 ⊗ e_2 ⊗ e_2 ∈ R^2 ⊗ R^2 ⊗ R^2 can be expressed as a 3-dimensional array:

[1 0 | 0 2]
[0 0 | 3 4]

and the slices of the array are

[1 0]  [0 2]      [1 0]  [0 2]      [1 0]  [0 0]
[0 0], [3 4];     [0 3], [0 4];     [0 2], [3 4]

where each pair arises from a different way of cutting the tensor.
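The three ways of cutting can be checked in a few lines of numpy (an illustration of ours; note that numpy indices are zero-based):

```python
import numpy as np

# The 2x2x2 array of example 1.1.5: entries t_{ijk}, zero-indexed here.
T = np.zeros((2, 2, 2))
T[0, 0, 0] = 1   # e1 (x) e1 (x) e1
T[0, 1, 1] = 2   # 2 e1 (x) e2 (x) e2
T[1, 0, 1] = 3   # 3 e2 (x) e1 (x) e2
T[1, 1, 1] = 4   # 4 e2 (x) e2 (x) e2

# The three families of slices are obtained by fixing one index each:
horizontal = [T[i, :, :] for i in range(2)]   # fix the first index
lateral    = [T[:, j, :] for j in range(2)]   # fix the middle index
frontal    = [T[:, :, k] for k in range(2)]   # fix the last index

assert np.array_equal(frontal[0],    np.array([[1, 0], [0, 0]]))
assert np.array_equal(frontal[1],    np.array([[0, 2], [3, 4]]))
assert np.array_equal(lateral[0],    np.array([[1, 0], [0, 3]]))
assert np.array_equal(horizontal[0], np.array([[1, 0], [0, 2]]))
```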
Definition 1.1.6 (Tensor rank). The smallest R for which T ∈ V_1 ⊗ · · · ⊗ V_k can be written
T = Σ_{r=1}^{R} v_r^{(1)} ⊗ · · · ⊗ v_r^{(k)},   (1.1)
for arbitrary vectors v_r^{(i)} ∈ V_i, is called the tensor rank of T.
Definition 1.1.7 (Multilinear map). Let V_1, ..., V_k be vector spaces over F. A map
f : V_1 × · · · × V_k → F
is a multilinear map if f is linear in each factor V_i.

Theorem 1.1.8. The set of all multilinear maps V_1 × · · · × V_k → F can be identified with V_1^* ⊗ · · · ⊗ V_k^*.
Proof. Let V_i have dimension n_i and basis {v_1^{(i)}, ..., v_{n_i}^{(i)}}, and let the dual basis be {α_1^{(i)}, ..., α_{n_i}^{(i)}}. Then f ∈ V_1^* ⊗ · · · ⊗ V_k^* can be written
f = Σ_{i_1,...,i_k} β_{i_1,...,i_k} α_{i_1}^{(1)} ⊗ · · · ⊗ α_{i_k}^{(k)}
and acts on (u_1, ..., u_k) ∈ V_1 × · · · × V_k as a multilinear mapping by
f(u_1, ..., u_k) = Σ_{i_1,...,i_k} β_{i_1,...,i_k} α_{i_1}^{(1)}(u_1) · · · α_{i_k}^{(k)}(u_k).
Conversely, let f : V_1 × · · · × V_k → F be a multilinear mapping. Pick a basis {v_1^{(i)}, ..., v_{n_i}^{(i)}} for V_i and let the dual basis be {α_1^{(i)}, ..., α_{n_i}^{(i)}}. Define β_{i_1,...,i_k} = f(v_{i_1}^{(1)}, ..., v_{i_k}^{(k)}); then
Σ_{i_1,...,i_k} β_{i_1,...,i_k} α_{i_1}^{(1)} ⊗ · · · ⊗ α_{i_k}^{(k)} ∈ V_1^* ⊗ · · · ⊗ V_k^*
is the element corresponding to f.
A multilinear mapping (V_1 × · · · × V_k) × W^* → F can be seen as an element of V_1^* ⊗ · · · ⊗ V_k^* ⊗ W, and can also be seen as a map V_1 ⊗ · · · ⊗ V_k → W. Explicitly, if f : (V_1 × · · · × V_k) × W^* → F is written f = Σ_i α_i^{(1)} ⊗ · · · ⊗ α_i^{(k)} ⊗ w_i, it acts on an element of V_1 × · · · × V_k × W^* by
f(v_1, ..., v_k, β) = Σ_i α_i^{(1)}(v_1) · · · α_i^{(k)}(v_k) w_i(β) ∈ F
but it can also act on an element of V_1 × · · · × V_k by
f(v_1, ..., v_k) = Σ_i α_i^{(1)}(v_1) · · · α_i^{(k)}(v_k) w_i ∈ W.
Example 1.1.9 (Linear maps). Given two vector spaces V, W the set of all linear maps V → W can be identified with V^* ⊗ W. If f = Σ_{i=1}^{n} α_i ⊗ w_i, f acts as a linear map V → W by
f(v) = Σ_{i=1}^{n} α_i(v) w_i
or, going in the other direction, if f : V → W is a linear map, we can describe it as an element of V^* ⊗ W by taking a basis {v_1, v_2, ..., v_n} for V and its dual basis {α_1, α_2, ..., α_n} and setting w_i = f(v_i), so we get
f = Σ_{i=1}^{n} α_i ⊗ w_i.
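This identification can be sketched in numpy (our illustration): with the standard basis {e_i} of R^4 and its dual {α_i}, w_i = f(e_i) is the i-th column of the matrix of f, and f(v) = Σ_i α_i(v) w_i reproduces ordinary matrix-vector multiplication.

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((3, 4))      # a linear map f : R^4 -> R^3 as a matrix
v = rng.standard_normal(4)

# With the standard basis {e_i} of R^4 and its dual {alpha_i}: w_i = f(e_i)
# is the i-th column of F and alpha_i(v) = v_i, so the element
# sum_i alpha_i (x) w_i of V* (x) W acts by f(v) = sum_i alpha_i(v) w_i.
fv = sum(v[i] * F[:, i] for i in range(4))

assert np.allclose(fv, F @ v)        # same as the matrix-vector product
```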
1.1.2 Symmetric and skew-symmetric tensors
Two important subspaces of the second order tensors V ⊗ V are the symmetric tensors and the skew-symmetric tensors. First, define the map τ : V ⊗ V → V ⊗ V by τ(v_1 ⊗ v_2) = v_2 ⊗ v_1, extended linearly (τ can be interpreted as the non-trivial permutation of two elements). The spaces of symmetric tensors, S^2 V, and skew-symmetric tensors, Λ^2 V, can then be defined as:
S^2 V := span{v ⊗ v | v ∈ V} = {T ∈ V ⊗ V | τ(T) = T},
Λ^2 V := span{v ⊗ w − w ⊗ v | v, w ∈ V} = {T ∈ V ⊗ V | τ(T) = −T}.
Let us define two operators that give the symmetric and anti-symmetric parts of a second order tensor. For v_1, v_2 ∈ V, define the symmetric part of v_1 ⊗ v_2 to be v_1 v_2 = (1/2)(v_1 ⊗ v_2 + v_2 ⊗ v_1) ∈ S^2 V and the anti-symmetric part of v_1 ⊗ v_2 to be v_1 ∧ v_2 = (1/2)(v_1 ⊗ v_2 − v_2 ⊗ v_1) ∈ Λ^2 V, so that v_1 ⊗ v_2 = v_1 v_2 + v_1 ∧ v_2.
To extend the definition of symmetric and skew-symmetric tensors, over R and C, to higher order we need to generalize these operators. Denote the tensor product of k copies of the same vector space by V^{⊗k}. For the symmetric case the map π_S : V^{⊗k} → V^{⊗k} is defined on rank-one tensors by
π_S(v_1 ⊗ · · · ⊗ v_k) = (1/k!) Σ_{τ∈S_k} v_{τ(1)} ⊗ · · · ⊗ v_{τ(k)} = v_1 v_2 · · · v_k,
where S_k is the symmetric group on k elements. For the skew-symmetric tensors the map π_Λ : V^{⊗k} → V^{⊗k} is defined on rank-one elements by
π_Λ(v_1 ⊗ · · · ⊗ v_k) = (1/k!) Σ_{τ∈S_k} sgn(τ) v_{τ(1)} ⊗ · · · ⊗ v_{τ(k)} = v_1 ∧ · · · ∧ v_k.
π_S and π_Λ are then extended linearly to act on the entire space.
Definition 1.1.10 (S^k V, Λ^k V). Let V be a vector space. The space of symmetric tensors S^k V is defined as
S^k V = π_S(V^{⊗k}) = {X ∈ V^{⊗k} | π_S(X) = X}.
The space of skew-symmetric tensors, or alternating tensors, is defined as
Λ^k V = π_Λ(V^{⊗k}) = {X ∈ V^{⊗k} | π_Λ(X) = X}.
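On coefficient arrays, π_S and π_Λ can be sketched in numpy by averaging over axis permutations (an illustration of ours; `perm_sign`, `pi_S` and `pi_Lambda` are our own helper names):

```python
import itertools
import math

import numpy as np

def perm_sign(p):
    """Sign of a permutation, computed from its inversion count."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def pi_S(T):
    """Symmetrizer pi_S: average of T over all permutations of its k axes."""
    perms = list(itertools.permutations(range(T.ndim)))
    return sum(np.transpose(T, p) for p in perms) / len(perms)

def pi_Lambda(T):
    """Alternator pi_Lambda: signed average over all axis permutations."""
    return sum(perm_sign(p) * np.transpose(T, p)
               for p in itertools.permutations(range(T.ndim))) / math.factorial(T.ndim)

rng = np.random.default_rng(1)
T = rng.standard_normal((2, 2, 2))

# Both maps are projectors: applying them twice changes nothing.
assert np.allclose(pi_S(pi_S(T)), pi_S(T))
assert np.allclose(pi_Lambda(pi_Lambda(T)), pi_Lambda(T))

# v (x) v (x) v is already symmetric, and its alternating part vanishes.
v = rng.standard_normal(2)
vvv = np.einsum('i,j,k->ijk', v, v, v)
assert np.allclose(pi_S(vvv), vvv)
assert np.allclose(pi_Lambda(vvv), 0)
```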
The space S^k V^* can be seen as the space of symmetric k-linear forms on V, but also as the space of homogeneous polynomials of degree k on V, so we can identify homogeneous polynomials of degree k with symmetric k-linear forms. We do this through a process called polarization.
Theorem 1.1.11 (Polarization identity). Let f be a homogeneous polynomial of degree k. Then
f̄(x_1, x_2, ..., x_k) = (1/k!) Σ_{I⊆[k], I≠∅} (−1)^{k−|I|} f(Σ_{i∈I} x_i)
is a symmetric k-linear form. Here [k] = {1, 2, ..., k}.
Example 1.1.12. Let P(s, t, u) be a homogeneous cubic polynomial in three variables. Plugging this into the polarization identity yields the following multilinear form:
P̄((s_1, t_1, u_1), (s_2, t_2, u_2), (s_3, t_3, u_3)) = (1/3!)[P(s_1+s_2+s_3, t_1+t_2+t_3, u_1+u_2+u_3) − P(s_1+s_2, t_1+t_2, u_1+u_2) − P(s_1+s_3, t_1+t_3, u_1+u_3) − P(s_2+s_3, t_2+t_3, u_2+u_3) + P(s_1, t_1, u_1) + P(s_2, t_2, u_2) + P(s_3, t_3, u_3)].
For example, if P(s, t, u) = stu one gets P̄ = (1/6) Σ_{σ∈S_3} s_{σ(1)} t_{σ(2)} u_{σ(3)}.
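The identity can be checked numerically; the following Python sketch of ours (the `polarize` helper is our own) verifies that for P(s, t, u) = stu the polarization agrees with the fully symmetrized product and recovers P on the diagonal:

```python
import itertools
import math

def polarize(f, xs):
    """Polarization identity: (1/k!) sum over nonempty I of (-1)^(k-|I|) f(sum_{i in I} x_i)."""
    k = len(xs)
    total = 0.0
    for m in range(1, k + 1):
        for I in itertools.combinations(range(k), m):
            point = [sum(xs[i][c] for i in I) for c in range(len(xs[0]))]
            total += (-1) ** (k - m) * f(point)
    return total / math.factorial(k)

P = lambda x: x[0] * x[1] * x[2]              # P(s, t, u) = s*t*u

x1, x2, x3 = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0], [2.0, 0.0, 1.0]
xs = [x1, x2, x3]

# On the diagonal the polarization recovers P itself ...
assert abs(polarize(P, [x1, x1, x1]) - P(x1)) < 1e-9

# ... and for P = stu it equals (1/6) sum over sigma of s_sigma(1) t_sigma(2) u_sigma(3).
explicit = sum(xs[p[0]][0] * xs[p[1]][1] * xs[p[2]][2]
               for p in itertools.permutations(range(3))) / 6
assert abs(polarize(P, xs) - explicit) < 1e-9
```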
1.1.3 GL(V_1) × · · · × GL(V_k) acts on V_1 ⊗ · · · ⊗ V_k

GL(V) is the group of invertible linear maps V → V. An element (g_1, g_2, ..., g_k) ∈ GL(V_1) × · · · × GL(V_k) acts on an element v_1 ⊗ v_2 ⊗ · · · ⊗ v_k ∈ V_1 ⊗ · · · ⊗ V_k by
(g_1, g_2, ..., g_k) · (v_1 ⊗ · · · ⊗ v_k) = g_1(v_1) ⊗ · · · ⊗ g_k(v_k)
and on the whole space V_1 ⊗ · · · ⊗ V_k by extending linearly.
If one picks a basis for each of V_1, ..., V_k, say {v_j^{(i)}}_{j=1}^{n_i} is a basis for V_i, one can write
g_i(v_j^{(i)}) = Σ_{l=1}^{n_i} α_{j,l}^{(i)} v_l^{(i)},   (1.2)
and if T ∈ V_1 ⊗ · · · ⊗ V_k,
T = Σ_{j_1,...,j_k} β_{j_1,...,j_k} v_{j_1}^{(1)} ⊗ · · · ⊗ v_{j_k}^{(k)}.   (1.3)
Thus, if g = (g_1, ..., g_k),
g · T = Σ_{j_1,...,j_k} β_{j_1,...,j_k} g_1(v_{j_1}^{(1)}) ⊗ · · · ⊗ g_k(v_{j_k}^{(k)})
      = Σ_{j_1,...,j_k} β_{j_1,...,j_k} Σ_{l_1,...,l_k} α_{j_1,l_1}^{(1)} · · · α_{j_k,l_k}^{(k)} v_{l_1}^{(1)} ⊗ · · · ⊗ v_{l_k}^{(k)}
      = Σ_{l_1,...,l_k} (Σ_{j_1,...,j_k} β_{j_1,...,j_k} α_{j_1,l_1}^{(1)} · · · α_{j_k,l_k}^{(k)}) v_{l_1}^{(1)} ⊗ · · · ⊗ v_{l_k}^{(k)}.   (1.4)
One can note that the α's in (1.2) give the matrix of g_i, and that the β's in (1.3) give the tensor T as a k-dimensional array. Thus the scalars
Σ_{j_1,...,j_k} β_{j_1,...,j_k} α_{j_1,l_1}^{(1)} · · · α_{j_k,l_k}^{(k)}
in (1.4) give the coefficients of the k-dimensional array representing g · T.
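The coordinate formula (1.4) is a contraction along every mode, which numpy's einsum expresses directly. A sketch of ours (the function name `act` is our own) checking for k = 3 that this really is a group action:

```python
import numpy as np

def act(gs, B):
    """Coefficient array of (g1, g2, g3) . T as in (1.4): contract the array B
    of beta's with the matrices gs[i] holding the alpha^{(i)}_{j,l} of (1.2)."""
    return np.einsum('ijk,ia,jb,kc->abc', B, gs[0], gs[1], gs[2])

rng = np.random.default_rng(2)
B = rng.standard_normal((2, 3, 4))                    # the beta's of (1.3)
gs = [rng.standard_normal((n, n)) for n in (2, 3, 4)]

# The identity element of GL(V1) x GL(V2) x GL(V3) acts trivially ...
assert np.allclose(act([np.eye(2), np.eye(3), np.eye(4)], B), B)

# ... and acting with (g1, g2, g3) followed by the inverses recovers T,
# as expected of a group action on coefficient arrays.
back = act([np.linalg.inv(g) for g in gs], act(gs, B))
assert np.allclose(back, B)
```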
1.2 Tensor decomposition
Let us first consider how factorization and decomposition work for tensors of order two, in other words matrices. Depending on the application and the computational resources, different decompositions are used. A very important decomposition is the singular value decomposition (SVD). It decomposes a matrix M into a sum of outer products (tensor products) of vectors as
M = Σ_{r=1}^{R} σ_r u_r v_r^T = Σ_{r=1}^{R} σ_r u_r ⊗ v_r.
Here u_r and v_r are pairwise orthonormal vectors, σ_r are the singular values and R is the rank of the matrix M, and these conditions make the decomposition essentially unique. The rank of M is the number of non-zero singular values, and the best low-rank approximations of M are given by truncating the sum.
For tensors of order greater than two the situation is different. A decomposition that generalizes the SVD, but not all of its properties, is called CANDECOMP (canonical decomposition), PARAFAC (parallel factors analysis) or the CP decomposition [24]. It is also a sum of tensor products of vectors:
T = Σ_{r=1}^{R} v_r^{(1)} ⊗ · · · ⊗ v_r^{(k)},
where the V_j are vector spaces and v_r^{(j)} ∈ V_j. As one can see, the CP decomposition is used to define the rank of a tensor: R is the rank of T if R is the smallest number such that equality holds (definition 1.1.6).
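The matrix case can be checked directly in numpy; this sketch of ours writes the SVD as a sum of outer products and truncates it:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # a 5x4 matrix of rank 3

U, s, Vt = np.linalg.svd(M)
R = int(np.sum(s > 1e-10))          # rank = number of non-zero singular values
assert R == 3

# M = sum_r sigma_r u_r (x) v_r: the SVD written as a sum of outer products.
M_rebuilt = sum(s[r] * np.outer(U[:, r], Vt[r, :]) for r in range(R))
assert np.allclose(M_rebuilt, M)

# Truncating the sum after two terms gives the best rank-2 approximation,
# with Frobenius error equal to the dropped singular value.
M2 = sum(s[r] * np.outer(U[:, r], Vt[r, :]) for r in range(2))
assert np.isclose(np.linalg.norm(M - M2), s[2])
```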
A big issue with higher order tensors is that there is no method or algorithm to calculate the CP decomposition exactly, which would also give the rank of a tensor. A common algorithm to calculate the CP decomposition approximately is the alternating least squares (ALS) algorithm. It can be summarized as a least squares method where we let the values from one vector space vary while the others are fixed; then the same is done for the next vector space, and so forth for all vector spaces. If the difference between the approximation and the given tensor is too large, the whole procedure is repeated until the difference is small enough.
The algorithm is described in algorithm 1, where T is a tensor of size d_1 × · · · × d_N. The norm used is the Frobenius norm, defined as
‖T‖^2 = Σ_{i_1=1,...,i_N=1}^{d_1,...,d_N} |T_{i_1,...,i_N}|^2,   (1.5)
where T_{i_1,...,i_N} denotes the (i_1, ..., i_N) component of T. One thing to notice is that the rank is needed as a parameter for the calculation, so if the rank is not known it needs to be estimated before the algorithm can start.
Algorithm 1 ALS algorithm to calculate the CP decomposition
Require: T, R
  Initialize a_r^{(n)} ∈ R^{d_n} for n = 1, ..., N and r = 1, ..., R.
  repeat
    for n = 1, ..., N do
      Solve min over a_i^{(n)}, i = 1, ..., R, of ‖T − Σ_{r=1}^{R} a_r^{(1)} ⊗ · · · ⊗ a_r^{(N)}‖^2.
      Update a_i^{(n)} to its newly calculated value, for i = 1, ..., R.
    end for
  until ‖T − Σ_{r=1}^{R} a_r^{(1)} ⊗ · · · ⊗ a_r^{(N)}‖^2 < threshold or the maximum number of iterations is reached
  return a_r^{(1)}, ..., a_r^{(N)} for r = 1, ..., R.
This is actually a way to determine the rank of a tensor, but the method has a few problems. First of all there is the issue of border rank (see section 2.1), which makes it possible to approximate some tensors arbitrarily well with tensors of lower rank (see example 2.1.1). Furthermore, the algorithm is not guaranteed to converge to a global optimum, and even if it does converge, it might need a large number of iterations [24].
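A minimal Python sketch of Algorithm 1 for third-order tensors (our own simplified version, not the thesis's appendix code; it assumes the target rank R is known) based on mode unfoldings and the column-wise Khatri-Rao product:

```python
import numpy as np

def als_cp(T, R, iters=200, seed=0):
    """Minimal ALS sketch of Algorithm 1 for a third-order tensor T.
    Returns factors A, B, C with T ~ sum_r A[:, r] (x) B[:, r] (x) C[:, r]."""
    d1, d2, d3 = T.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((d, R)) for d in (d1, d2, d3))

    def khatri_rao(X, Y):
        # Column-wise Kronecker product: column r is X[:, r] (x) Y[:, r], flattened.
        return np.einsum('ir,jr->ijr', X, Y).reshape(-1, X.shape[1])

    for _ in range(iters):
        # Fix two factors and solve a linear least squares problem for the third.
        A = np.linalg.lstsq(khatri_rao(B, C), T.reshape(d1, -1).T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), np.moveaxis(T, 1, 0).reshape(d2, -1).T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), np.moveaxis(T, 2, 0).reshape(d3, -1).T, rcond=None)[0].T
    return A, B, C

def rel_err(T, A, B, C):
    return np.linalg.norm(np.einsum('ir,jr,kr->ijk', A, B, C) - T) / np.linalg.norm(T)

# Build a tensor of known rank 2 and run the sketch on it.
rng = np.random.default_rng(4)
A0, B0, C0 = (rng.standard_normal((d, 2)) for d in (3, 4, 5))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)

err1 = rel_err(T, *als_cp(T, 2, iters=1))
err = rel_err(T, *als_cp(T, 2, iters=200))
assert err <= err1 + 1e-9   # each least squares step can only decrease the residual
```

Since each inner step solves its least squares problem exactly with the other factors fixed, the residual is non-increasing, which the last assertion checks; as noted above, this still gives no guarantee of reaching a global optimum, and border-rank phenomena can make the factors blow up.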
1.3 Algebraic geometry
In this section we introduce basic notions of algebraic geometry, which is the study of objects defined by polynomial equations. References for this section are [13, 17, 25, 31], and for section 1.3.6, [6].
1.3.1 Basic definitions
Definition 1.3.1 (Monomial). A monomial in variables x_1, x_2, ..., x_n is a product of variables
x_1^{α_1} x_2^{α_2} · · · x_n^{α_n},
where α_i ∈ N = {0, 1, 2, ...}. Another notation for this is x^α, where x = (x_1, x_2, ..., x_n) and α = (α_1, α_2, ..., α_n) ∈ N^n; α is called a multi-index.
Definition 1.3.2 (Polynomial). Given a field F, a polynomial is a finite linear combination of monomials with coefficients in F, i.e. if f is a polynomial over F it can be written
f = Σ_{α∈A} c_α x^α
for some finite set A and c_α ∈ F.
A homogeneous polynomial is a polynomial where all the multi-indices α ∈ A sum to the same integer. In other words, all the monomials have the same degree.
The set F[x_1, x_2, ..., x_n] of all polynomials over the field F in variables x_1, x_2, ..., x_n forms a commutative ring. Since it will be important in the sequel, we recall some important definitions and results from ring theory.
Definition 1.3.3 (Ideal). If R is a commutative ring (e.g. F[x_1, x_2, ..., x_n]), an ideal in R is a set I for which the following holds:
• If x, y ∈ I, then x + y ∈ I (I is a subgroup of (R, +)).
• If x ∈ I and r ∈ R, then rx ∈ I.
If f_1, f_2, ..., f_k ∈ R, the ideal generated by f_1, f_2, ..., f_k, denoted ⟨f_1, f_2, ..., f_k⟩, is defined as
⟨f_1, f_2, ..., f_k⟩ = {Σ_{i=1}^{k} q_i f_i | q_i ∈ R}.
The next theorem is a special case of Hilbert's basis theorem.

Theorem 1.3.4. Every ideal in the polynomial ring F[x_1, x_2, ..., x_n] is finitely generated, i.e. for every ideal I there exist polynomials f_1, f_2, ..., f_k such that I = ⟨f_1, f_2, ..., f_k⟩.
1.3.2 Varieties and ideals
Definition 1.3.5 (Affine algebraic set). An affine algebraic set is the set X ⊂ F^n of solutions to a system of polynomial equations
f_1 = 0, f_2 = 0, ..., f_k = 0
for a given set {f_1, f_2, ..., f_k} of polynomials in n variables. We write X = V(f_1, f_2, ..., f_k) for this affine algebraic set.
An algebraic set X is called irreducible, or a variety, if it cannot be written as X = X_1 ∪ X_2 for algebraic sets X_1, X_2 ⊊ X.
Definition 1.3.6 (Ideal of an affine algebraic set). For an algebraic set X ⊂ F^n, the ideal of X, denoted I(X), is the set of polynomials f ∈ F[x_1, x_2, ..., x_n] such that
f(a_1, a_2, ..., a_n) = 0
for every (a_1, a_2, ..., a_n) ∈ X.
When one works with algebraic sets one wants to find equations for the set, and this can mean different things. A set of polynomials P = {p_1, p_2, ..., p_k} is said to cut out the algebraic set X set-theoretically if the set of common zeros of p_1, p_2, ..., p_k is X. P is said to cut out X ideal-theoretically if P is a generating set for I(X).
Example 1.3.7 (Twisted cubic). The twisted cubic is a curve in R^3 which can be given as the image of R under the mapping t ↦ (t, t^2, t^3), fig. 1.1. However, the twisted cubic can also be viewed as an algebraic set, namely V(y − x^2, z − x^3), fig. 1.2.
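A two-line numerical sanity check of ours that the parametrization lands inside V(y − x^2, z − x^3):

```python
import numpy as np

# Points on the parametrized curve t -> (t, t^2, t^3) for -1 <= t <= 1 ...
t = np.linspace(-1.0, 1.0, 101)
x, y, z = t, t**2, t**3

# ... satisfy both defining equations of V(y - x^2, z - x^3).
assert np.allclose(y - x**2, 0)
assert np.allclose(z - x**3, 0)
```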
Figure 1.2: The intersection of the surfaces defined by y − x^2 = 0 and z − x^3 = 0, namely the twisted cubic, for (−1, 0, −1) ≤ (x, y, z) ≤ (1, 1, 1).
Example 1.3.8 (Matrices of rank r). Given vector spaces V, W of dimensions n and m and bases {v_i}_{i=1}^{n} and {w_j}_{j=1}^{m} respectively, V^* ⊗ W can be identified with the set of m × n matrices. The set of matrices of rank at most r is a variety in this space, namely the variety defined as the zero set of all (r+1) × (r+1) minors, since a matrix has rank less than or equal to r if and only if all of its (r+1) × (r+1) minors are zero.
For example, if n = 4 and m = 3, a matrix defining a map between V and W can be written

[x_11 x_12 x_13 x_14]
[x_21 x_22 x_23 x_24]
[x_31 x_32 x_33 x_34]

and the variety of matrices of rank 2 or less consists of the matrices for which all four 3 × 3 minors vanish, i.e.

det[x_11 x_12 x_13; x_21 x_22 x_23; x_31 x_32 x_33] = 0,
det[x_11 x_12 x_14; x_21 x_22 x_24; x_31 x_32 x_34] = 0,
det[x_11 x_13 x_14; x_21 x_23 x_24; x_31 x_33 x_34] = 0,
det[x_12 x_13 x_14; x_22 x_23 x_24; x_32 x_33 x_34] = 0.

That these equations cut out the set of 3 × 4 matrices of rank 2 or less set-theoretically is easy to prove. They also generate the ideal of the variety, but this is harder to prove.
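The set-theoretic statement is easy to test numerically; in this sketch of ours, a random matrix built as a product of a 3 × 2 and a 2 × 4 factor has rank 2, and all four of its 3 × 3 minors vanish:

```python
import itertools

import numpy as np

rng = np.random.default_rng(5)
# A generic rank-2 matrix, built as a product of 3x2 and 2x4 factors.
M = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 4))
assert np.linalg.matrix_rank(M) == 2

# All four 3x3 minors (choose 3 of the 4 columns) vanish on the rank-2 matrix ...
minors = [np.linalg.det(M[:, list(cols)])
          for cols in itertools.combinations(range(4), 3)]
assert np.allclose(minors, 0)

# ... but a generic full-rank 3x4 matrix has non-vanishing 3x3 minors.
N = rng.standard_normal((3, 4))
minors_N = [np.linalg.det(N[:, list(cols)])
            for cols in itertools.combinations(range(4), 3)]
assert not np.allclose(minors_N, 0)
```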
1.3.3 Projective spaces and varieties
Definition 1.3.9 (Projective space). The n-dimensional projective space over F, denoted P^n(F), is the set F^{n+1} \ {0} modulo the equivalence relation ∼, where x ∼ y if and only if x = λy for some λ ∈ F \ {0}. For a vector space V we write PV for the projectivization of V, and if v ∈ V, we write [v] for the equivalence class to which v belongs, i.e. [v] is the element in PV corresponding to the line λv in V. For a subset X ⊆ PV we will write X̂ for the affine cone of X in V, i.e. X̂ = {v ∈ V : [v] ∈ X}.
We will now define what is meant by a projective algebraic set. Note that the zero locus of a polynomial is not defined in projective space, since in general f(x) ≠ f(λx) for a polynomial f, but x = λx in projective space. However, for a polynomial F which is homogeneous of degree d the zero locus is well defined, since F(λx) = λ^d F(x). Note that even though the zero locus of a homogeneous polynomial is well defined on projective space, homogeneous polynomials are not functions on projective space.
Definition 1.3.10 (Projective algebraic set). A projective algebraic set X ⊂ P^n(F) is the solution set to a system of polynomial equations
F_1(x) = 0, F_2(x) = 0, ..., F_k(x) = 0
for a set {F_1, F_2, ..., F_k} of homogeneous polynomials in n + 1 variables.
A projective algebraic set is called irreducible, or a projective variety, if it is not the union of two proper projective algebraic subsets.
Definition 1.3.11 (Ideal of a projective algebraic set). If X ⊂ Pⁿ(F) is an algebraic set, its ideal I(X) is the set of all homogeneous polynomials which vanish on X, i.e. I(X) consists of all polynomials F such that

F(a_1, a_2, . . . , a_{n+1}) = 0

for all (a_1, a_2, . . . , a_{n+1}) ∈ X.
Definition 1.3.12 (Zariski topology). The Zariski topology on Pⁿ(F) (or Fⁿ) is defined by its closed sets, which are taken to be all the sets X for which there exists a set S of homogeneous polynomials (or arbitrary polynomials in the case of Fⁿ) such that

X = {α : f(α) = 0 ∀f ∈ S}.

The Zariski closure of a set X is the set V(I(X)).
1.3.4 Dimension of an algebraic set
Definition 1.3.13 (Tangent space). Let M be a subset of a vector space V over F = R or C and let x ∈ M. The tangent space ˆTxM ⊂ V is the span of all vectors which arise as derivatives α′(0) of smooth curves α : F → M with α(0) = x.

For a projective algebraic set X ⊂ PV, the affine tangent space to X at [x] ∈ X is ˆT[x]X := ˆTx ˆX.
Definition 1.3.14 (Smooth and singular points). If dim ˆTxX is constant at and near x, then x is called a smooth point of X. If x is not smooth, it is called a singular point. For a variety X, let Xsmooth and Xsing denote the sets of smooth and singular points of X, respectively.
Definition 1.3.15 (Dimension of a variety). For an affine algebraic set X, define the dimension of X as dim(X) := dim( ˆTxX) for x ∈ Xsmooth.

For a projective algebraic set X, define the dimension of X as dim(X) := dim( ˆTxX) − 1 for x ∈ Xsmooth.
Example 1.3.16 (Cuspidal cubic). The variety X in R² given by X = V(y² − x³) is called the cuspidal cubic, see fig. 1.3. The cuspidal cubic has one singular point, namely (0, 0). One can see that both the unit vector in the x-direction and the unit vector in the y-direction are tangent vectors to the variety at the point (0, 0). Thus dim ˆT(0,0)X = 2, but for all x ≠ (0, 0) on the cuspidal cubic we have dim ˆTxX = 1, so (0, 0) is a singular point but all other points are smooth, and the dimension of the cuspidal cubic is one.
Figure 1.3: The cuspidal cubic.
Example 1.3.17 (Matrices of rank r). Going back to the example of the matrices of size m × n with rank r or less, these can also be seen as a projective variety. We form the projective space P^{mn−1}(F) (i.e. the space of matrices where matrices A and B are identified iff A = λB for some λ ≠ 0; note that if A and B are identified they have the same rank). The equations are still the same: the minors of size (r + 1) × (r + 1), which are homogeneous of degree r + 1.
Example 1.3.18 (Segre variety). This variety will be very important in the sequel. Let V_1, V_2, . . . be complex vector spaces. The two-factor Segre variety is the variety defined as the image of the map

Seg : PV_1 × PV_2 → P(V_1 ⊗ V_2),  Seg([v_1], [v_2]) = [v_1 ⊗ v_2],

and it can be seen that the image of this map is the projectivization of the set of rank-one tensors in V_1 ⊗ V_2.

We can in a similar fashion define the n-factor Segre variety as the image of

Seg : PV_1 × · · · × PV_n → P(V_1 ⊗ · · · ⊗ V_n)

and the image is once again the projectivization of the set of rank-one tensors in V_1 ⊗ · · · ⊗ V_n.
That the 2-factor Segre variety is an algebraic set follows from the fact that the 2 × 2 minors furnish equations for the variety. In the next chapter we will work with the 3-factor Segre variety, for which equations are provided in section 2.3.1. For a general proof for the n-factor Segre, see [25, page 103].
Any curve in Seg(PV_1 × PV_2) is of the form v_1(t) ⊗ v_2(t), and its derivative is v_1′(0) ⊗ v_2(0) + v_1(0) ⊗ v_2′(0). Thus

ˆT[v_1⊗v_2] Seg(PV_1 × PV_2) = V_1 ⊗ v_2 + v_1 ⊗ V_2,

and the intersection between V_1 ⊗ v_2 and v_1 ⊗ V_2 is the one-dimensional space spanned by v_1 ⊗ v_2. Therefore the dimension of the Segre variety is n_1 + n_2 − 2, where n_1, n_2 are the dimensions of V_1, V_2 respectively.
1.3.5 Cones, joins, and secant varieties
Definition 1.3.19 (Cone). Let X ⊂ Pⁿ(F) be a projective variety and p ∈ Pⁿ(F) a point. The cone over X with vertex p, J(p, X), is the Zariski closure of the union of all the lines pq joining p with a point q ∈ X, i.e.

J(p, X) = ⋃_{q∈X} pq.
Definition 1.3.20 (Join of varieties). Let X_1, X_2 ⊂ Pⁿ(F) be two varieties. The join of X_1 and X_2 is the set

J(X_1, X_2) = ⋃_{p_1∈X_1, p_2∈X_2, p_1≠p_2} p_1p_2

(closure taken in the Zariski topology), which can be interpreted as the Zariski closure of the union of all cones over X_2 with a vertex in X_1, or vice versa.
The join of several varieties X1, X2, . . . , Xk is defined inductively:
J (X1, X2, . . . , Xk) = J (X1, J (X2, . . . , Xk)).
Definition 1.3.21 (Secant variety). Let X be a variety. The r:th secant variety of X is the set

σ_r(X) = J(X, . . . , X)  (r copies).
Lemma 1.3.22 (Secant varieties are varieties). Secant varieties of irreducible algebraic sets are irreducible, i.e. they are varieties.
Proof. See [17, p. 144, prop. 11.24].

Let X ⊂ Pⁿ(F) be an algebraic set of dimension k. The expected dimension of σ_r(X) is min{rk + r − 1, n}. However, the dimension is not always the expected one.
Definition 1.3.23 (Degenerate secant variety). Let X ⊂ Pⁿ(F) be a projective variety with dim(X) = k. If dim σ_r(X) < min{rk + r − 1, n}, then σ_r(X) is said to be degenerate.
Definition 1.3.24 (X-rank). If V is a vector space over C, X ⊂ PV is a projective variety and p ∈ PV is a point, the X-rank of p is the smallest number r of points in X such that p lies in their linear span. The X-border rank of p is the least number r such that p lies in σ_r(X), the r:th secant variety of X.
The generic X-rank is the smallest r such that σr(X) = PV .
These notions of X-rank and X-border rank will coincide with the ideas of tensor rank and tensor border rank (see section 2.1) when X is taken to be the Segre variety.
Lemma 1.3.25 (Terracini's lemma). Let x_i for i = 1, . . . , r be general points of ˆX_i, where the X_i are projective varieties in PV for a complex vector space V, and let [u] = [x_1 + · · · + x_r] ∈ J(X_1, . . . , X_r). Then

ˆT[u] J(X_1, . . . , X_r) = ˆT[x_1] X_1 + · · · + ˆT[x_r] X_r.
Proof. It is enough to consider the case u = x_1 + x_2 for x_1 ∈ ˆX_1, x_2 ∈ ˆX_2, where X_1, X_2 ⊂ PV are varieties, and to derive the expression for ˆT[u] J(X_1, X_2). The addition map a : V × V → V is defined by a(v_1, v_2) = v_1 + v_2. Then

ˆJ(X_1, X_2) = a( ˆX_1 × ˆX_2)

and so, for general points x_1, x_2, ˆT[u] J(X_1, X_2) is obtained by differentiating curves x_1(t) ∈ ˆX_1, x_2(t) ∈ ˆX_2 with x_1(0) = x_1, x_2(0) = x_2. Thus the tangent space to J(X_1, X_2) at x_1 + x_2 is the sum of the tangent spaces of X_1 at x_1 and of X_2 at x_2.
1.3.6 Real algebraic geometry
In section 2.4 we will need the following definition.
Definition 1.3.26 (Affine semi-algebraic set). An affine semi-algebraic set is a subset of Rⁿ of the form

⋃_{i=1}^{s} ⋂_{j=1}^{r_i} {x ∈ Rⁿ : f_{i,j}(x) ◃_{i,j} 0},

where f_{i,j} ∈ R[x_1, . . . , x_n] and each relation ◃_{i,j} is < or =.
Example 1.3.27 (Semi-algebraic set). Consider the semi-algebraic set given by

f_{1,1} = x² + y² − 2,  f_{1,2} = x − (3/2)y,  f_{1,3} = −y,
f_{2,1} = x² + y² − 2,  f_{2,2} = x + (3/2)y,  f_{2,3} = y,
f_{3,1} = (x − 2)² + y² − 1/4,
f_{4,1} = (x − 7/2)² + y² − 1/4,

with all relations ◃_{i,j} being <. The set can be visualized as in figure 1.4.
Figure 1.4: An example of a semi-algebraic set.
1.4 Application to matrix multiplication
We take a look at the problem of efficient computation of the product of 2 × 2 matrices.
Let A, B, C be copies of the space of n × n matrices, and let the multiplication map m_n : A × B → C be given by m_n(M_1, M_2) = M_1M_2. To compute the matrix M_3 = m_2(M_1, M_2) = M_1M_2 one can naively use eight multiplications and four additions using the standard method for matrix multiplication. Explicitly, if

M_1 = ( a^1_1  a^1_2 ; a^2_1  a^2_2 ),   M_2 = ( b^1_1  b^1_2 ; b^2_1  b^2_2 )

one can compute M_3 = M_1M_2 by

c^1_1 = a^1_1 b^1_1 + a^1_2 b^2_1,   c^1_2 = a^1_1 b^1_2 + a^1_2 b^2_2,
c^2_1 = a^2_1 b^1_1 + a^2_2 b^2_1,   c^2_2 = a^2_1 b^1_2 + a^2_2 b^2_2.
However, this is not optimal. Strassen [37] showed that one can calculate M_3 = M_1M_2 using only seven multiplications. First, one calculates

k_1 = (a^1_1 + a^2_2)(b^1_1 + b^2_2)
k_2 = (a^2_1 + a^2_2) b^1_1
k_3 = a^1_1 (b^1_2 − b^2_2)
k_4 = a^2_2 (−b^1_1 + b^2_1)
k_5 = (a^1_1 + a^1_2) b^2_2
k_6 = (−a^1_1 + a^2_1)(b^1_1 + b^1_2)
k_7 = (a^1_2 − a^2_2)(b^2_1 + b^2_2)
and the coefficients of M_3 = M_1M_2 can then be calculated as

c^1_1 = k_1 + k_4 − k_5 + k_7
c^2_1 = k_2 + k_4
c^1_2 = k_3 + k_5
c^2_2 = k_1 + k_3 − k_2 + k_6.
Now, the map m_n : A × B → C is obviously a bilinear map and as such can be expressed as a tensor. Let us take a look at m_2. Equip A, B, C with the same basis

( 1 0 ; 0 0 ),  ( 0 1 ; 0 0 ),  ( 0 0 ; 1 0 ),  ( 0 0 ; 0 1 ).
For clarity, let m_2 : A × B → C and let the bases of A, B, C be {a^j_i}, {b^j_i}, {c^j_i} for i, j = 1, 2, with dual bases {α^j_i}, {β^j_i} of A, B respectively. Thus m_2 ∈ A* ⊗ B* ⊗ C, and the standard algorithm for matrix multiplication corresponds to the following rank-eight decomposition of m_2:

m_2 = (α^1_1 ⊗ β^1_1 + α^1_2 ⊗ β^2_1) ⊗ c^1_1 + (α^1_1 ⊗ β^1_2 + α^1_2 ⊗ β^2_2) ⊗ c^1_2
    + (α^2_1 ⊗ β^1_1 + α^2_2 ⊗ β^2_1) ⊗ c^2_1 + (α^2_1 ⊗ β^1_2 + α^2_2 ⊗ β^2_2) ⊗ c^2_2
whereas Strassen's algorithm corresponds to a rank-seven decomposition of m_2:

m_2 = (α^1_1 + α^2_2) ⊗ (β^1_1 + β^2_2) ⊗ (c^1_1 + c^2_2) + (α^2_1 + α^2_2) ⊗ β^1_1 ⊗ (c^2_1 − c^2_2)
    + α^1_1 ⊗ (β^1_2 − β^2_2) ⊗ (c^1_2 + c^2_2) + α^2_2 ⊗ (−β^1_1 + β^2_1) ⊗ (c^1_1 + c^2_1)
    + (α^1_1 + α^1_2) ⊗ β^2_2 ⊗ (−c^1_1 + c^1_2) + (−α^1_1 + α^2_1) ⊗ (β^1_1 + β^1_2) ⊗ c^2_2
    + (α^1_2 − α^2_2) ⊗ (β^2_1 + β^2_2) ⊗ c^1_1.
It has been proven that both the rank and the border rank of m_2 are seven [26]. This can be seen from the fact that σ_7(Seg(PA × PB × PC)) = P(A ⊗ B ⊗ C). However, the rank of m_n for n ≥ 3 is still unknown. Even for m_3, all that is known is that the rank is between 19 and 23 [25, chapter 11]. It is interesting to note that this is lower than the generic rank for 9 × 9 × 9 tensors, which is 30 (theorem 2.3.8). The rank of m_2 is, however, the generic rank seven.
Chapter 2
Tensor rank
In this chapter we present some results on tensor rank, mainly from the view of algebraic geometry. We introduce different types of rank of a tensor and show some basic results concerning these different types of ranks. We derive equations for the Segre variety and show some basic results on secant defects of the Segre variety and generic ranks. A general reference for this chapter is [25].
2.1 Different notions of rank
If T : U → V is a linear operator between vector spaces U and V, the rank of T is the dimension of the image T(U). If one considers T as an element of U* ⊗ V, the rank of T coincides with the smallest integer R such that T can be written

T = Σ_{i=1}^R α_i ⊗ v_i.
However, a tensor T ∈ V_1 ⊗ V_2 ⊗ · · · ⊗ V_k can be viewed as a linear operator V_i* → V_1 ⊗ · · · ⊗ V_{i−1} ⊗ V_{i+1} ⊗ · · · ⊗ V_k for any 1 ≤ i ≤ k, so T can be viewed as a linear operator in these k different ways, and each way gives a different rank. The k-tuple (dim T(V_1*), . . . , dim T(V_k*)) is known as the multilinear rank of T. The smallest integer R such that T can be written

T = Σ_{i=1}^R v^(1)_i ⊗ · · · ⊗ v^(k)_i

is known as the rank of T (sometimes called the outer product rank). If T is a tensor, let R(T) denote the rank of T.
The idea of tensor rank gets more complicated still. If a tensor T has rank R it is possible that there exist tensors of rank ˜R < R such that T is the limit of these tensors, in which case T is said to have border rank ˜R. Let R(T ) denote the border rank of the tensor T .
Example 2.1.1 (Border rank). Consider the numerically given tensor

T = (0;1) ⊗ (1;0) ⊗ (1;1) + (1;2) ⊗ (1;0) ⊗ (1;1) + (0;1) ⊗ (2;1) ⊗ (1;1) + (0;1) ⊗ (1;0) ⊗ (−1;1),

whose two slices, written side by side, are

T = ( 1 0 | 1 0
      4 1 | 6 1 ).

One can show that T has rank 3, for instance with a method for p × p × 2 tensors used in [36]. Now consider the rank-two tensor

T(ε) = ((ε − 1)/ε) (0;1) ⊗ (1;0) ⊗ (1;1) + (1/ε) ((0;1) + ε(1;2)) ⊗ ((1;0) + ε(2;1)) ⊗ ((1;1) + ε(−1;1)).

Calculating T(ε) for a few values of ε gives the following results:

T(1)     = ( 0      0      | 6      2
             0      0      | 18     6 ),
T(10⁻¹)  = ( 1.0800 0.0900 | 1.3200 0.1100
             3.9600 1.0800 | 6.8400 1.3200 ),
T(10⁻³)  = ( 1.0010 0.0010 | 1.0030 0.0010
             4.0000 1.0010 | 6.0080 1.0030 ),
T(10⁻⁵)  = ( 1.0000 0.0000 | 1.0000 0.0000
             4.0000 1.0000 | 6.0001 1.0000 ),

which gives us an indication that T(ε) → T when ε → 0.
The above tensor is a special case of tensors of the form

T = a_1 ⊗ b_1 ⊗ c_1 + a_2 ⊗ b_1 ⊗ c_1 + a_1 ⊗ b_2 ⊗ c_1 + a_1 ⊗ b_1 ⊗ c_2

and even in this general case one can show that T has rank three, but there are tensors of rank two arbitrarily close to it:
T(ε) = (1/ε)((ε − 1) a_1 ⊗ b_1 ⊗ c_1 + (a_1 + εa_2) ⊗ (b_1 + εb_2) ⊗ (c_1 + εc_2))
     = (1/ε)(ε a_1 ⊗ b_1 ⊗ c_1 − a_1 ⊗ b_1 ⊗ c_1 + a_1 ⊗ b_1 ⊗ c_1 + ε a_2 ⊗ b_1 ⊗ c_1 + ε a_1 ⊗ b_2 ⊗ c_1 + ε a_1 ⊗ b_1 ⊗ c_2 + O(ε²))
     → T,  when ε → 0.
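The convergence T(ε) → T is easy to verify numerically. A small sketch (vector names follow the example above; `outer3` is our helper):

```python
import numpy as np

a1, a2 = np.array([0., 1.]), np.array([1., 2.])
b1, b2 = np.array([1., 0.]), np.array([2., 1.])
c1, c2 = np.array([1., 1.]), np.array([-1., 1.])

def outer3(a, b, c):
    """Outer product a (x) b (x) c as a 2x2x2 array."""
    return np.einsum('i,j,k->ijk', a, b, c)

# the rank-three tensor T and its rank-two approximations T(eps)
T = outer3(a1, b1, c1) + outer3(a2, b1, c1) + outer3(a1, b2, c1) + outer3(a1, b1, c2)

def T_eps(eps):
    return ((eps - 1) / eps) * outer3(a1, b1, c1) \
        + (1 / eps) * outer3(a1 + eps * a2, b1 + eps * b2, c1 + eps * c2)

for eps in (1e-1, 1e-3, 1e-5):
    print(eps, np.max(np.abs(T_eps(eps) - T)))   # the error shrinks proportionally to eps
```

The printed error decreases linearly in ε, consistent with the O(ε²)-cancellation in the expansion above.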
There is a well-known result for matrices which states that if one fills an n × m matrix with random entries, the matrix will have maximal rank, min{n, m}, with probability one. In the case of square matrices, a random matrix will be invertible with probability one. For tensors over C the situation is similar: a random tensor will have a certain rank with probability one; this rank is called the generic rank. Over R, however, there can be multiple ranks, called typical ranks, which a random tensor takes with non-zero probability; see more in section 2.4. For now, we recall definition 1.3.24 and the fact that the generic rank is the smallest r such that the r:th secant variety of the Segre variety is the whole space. Compare these observations and definitions with the fact that GL(n, C) is an n²-dimensional manifold in the n²-dimensional space of n × n matrices, and a random matrix in this space is invertible with probability one.
2.1.1 Results on tensor rank
Theorem 2.1.2. Given an I × J × K tensor T, R(T) is the minimal number p of rank-one J × K matrices S_1, . . . , S_p such that T_i ∈ span(S_1, . . . , S_p) for all slices T_i of T.
Proof. For a tensor T one can write

T = Σ_{k=1}^{R(T)} a_k ⊗ b_k ⊗ c_k

and thus, if a_k = (a^1_k, . . . , a^I_k)^T, we have

T_i = Σ_{k=1}^{R(T)} a^i_k b_k ⊗ c_k,

so T_i ∈ span(b_1 ⊗ c_1, . . . , b_{R(T)} ⊗ c_{R(T)}) for i = 1, . . . , I, which proves R(T) ≥ p.

If T_i ∈ span(S_1, . . . , S_p) with rank(S_j) = 1 for i = 1, . . . , I, we can write

T_i = Σ_{k=1}^p x^i_k S_k = Σ_{k=1}^p x^i_k y_k ⊗ z_k

and thus with x_k = (x^1_k, . . . , x^I_k)^T we get

T = Σ_{k=1}^p x_k ⊗ y_k ⊗ z_k,

which proves R(T) ≤ p, resulting in R(T) = p.
Corollary 2.1.3. For an I × J × K tensor T, R(T) ≤ min{IJ, IK, JK}.

Proof. By theorem 2.1.2 one can slice T along any of the three directions, and thus one can pick the direction which results in the smallest matrices, say m × n. The space of m × n matrices is spanned by the mn matrices M_kl having a one in position (k, l) and zeros elsewhere. Thus one never needs more than mn rank-one matrices to get all the slices in the linear span.
2.1.2 Symmetric tensor rank
Definition 2.1.4 (Symmetric rank). Given a tensor T ∈ S^d V, the symmetric rank of T, denoted R_S(T), is defined as the smallest R such that

T = Σ_{r=1}^R v_r ⊗ · · · ⊗ v_r

for v_r ∈ V. The symmetric border rank of T is defined as the smallest R such that T is the limit of symmetric tensors of symmetric rank R.
Since we, over R and C, can put symmetric tensors of order d in bijective correspondence with homogeneous polynomials of degree d, and vectors in bijective correspondence with linear forms, the symmetric rank of a given symmetric tensor can be translated to the number R of linear forms needed to express a given homogeneous polynomial of degree d as a sum of d:th powers of linear forms. That is, if P is a homogeneous polynomial of degree d over C, what is the least R such that

P = l_1^d + · · · + l_R^d

for linear forms l_i? Over C, the following theorem gives an answer to this question in the generic case.
Theorem 2.1.5 (Alexander-Hirschowitz). The generic symmetric rank in S^d C^n is

⌈ (n+d−1 choose d) / n ⌉    (2.1)

except for d = 2, where the generic symmetric rank is n, and for (d, n) ∈ {(3, 5), (4, 3), (4, 4), (4, 5)}, where the generic symmetric rank is (2.1) plus one.

Proof. A proof can be found in [7]. An overview and introduction to the proof can be found in [25, chapter 15].
During the American Institute of Mathematics (AIM) workshop in Palo Alto, USA, 2008 (see [33]) P. Comon stated the following conjecture:
Conjecture 2.1.6. For a symmetric tensor T ∈ S^d C^n, its symmetric rank R_S(T) and tensor rank R(T) are equal.
So far the conjecture has been proved true for R(T) = 1, 2, for R(T) ≤ n, for sufficiently large d with respect to n [10], and for tensors of border rank two [3]. Furthermore, during the AIM workshop D. Gross showed that the conjecture is also true when R(T) ≤ R(T_{k,d−k}) for k < d/2, where T_{k,d−k} is a way to view T as a second-order tensor, i.e., T_{k,d−k} ∈ S^k V ⊗ S^{d−k} V.
2.1.3 Kruskal rank
The Kruskal rank is named after Joseph B. Kruskal and is also called k-rank. For a matrix A the k-rank is the largest number κ_A such that any κ_A columns of A are linearly independent. Let T = Σ_{r=1}^R a_r ⊗ b_r ⊗ c_r and let A, B and C denote the matrices with a_1, . . . , a_R, b_1, . . . , b_R and c_1, . . . , c_R as column vectors, respectively. Then the k-rank of T is the tuple (κ_A, κ_B, κ_C) of the k-ranks of the matrices A, B, C.
Using the k-rank of T, Kruskal showed that the condition

κ_A + κ_B + κ_C ≥ 2R(T) + 2

is sufficient for T to have a unique, up to trivialities, CP-decomposition ([22]). This result has been generalized in [35] to order d tensors as

Σ_{i=1}^d κ_{A_i} ≥ 2R(T) + d − 1,    (2.2)

where A_i is the factor matrix of the i:th mode.
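The k-rank of a factor matrix can be computed by brute force over column subsets; a small sketch (the function name `k_rank` and the example matrix are ours):

```python
import numpy as np
from itertools import combinations

def k_rank(A, tol=1e-9):
    """Largest kappa such that every set of kappa columns of A is independent."""
    n_cols = A.shape[1]
    kappa = 0
    for k in range(1, n_cols + 1):
        if all(np.linalg.matrix_rank(A[:, list(c)], tol=tol) == k
               for c in combinations(range(n_cols), k)):
            kappa = k
        else:
            break
    return kappa

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [0., 0., 0.]])
# columns are pairwise independent, but the third is the sum of the first two
print(k_rank(A))   # 2
```

Note that κ_A ≤ rank(A) always, with equality for generic matrices; for the 3 × 3 example above both equal 2.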
2.1.4 Multilinear rank
A reason why the multilinear rank is of interest for the tensor rank is that it can be used to set up lower bounds for the tensor rank.
To find a lower bound we recall the definition of the multilinear rank of a tensor T: the k-tuple (dim T(V_1*), . . . , dim T(V_k*)). Here T(V_i*) is the image of the linear operator V_i* → V_1 ⊗ · · · ⊗ V_{i−1} ⊗ V_{i+1} ⊗ · · · ⊗ V_k. From linear algebra we know that the rank of a linear operator is at most the dimension of the domain, dim(V_i), and at most the dimension of the codomain, which is Π_{j=1, j≠i}^k dim(V_j). Since elements of T(V_i*) can be seen as elements of V_1 ⊗ · · · ⊗ V_k with the V_i factor fixed, dim T(V_i*) will be at most R(T). Therefore

dim(T(V_i*)) ≤ min{R(T), dim(V_i), Π_{j=1, j≠i}^k dim(V_j)},

which can be interpreted as

R(T) ≥ max{dim(T(V_1*)), . . . , dim(T(V_k*))}.
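In coordinates, dim T(V_i*) is the matrix rank of the i:th unfolding of T, so the lower bound is cheap to compute. A sketch (the helper name `multilinear_rank` is ours):

```python
import numpy as np

def multilinear_rank(T):
    """Ranks of the unfoldings of a third-order tensor; each is a lower bound
    on the tensor rank R(T)."""
    return tuple(
        np.linalg.matrix_rank(np.moveaxis(T, i, 0).reshape(T.shape[i], -1))
        for i in range(3))

rng = np.random.default_rng(3)
# a generic rank-3 tensor in R^4 (x) R^4 (x) R^4
T = np.einsum('ir,jr,kr->ijk', *(rng.standard_normal((4, 3)) for _ in range(3)))
mlrank = multilinear_rank(T)
print(mlrank, 'R(T) >=', max(mlrank))   # (3, 3, 3) R(T) >= 3
```

Here the bound is tight, but in general it need not be: the unfolding ranks are capped by the mode dimensions, while R(T) can exceed all of them.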
2.2 Varieties of matrices over C
As a warm-up for what is to come, we will consider the two-factor Segre variety (example 1.3.18) and its secant varieties. The Segre variety Seg(PV × PW ) corresponds to matrices (of a given size) of rank one and the secant variety (definition 1.3.21) σr(Seg(PV × PW )) to matrices of rank ≤ r.
Let V and W be vector spaces of dimension n_V and n_W respectively. Thus the space V ⊗ W has dimension n_V n_W, and PV, PW, P(V ⊗ W) have dimensions n_V − 1, n_W − 1, n_V n_W − 1. The Segre map Seg : PV × PW → P(V ⊗ W) embeds the whole space PV × PW in P(V ⊗ W). Thus the two-factor Segre variety, which can be interpreted as the projectivization of the set of matrices of rank one, has dimension n_V + n_W − 2.
As noted in chapter 1, the expected dimension of the secant variety σ_r(X), where dim(X) = k, is min{rk + r − 1, n}, where n is the dimension of the ambient space. Thus the expected dimension of σ_2(Seg(P(C³) × P(C³))) is min{2 · 4 + 2 − 1, 8} = 8, so if the dimension were the expected one, the rank-two matrices would fill out the whole space. However, we know that this is not true, since there are 3 × 3 matrices of rank three. Therefore σ_2(Seg(P(C³) × P(C³))) must be degenerate. We want to find the defects of the secant varieties σ_r(Seg(PV × PW)).
From the definition of the dimension of a variety (definition 1.3.15) it is enough to consider the dimension of the affine tangent space at a smooth point of σ_r(Seg(PV × PW)). Choose bases for V and W and consider the column vectors

x_i = (x^1_i, . . . , x^{n_V}_i)^T

for i = 1, . . . , r. Construct the matrix

M = ( x_1  x_2  · · ·  x_r  Σ_{i=1}^r c^{r+1}_i x_i  · · ·  Σ_{i=1}^r c^{n_W}_i x_i )

so that rank(M) = r. The x_i and the c^l_i are parameters, which gives a total of r n_V + r(n_W − r) = r(n_V + n_W − r) parameters. Thus dim σ_r(Seg(PV × PW)) = r(n_V + n_W − r) − 1 and the defect is

δ_r = r(n_V + n_W − 2) + r − 1 − (r(n_V + n_W − r) − 1) = r² − r.
One can see that inserting r = n_V or r = n_W in the expression for dim σ_r(Seg(PV × PW)) yields n_V n_W − 1 = dim P(V ⊗ W), the conclusion being the well-known result that the maximal rank of a linear map V → W is min{n_V, n_W}.
However, there is another way to arrive at the formula for dim σ_r(Seg(PV × PW)). For this we use lemma 1.3.25. Let [p] ∈ σ_r(Seg(PV × PW)) be a general point; we can take p = v_1 ⊗ w_1 + · · · + v_r ⊗ w_r, where the v_i and w_i are linearly independent. Thus, by Terracini's lemma:

ˆT_p σ_r = V ⊗ span(w_1, . . . , w_r) + span(v_1, . . . , v_r) ⊗ W.

The two spaces have the r²-dimensional intersection span(v_1, . . . , v_r) ⊗ span(w_1, . . . , w_r), so the tangent space has dimension r n_V + r n_W − r² and dim σ_r(Seg(PV × PW)) = r(n_V + n_W − r) − 1.
2.3 Varieties of tensors over C
In this section we will provide equations for the three-factor Segre variety and show some basic results on generic ranks. Note that it is not possible to have an algebraic set which contains exactly the tensors of rank R or less; it must also contain the tensors of border rank R or less: assume that p is a polynomial such that every tensor of rank R or less is a zero of p, and that T is a tensor with border rank R (or less) but with rank greater than R. Now, let T_i be a sequence of tensors of rank R (or less) such that T_i → T. Since polynomials are continuous, we get p(T) = lim_{i→∞} p(T_i) = 0.
One can also note that there is only one tensor of rank zero, namely the zero tensor.
2.3.1 Equations for the variety of tensors of rank one
The easiest case (not counting the case of rank zero tensors), is the tensors of rank one, which are rather well-behaved.
Lemma 2.3.1. Let T be a third order tensor. Assuming T is not the zero tensor, T has rank one if and only if the first non-zero slice has rank one and all the other slices are multiples of the first.
Proof. Special case of theorem 2.1.2.
Theorem 2.3.2. An I × J × K tensor T with elements x_{i,j,k} has rank less than or equal to one if and only if

x_{i1,j1,k1} x_{i2,j2,k2} − x_{l1,m1,n1} x_{l2,m2,n2} = 0    (2.3)

for all i1, i2, j1, j2, k1, k2, l1, l2, m1, m2, n1, n2 with {i1, i2} = {l1, l2}, {j1, j2} = {m1, m2} and {k1, k2} = {n1, n2}.
Proof. Assume T has rank one, i.e. T = a ⊗ b ⊗ c. Then x_{i,j,k} = a_i b_j c_k, which makes (2.3) true.

Conversely, assume (2.3) is satisfied. Fixing i1 = i2 = 1 one gets the 2 × 2 minors for the first slice T_1 of T, which implies that T_1 has rank (at most) one. Assume without loss of generality that T is not the zero tensor, that T_1 is non-zero, and in particular that x_{1,1,1} ≠ 0. Taking i1 = j1 = k1 = 1 in (2.3) one gets

x_{1,1,1} x_{k,i,j} = x_{1,i,j} x_{k,1,1}  ⟺  x_{k,i,j} = (x_{k,1,1} / x_{1,1,1}) x_{1,i,j} =: α_k x_{1,i,j},

and since α_k depends only on which slice k one picked, this shows that all slices are multiples of the first slice. By lemma 2.3.1 this is equivalent to T having rank one.
In other words, (2.3) cuts out the three-factor Segre variety set-theoretically.

Theorem 2.3.3. A tensor has border rank one if and only if it has rank one.

Proof. The Segre variety consists of the projectivization of all tensors of rank one. Since the Segre variety is an algebraic set, there exists an ideal P of polynomials such that the Segre variety is V(P). If (a projectivization of) a tensor has border rank one, it too has to be a zero of P and is therefore an element of the Segre variety, and thus has rank one.
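The rank-one condition can be tested numerically. The sketch below (the helper name `is_rank_one` and the tolerance are ours) uses the equivalent criterion that every unfolding of a non-zero rank-one tensor has matrix rank one; the vanishing of the 2 × 2 minors of the unfoldings corresponds to the equations (2.3):

```python
import numpy as np

def is_rank_one(T, tol=1e-9):
    """A non-zero tensor has rank one iff every unfolding has matrix rank one."""
    return all(
        np.linalg.matrix_rank(np.moveaxis(T, i, 0).reshape(T.shape[i], -1),
                              tol=tol) == 1
        for i in range(T.ndim))

a, b, c = np.array([1., 2.]), np.array([3., -1.]), np.array([0., 5.])
print(is_rank_one(np.einsum('i,j,k->ijk', a, b, c)))                      # True
print(is_rank_one(np.einsum('i,j,k->ijk', a, b, c) + np.ones((2, 2, 2)))) # False
```

The second tensor is a sum of two generic rank-one terms, so its unfoldings have rank two and the test fails, as expected.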
2.3.2 Varieties of higher ranks
Let X = Seg(PA × PB × PC), where A, B, C are complex vector spaces, so X is the projective variety of tensors of rank one. We can now form varieties of tensors of higher border ranks by forming secant varieties: the secant variety σ_r(X) consists of all tensors of border rank r or less. By lemma 1.3.22 the secant varieties are irreducible since X is irreducible.
Consider the r:th secant variety of the Segre variety, σ_r(Seg(PA × PB × PC)), where dim A = n_A, dim B = n_B, dim C = n_C, and assume that r ≤ min{n_A, n_B, n_C}. A general point [p] in the secant variety can then be written

[p] = [a_1 ⊗ b_1 ⊗ c_1 + a_2 ⊗ b_2 ⊗ c_2 + · · · + a_r ⊗ b_r ⊗ c_r]

and by Terracini's lemma (lemma 1.3.25), with X = Seg(PA × PB × PC):

ˆT[p] σ_r(X) = ˆT[a_1⊗b_1⊗c_1] X + · · · + ˆT[a_r⊗b_r⊗c_r] X
            = a_1 ⊗ b_1 ⊗ C + a_1 ⊗ B ⊗ c_1 + A ⊗ b_1 ⊗ c_1 + · · · + a_r ⊗ b_r ⊗ C + a_r ⊗ B ⊗ c_r + A ⊗ b_r ⊗ c_r.

The spaces a_i ⊗ b_i ⊗ C, a_i ⊗ B ⊗ c_i, A ⊗ b_i ⊗ c_i share the one-dimensional space spanned by a_i ⊗ b_i ⊗ c_i, and thus a_i ⊗ b_i ⊗ C + a_i ⊗ B ⊗ c_i + A ⊗ b_i ⊗ c_i has dimension n_A + n_B + n_C − 2. It follows that dim σ_r(Seg(PA × PB × PC)) = r(n_A + n_B + n_C − 2) − 1, which is the expected dimension. We have proved the following:
Theorem 2.3.4. For r ≤ min{n_A, n_B, n_C}, the secant variety σ_r(Seg(PA × PB × PC)) has the expected dimension r(n_A + n_B + n_C − 2) − 1.
Corollary 2.3.5. The generic rank for tensors in C2⊗ C2⊗ C2 is 2.
Theorem 2.3.6. The generic rank in A ⊗ B ⊗ C is greater than or equal to

n_A n_B n_C / (n_A + n_B + n_C − 2).

Proof. Let X = Seg(PA × PB × PC), so dim X = n_A + n_B + n_C − 2 in the affine cone. The expected dimension of σ_r(X) is r(n_A + n_B + n_C − 2) − 1, and the actual dimension is at most the expected one. If r is the generic rank, the dimension of the secant variety is n_A n_B n_C − 1. Thus r(n_A + n_B + n_C − 2) − 1 ≥ n_A n_B n_C − 1, which implies

r ≥ n_A n_B n_C / (n_A + n_B + n_C − 2).
From theorem 2.3.6 we see that the generic rank for a tensor in Cⁿ ⊗ Cⁿ ⊗ C² is at least n. We can also see that if σ_n(X) is not degenerate, n is the generic rank. This is actually the case.
Theorem 2.3.7 (Generic rank of quadratic two-slice tensors). The generic rank in Cⁿ ⊗ Cⁿ ⊗ C² is n.

Proof. With the same notation as in the rest of this section, we show that σ_n(X) is not degenerate in Cⁿ ⊗ Cⁿ ⊗ C². A general point [p] ∈ σ_n(X) is given by

[p] = [ Σ_{i=1}^n a_i ⊗ b_i ⊗ c_i ]

where {a_i}_{i=1}^n, {b_i}_{i=1}^n are bases for Cⁿ and c_i ∈ C². By Terracini's lemma:

ˆT[p] σ_n(X) = Σ_{i=1}^n ( Cⁿ ⊗ b_i ⊗ c_i + a_i ⊗ Cⁿ ⊗ c_i + a_i ⊗ b_i ⊗ C² )

where the spaces Cⁿ ⊗ b_i ⊗ c_i, a_i ⊗ Cⁿ ⊗ c_i, a_i ⊗ b_i ⊗ C² have the one-dimensional intersection spanned by a_i ⊗ b_i ⊗ c_i, so the dimension is n(n + n + 2 − 2) − 1 = 2n² − 1 = dim P(Cⁿ ⊗ Cⁿ ⊗ C²). Hence the secant variety is not degenerate and the generic rank is n.
Theorem 2.3.8 (Generic rank of cubic tensors). The generic rank in Cⁿ ⊗ Cⁿ ⊗ Cⁿ for n ≠ 3 is

⌈ n³ / (3n − 2) ⌉.

In C³ ⊗ C³ ⊗ C³ the generic rank is 5.
2.4 Real tensors
The case of tensors over R is more complicated than the case of tensors over C. For example, there is not necessarily one single generic rank; there can be several ranks for which the tensors with these ranks have positive measure, for any measure compatible with the Euclidean structure of the space, e.g. the Lebesgue measure. Such ranks are called typical ranks. For instance, a randomly picked tensor in R² ⊗ R² ⊗ R², with elements drawn from a standard normal distribution, has rank two with probability π/4 and rank three with probability 1 − π/4 [4, 5]. We state a theorem describing the situation. First, define the mapping
f_k : (C^{n_1} × C^{n_2} × C^{n_3})^k → C^{n_1} ⊗ C^{n_2} ⊗ C^{n_3}

f_k(a_1, b_1, c_1, . . . , a_k, b_k, c_k) = Σ_{r=1}^k a_r ⊗ b_r ⊗ c_r;

thus f_1 is the Segre mapping.
Theorem 2.4.1. The space R^{n_1} ⊗ R^{n_2} ⊗ R^{n_3} contains a finite number of open connected semi-algebraic sets O_1, . . . , O_m satisfying:

1. R^{n_1} ⊗ R^{n_2} ⊗ R^{n_3} \ ⋃_{i=1}^m O_i is a closed semi-algebraic set whose dimension is strictly smaller than n_1n_2n_3.

2. For i = 1, . . . , m there is an r_i such that every T ∈ O_i has rank r_i.

3. The minimum r_min of all the r_i is the generic rank in C^{n_1} ⊗ C^{n_2} ⊗ C^{n_3}.

4. The maximum r_max of all the r_i is the minimal k such that the closure of f_k((R^{n_1} × R^{n_2} × R^{n_3})^k) is R^{n_1} ⊗ R^{n_2} ⊗ R^{n_3}.

5. For every integer r between r_min and r_max there exists an r_i such that r_i = r.

Proof. See [15].
The integers r_min, . . . , r_max are the typical ranks, so the theorem states that the typical ranks of a real tensor space form an unbroken sequence of integers, the smallest of which is the generic rank of the corresponding complex space.
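The π/4 probability quoted above for 2 × 2 × 2 tensors can be checked by simulation, using the known characterization that a generic real 2 × 2 × 2 tensor with slices A, B has rank two exactly when the quadratic det(A + xB) has two distinct real roots (positive discriminant), and rank three otherwise. A Monte Carlo sketch (sample size and seed are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.standard_normal((200_000, 2, 2, 2))
A, B = t[:, :, :, 0], t[:, :, :, 1]
# det(A + xB) = det(B) x^2 + m x + det(A), with mixed coefficient m
detA = A[:, 0, 0] * A[:, 1, 1] - A[:, 0, 1] * A[:, 1, 0]
detB = B[:, 0, 0] * B[:, 1, 1] - B[:, 0, 1] * B[:, 1, 0]
m = (A[:, 0, 0] * B[:, 1, 1] + B[:, 0, 0] * A[:, 1, 1]
     - A[:, 0, 1] * B[:, 1, 0] - B[:, 0, 1] * A[:, 1, 0])
frac_rank2 = np.mean(m**2 - 4 * detA * detB > 0)
print(frac_rank2, np.pi / 4)   # both approximately 0.785
```

The empirical fraction of rank-two tensors agrees with π/4 ≈ 0.785 up to sampling noise, illustrating that both 2 and 3 are typical ranks in R² ⊗ R² ⊗ R².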
Chapter 3
Numerical methods and results
In this chapter we present three numerical methods for determining typical ranks of tensors.
When one receives data from any kind of measurement it will always contain some random noise, i.e., a tensor obtained from measurements can be seen as having a random part. We know that a random matrix has maximal rank with probability one, but the rank of a random higher-order tensor will be a typical rank. Therefore, if one knows the typical ranks for a given tensor format, one has only a few alternatives for the rank to explore when calculating a decomposition.
The space of 3 × 3 × 4 tensors is the smallest tensor space where it is still unknown if there is more than one typical rank. Therefore this space has been used as a test object.
3.1 Comon, Ten Berge, Lathauwer and Castaing's method
The method to calculate the generic rank or the smallest typical rank of tensor spaces described in this section was presented in [11]. The method uses the fact that the set of tensors of border rank at most R, denoted σ_R, is an irreducible variety. Therefore, by definition 1.3.15, dim(σ_R) = dim( ˆT_x σ_R) − 1 for smooth points x in σ_R. Since σ_R is smooth almost everywhere, x can be generated randomly in σ_R. The generic rank is the first rank for which dim(σ_R) is equal to the dimension of the ambient space.
To calculate dim( ˆT_x σ_R) we need the map ψ from a given set of vectors {u^(l)_r ∈ F^{N_l}, 1 ≤ l ≤ L, 1 ≤ r ≤ R} onto F^{N_1} ⊗ · · · ⊗ F^{N_L}:

{u^(l)_r ∈ F^{N_l}, 1 ≤ l ≤ L, 1 ≤ r ≤ R} ↦ Σ_{r=1}^R u^(1)_r ⊗ u^(2)_r ⊗ · · · ⊗ u^(L)_r.
Then the dimension D of the closure of the image of ψ is dim( ˆTxσR), which can
be calculated as the rank of the Jacobian, JR, of ψ, expressed in any basis. This
gives us the lowest typical rank (or generic rank) as the last R for which the rank of the Jacobian matrix increases. The algorithm to use in practice is algorithm 2, seen below. To construct JR one needs to know the size of the tensors and if
the tensors have any restrictions on them, one example of restriction being that the tensors are symmetric (see matrix (3.4) for the symmetric restriction and (3.1) for the case of no restriction).
To be able to write down J_R in a fairly simple way we need the Kronecker product. Given two matrices A of size I × J and B of size K × L, the Kronecker product A ~ B is the IK × JL matrix defined as

A ~ B = ( a_{11}B  · · ·  a_{1J}B
          ...
          a_{I1}B  · · ·  a_{IJ}B ).
For a third-order tensor with no restrictions, such as symmetry or zero mean of the vectors, ψ is the map

{a_r ∈ F^{N_1}, b_r ∈ F^{N_2}, c_r ∈ F^{N_3}, r = 1, . . . , R} ↦ Σ_{r=1}^R a_r ⊗ b_r ⊗ c_r.
In a canonical basis, an element T of im(ψ) has the coordinate vector

T = Σ_{r=1}^R a_r ~ b_r ~ c_r,

where a_r, b_r and c_r are row vectors. Let us have an example to illustrate how the Jacobian matrix is constructed.
Example 3.1.1 (Jacobian matrix). Let T be a 2 × 2 × 2 tensor. Then the coordinates of T in a canonical basis are

T = ( Σ_{r=1}^{R(T)} a_r(1)b_r(1)c_r(1), Σ_{r=1}^{R(T)} a_r(1)b_r(1)c_r(2), Σ_{r=1}^{R(T)} a_r(1)b_r(2)c_r(1), Σ_{r=1}^{R(T)} a_r(1)b_r(2)c_r(2),
      Σ_{r=1}^{R(T)} a_r(2)b_r(1)c_r(1), Σ_{r=1}^{R(T)} a_r(2)b_r(1)c_r(2), Σ_{r=1}^{R(T)} a_r(2)b_r(2)c_r(1), Σ_{r=1}^{R(T)} a_r(2)b_r(2)c_r(2) ).

If T_i denotes the i:th component, then the Jacobian matrix is

J = ( ∂T_1/∂a_1(1)  ∂T_2/∂a_1(1)  · · ·  ∂T_8/∂a_1(1)
      ∂T_1/∂a_1(2)  ∂T_2/∂a_1(2)  · · ·  ∂T_8/∂a_1(2)
      ∂T_1/∂b_1(1)  ∂T_2/∂b_1(1)  · · ·  ∂T_8/∂b_1(1)
      ∂T_1/∂b_1(2)  ∂T_2/∂b_1(2)  · · ·  ∂T_8/∂b_1(2)
      ∂T_1/∂c_1(1)  ∂T_2/∂c_1(1)  · · ·  ∂T_8/∂c_1(1)
      ∂T_1/∂c_1(2)  ∂T_2/∂c_1(2)  · · ·  ∂T_8/∂c_1(2)
      ...
      ∂T_1/∂c_R(2)  ∂T_2/∂c_R(2)  · · ·  ∂T_8/∂c_R(2) ),

where, for example,

∂T_1/∂a_1(1) = ∂( Σ_{r=1}^R a_r(1)b_r(1)c_r(1) )/∂a_1(1) = b_1(1)c_1(1).
In the more general case a_r, b_r and c_r are row vectors of lengths N_1, N_2 and N_3 respectively. The Jacobian of ψ is after R iterations the following (N_1 + N_2 + N_3)R × N_1N_2N_3 matrix:

J = ( I_{N_1} ~ b_1 ~ c_1
      a_1 ~ I_{N_2} ~ c_1
      a_1 ~ b_1 ~ I_{N_3}
      ...
      I_{N_1} ~ b_i ~ c_i
      a_i ~ I_{N_2} ~ c_i
      a_i ~ b_i ~ I_{N_3}
      ...
      I_{N_1} ~ b_R ~ c_R
      a_R ~ I_{N_2} ~ c_R
      a_R ~ b_R ~ I_{N_3} ).    (3.1)
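The method can be sketched directly from (3.1): stack the Kronecker-product blocks for random vectors, and report the first R at which the Jacobian reaches the ambient dimension. This is our own minimal reimplementation of the published method, not the authors' code, and the function name is illustrative:

```python
import numpy as np

def generic_rank(dims, R_max=10, seed=0):
    """First R for which the Jacobian (3.1), at random vectors, has rank
    equal to the ambient dimension N1*N2*N3 -- the generic (lowest typical) rank."""
    rng = np.random.default_rng(seed)
    N1, N2, N3 = dims
    ambient = N1 * N2 * N3
    blocks = []
    for R in range(1, R_max + 1):
        a = rng.standard_normal(N1)
        b = rng.standard_normal(N2)
        c = rng.standard_normal(N3)
        # the three blocks contributed by the R:th rank-one term
        blocks += [np.kron(np.kron(np.eye(N1), b), c),
                   np.kron(np.kron(a, np.eye(N2)), c),
                   np.kron(np.kron(a, b), np.eye(N3))]
        if np.linalg.matrix_rank(np.vstack(blocks)) == ambient:
            return R
    return None

print(generic_rank((2, 2, 2)))   # 2 (corollary 2.3.5)
print(generic_rank((3, 3, 2)))   # 3 (theorem 2.3.7)
print(generic_rank((3, 3, 3)))   # 5 (theorem 2.3.8): sigma_4 is degenerate
```

Note that at R = 4 for the 3 × 3 × 3 format the Jacobian rank stalls below 27, which is exactly the degeneracy of σ_4 that forces the generic rank up to 5.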
It is also possible to calculate the generic rank of tensors with more restrictions using this algorithm, such as symmetric tensors (3.2) or tensors that are symmetric in one slice (3.3):

Σ_{r=1}^R a_r ~ a_r ~ a_r,    (3.2)

Σ_{r=1}^R a_r ~ a_r ~ b_r.    (3.3)
The symmetric tensor (3.2) gives rise to the following Jacobian matrix:

J = ( I_N ~ a_1 ~ a_1 + a_1 ~ I_N ~ a_1 + a_1 ~ a_1 ~ I_N
      ...
      I_N ~ a_R ~ a_R + a_R ~ I_N ~ a_R + a_R ~ a_R ~ I_N ).    (3.4)