
Speeding up PARAFAC

Approximation of tensor rank using the Tucker core

Lukas Arnroth

Abstract

In this paper, the approach of utilizing the core tensor from the Tucker decomposition, in place of the uncompressed tensor, for finding a valid tensor rank for the PARAFAC decomposition is considered. Validity of the proposed method is investigated in terms of error and time consumption. As the solutions of the PARAFAC decomposition are unique, the stability of the solutions is investigated through split-half analysis. Simulated and real data are considered. Although no general validity of the method could be observed, the results for some datasets look promising with 10% compression in all modes. It is also shown that increased compression does not necessarily imply less time consumption.

Key words: Tucker decomposition; PARAFAC; tensor rank; split-half analysis

The Department of Statistics

Uppsala University

Supervisor: Rauf Ahmad


Contents

1 Introduction
  1.1 Literature review
  1.2 Research question
  1.3 Outline of paper
2 Preliminaries
3 Theoretical background
  3.1 The Tucker decomposition
  3.2 The PARAFAC decomposition
  3.3 Relationship between a tensor and its Tucker core
4 Methodology
  4.1 Higher order orthogonal iteration
  4.2 Alternating least squares
  4.3 Proposed method
  4.4 Evaluation metric
  4.5 Stability analysis
5 Data
  5.1 Simulation
  5.2 Real data
    5.2.1 Handwritten digits
    5.2.2 Image data
6 Results
  6.1 Simulations
  6.2 Real data
  6.3 Stability analysis
7 Conclusions
8 References


1 Introduction

The onset of big data involves not only vast quantities of data, but also complexity in the form of higher order arrays, or tensors. Tensors are generalizations of vectors and matrices to arrays of higher order, or multiple modes, where elements have an index for each mode [21]. In traditional statistical analysis, a tensor can arise when the (observation × variable) matrix is replicated for different timepoints or locations, rendering a cube of data. A tensor can in that sense be viewed as variables measured in a crossed fashion [11], or as a representation of multiple interactions. Trimodal data appears frequently but might not be recognized as such, probably due to a lack of awareness [12]. Upon realizing that data might be better described in a multimodal fashion, new methods become applicable that may describe the complex reality more adequately.

The most commonly used methods are the PARAFAC (PARallel FACtors) and the Tucker decompositions [7, 30]. These methods summarise data by components and represent, to different extents, multimodal extensions of principal component analysis (PCA) [18]. The reasons for not matricizing tensors and applying PCA are the interpretability of the solutions and the excess degrees of freedom that matricization would use [11]. In other words, there are advantages to maintaining the inherent structure of multimodal data, rather than collapsing some mode of the array in order to use multivariate techniques.

The Tucker decomposition is the tensor based method most closely related to PCA [7], where a tensor is decomposed into a core tensor and a factor matrix for each mode of the original tensor. The core tensor can, to some extent, be viewed as a compressed version of the original tensor [7, 16]. To what extent the core tensor can be treated as the original tensor is of practical interest, as it reduces the operations required for the PARAFAC decomposition. Most of the algorithms for fitting the PARAFAC decomposition are based on alternating least squares (ALS), which can be time consuming [11, 30]. Compression is thus relevant when applying the PARAFAC decomposition to larger datasets.

The PARAFAC decomposition is similar to the Tucker decomposition, but differs in that restrictions are imposed on the core tensor. The imposed restrictions lead to unique solutions under mild conditions [47]. Apart from the aforementioned benefits of taking the structure of the data into account, the uniqueness of the optimal solutions is a contributing factor to the increasing interest in tensor methods [13, 44]. Not suffering from rotational indeterminacy makes tensor based methods especially attractive in the context of extracting a priori unknown components in multimodal data [32]. The restrictions of the PARAFAC decomposition make representation of the results as a sum of rank-one tensors more natural, where each rank-one tensor can be interpreted as a separate multimodal component of the tensor [3].


1.1 Literature review

Tensor decompositions originated with the work of Hitchcock in 1927, more specifically on decomposing a tensor into a sum of products [23]. Hitchcock's work received little attention until the 1960s and 1970s, when tensor decompositions started to become extensively used in the field of psychometrics [7]. The Tucker decomposition was developed in the 1960s [51], followed by the PARAFAC (PARallel FACtors) decomposition in the 1970s [15, 22]. It should be noted that another name for the PARAFAC decomposition in the literature is CANDECOMP (CANonical DECOMPosition) [16]; however, the name PARAFAC has become more prevalent [7, 30].

Tensor decompositions have since gained much attention outside their originating domain, in fields where tensors naturally arise, such as neuroscience [2, 9, 26], image processing [19, 36, 45], recommender systems [39, 49], computer vision [1, 14, 20] and bioinformatics [35, 41].

The Tucker decomposition has similar applications as PCA in the context of tensors [7]. It is utilized for dimension reduction, often referred to as compression [4, 34, 52], and extracting underlying latent factors [5, 18]. This latter usage comes from the rotational freedom of the Tucker decomposition [28], which has been utilized for more specialized contexts, such as anomaly detection [56]. The uniqueness of the solutions of the PARAFAC decomposition has made it popular in more specific contexts. Examples are blind source separation, such as identifying regions in the brain that are activated when performing a certain task [2, 50] or analysing signals in sound sources [40] and fluorescence data [3]. Uniqueness has made stability analysis a natural validation tool of the obtained solution [3, 18]. One commonly used method is the split-half analysis where the tensor is split into independent halves and the PARAFAC estimates are compared for the two halves [53].

There is not much work concerning the relationship between the Tucker and the PARAFAC decompositions. In most cases the Tucker decomposition is used as a preprocessing technique. The PARAFAC decomposition is subsequently applied to the core tensor, upon which the estimated core replaces the original core tensor. The PARAFAC decomposition is in other words nested. One such procedure is referred to as the CANDELINC decomposition [16] and another is the two-level rank decomposition [27]. These procedures have the problem of algorithmic demands on the user and are not easily applicable. Generally the solution needs to be tuned with further iterations after getting the results [11]. There is also an example of using the Tucker decomposition on data with high multicollinearity. The rotational freedom is then utilized to absorb the multicollinearity into the factor matrices, leading to faster estimation using the PARAFAC decomposition [29]. One exception to the preprocessing type usage of the Tucker decomposition is the CORCONDIA algorithm for deciding the optimal number of rank-one tensors, or tensor rank, for the PARAFAC decomposition. The CORCONDIA algorithm is however fundamentally meant for testing the validity of the restrictions of the PARAFAC decomposition, meaning that a good rank will be found given that the restrictions are feasible [13]. CORCONDIA has been used with success in simulation studies [42] and applied work [8, 38].

In a recent publication, some theoretical relationships between the data and its core tensor are established [25]. The main result is a formal proof of the CANDELINC theorem [16]: that the tensor ranks of the uncompressed data and its Tucker core are equal. This implies that the core tensor is a valid substitute for the original tensor in the PARAFAC decomposition when using the true tensor rank. The research field on the direct relationships between a tensor and its Tucker core is otherwise lacking, making the analysis of their relationship important.

1.2 Research question

The methods nesting the PARAFAC decomposition in the Tucker decomposition are either specific in motivation or algorithmically demanding. Furthermore, there seems to be a lack of research on the viability of utilizing the Tucker core in the PARAFAC decomposition for tensor ranks below the true rank. This is a gap in the literature, as the true tensor rank is seldom known in applied settings. Moreover, utilizing the core tensor in a more straightforward way, treating it as the original tensor, would lead to more easily implemented algorithms. The purpose of this paper is thus to investigate the viability of utilizing the Tucker core, rather than the whole data, to estimate the model fit for different tensor ranks. The investigation will be extended to different levels of compression. Viability will be judged both on precision and on time consumption in estimation. Furthermore, the stability of the procedure will be considered, both in terms of model fit and in terms of stability of the estimates from the PARAFAC decomposition on the core tensor.

One restriction made in this paper is to exclude comparison with the CORCONDIA algorithm, as the method developed in this paper does not aim to test the validity of the restrictions of the PARAFAC decomposition.

1.3 Outline of paper

Moving forward, the necessary preliminaries of tensor decompositions will be considered in Chap. 2. Then a theoretical description of the Tucker and the PARAFAC decompositions will be presented in Sec. 3.1 and 3.2 respectively, followed by the algorithms used in this paper in Chap. 4. The method proposed in this paper is outlined in Sec. 4.3. Data used is presented in Chap. 5. The results are presented in Chap. 6 followed by the conclusions in Chap. 7.

2 Preliminaries

This thesis adopts the notation of Kolda and Bader [6, 7, 30]. Scalars will be referred to by lowercase letters, such as a. Vectors will be referred to by lowercase boldface letters, such as a. Matrices will be referred to by uppercase boldface letters, such as A. Tensors will be referred to by boldface Euler script letters, such as X. The number of modes of a tensor will be referred to as its order. Fixed indices will be denoted by lowercase letters and assumed to run from 1 to the corresponding uppercase letter, e.g. the element (i, j, k) of X will be denoted x_ijk where i = 1, ..., I. A non-fixed dimension will be denoted by I. The super-diagonal tensor will be denoted by I; for I ∈ R^(I_1×I_2×...×I_N), the entries with i_1 = i_2 = ... = i_N are ones and all other entries are zero [7].

The Kronecker product between matrices A ∈ R^(I×K) and B ∈ R^(J×L) is denoted by A ⊗ B, and the result is of dimension (IJ) × (KL).

The Khatri-Rao product [43] between matrices A ∈ R^(I×K) and B ∈ R^(J×K) is denoted by A ⊙ B, which is the columnwise Kronecker product

$$A \odot B = [\,a_{:1} \otimes b_{:1} \;\; a_{:2} \otimes b_{:2} \;\; \cdots \;\; a_{:K} \otimes b_{:K}\,].$$

The resulting matrix is of dimension (IJ) × K. The Khatri-Rao product will be essential in the PARAFAC decomposition. Note also that it is only defined for matrices with the same number of columns.

The Hadamard product [10] of A ∈ R^(I×K) and B ∈ R^(I×K) is denoted by A ∗ B, which is the elementwise product. The resulting matrix is of dimension I × K.

The outer product of vectors a ∈ R^I and b ∈ R^K will be denoted by a ∘ b, with a resulting matrix of dimension I × K. This can be extended to include c ∈ R^J, with the result of a ∘ b ∘ c being a tensor of order three with dimensions I × K × J. Letting a^(n) ∈ R^(I_n), the outer product can be extended to a tuple of N vectors [30]

$$a^{(1)} \circ \dots \circ a^{(N)} = \mathcal{Y}, \qquad \mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times \dots \times I_N}. \qquad (1)$$

Elementwise, (1) can be expressed as $y_{i_1, \dots, i_N} = a^{(1)}_{i_1} a^{(2)}_{i_2} \cdots a^{(N)}_{i_N}$ for all $i_1, \dots, i_N$ [52].
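To make the products above concrete, the following NumPy sketch (added here for illustration; not part of the original thesis) forms the Kronecker, Khatri-Rao, Hadamard and outer products for small random matrices and prints the resulting dimensions.

import numpy as np

rng = np.random.default_rng(0)
I, J, K = 4, 3, 2
A = rng.standard_normal((I, K))
B = rng.standard_normal((J, K))
C = rng.standard_normal((I, K))
a, b = rng.standard_normal(I), rng.standard_normal(K)

kron = np.kron(A, B)                      # Kronecker product, (I*J) x (K*K)
kr = np.column_stack(                     # Khatri-Rao: columnwise Kronecker, (I*J) x K
    [np.kron(A[:, r], B[:, r]) for r in range(K)])
had = A * C                               # Hadamard (elementwise) product, I x K
outer = np.multiply.outer(a, b)           # outer product a ∘ b, I x K

print(kron.shape, kr.shape, had.shape, outer.shape)   # (12, 4) (12, 2) (4, 2) (4, 2)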

Fixing all but the nth index of a tensor X ∈ R^(I_1×I_2×...×I_N) gives mode-n fibers, higher order analogues of matrix rows and columns [7]. Letting A ∈ R^(I×J×K), the mode-n fibers are displayed in Fig. 1. Fig. 1a shows the column fibers a_:jk, obtained by fixing all indices but the first. Fig. 1b shows the row fibers a_i:k, obtained by fixing all indices but the second. Fig. 1c shows the tube fibers a_ij:, obtained by fixing all indices but the third.

Figure 1: Fibers of a tensor of order 3. (a) Mode 1: a_:jk. (b) Mode 2: a_i:k. (c) Mode 3: a_ij:.


Unfolding a tensor along the nth mode maps the tensor to a matrix by taking the mode-n fibers and arranging them as the columns of a matrix [44]. Henceforth the mode-n unfolding of a tensor will be denoted by a subscript in parentheses, e.g. A_(n). The mode-1 unfolding of A ∈ R^(I×J×K) gives a matrix of dimension I × (JK) with columns [a_:11, a_:12, ..., a_:1K, ..., a_:JK]. For a general tensor of order N, X ∈ R^(I_1×I_2×...×I_N), the mode-n unfolding gives X_(n) ∈ R^(I_n × (I_1 I_2 ... I_{n-1} I_{n+1} ... I_N)) [7, 33]. The refolding of a matrix maps it back to a tensor.

The mode-n unfolding of the super-diagonal tensor I ∈ R^(I_1×I_2×...×I_N) gives a matrix of dimensions I_n × (I_1 ... I_{n-1} I_{n+1} ... I_N) with ones on the main diagonal.
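A minimal NumPy sketch of unfolding and refolding follows (the helper names and the particular column ordering are illustrative choices, not taken from the thesis; what matters is that refold inverts unfold under the same convention):

import numpy as np

def unfold(X, n):
    # mode-n unfolding: mode n becomes the rows, the remaining modes are
    # flattened with the earlier remaining mode varying fastest (one common convention)
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')

def refold(M, n, shape):
    # inverse of unfold for a tensor with the given full shape
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(np.reshape(M, [shape[n]] + rest, order='F'), 0, n)

X = np.arange(24.0).reshape(2, 3, 4)
print(unfold(X, 1).shape)                                # (3, 8)
print(np.allclose(refold(unfold(X, 1), 1, X.shape), X))  # True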

The n-mode product of a tensor X ∈ R^(I_1×I_2×...×I_N) by a matrix U ∈ R^(J_n×I_n) is denoted by X ×_n U = Y, where Y ∈ R^(I_1×...×I_{n-1}×J_n×I_{n+1}×...×I_N) [37]. Each mode-n fiber is multiplied by the matrix U, meaning that the n-mode product can be expressed in terms of unfolding and refolding the tensor [7]

$$\mathcal{X} \times_n U = \mathcal{Y} \iff Y_{(n)} = U X_{(n)}. \qquad (2)$$

The resulting dimension of U X_(n) in (2) is J_n × (I_1 I_2 ... I_{n-1} I_{n+1} ... I_N) [33]. To get Y in (2), U X_(n) is refolded.
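A small NumPy sketch of the n-mode product in (2), implemented directly with a tensor contraction rather than explicit unfolding (mode_product is an illustrative helper name, not from the thesis):

import numpy as np

def mode_product(X, U, n):
    # n-mode product X ×_n U: every mode-n fiber of X is multiplied by U;
    # the contracted axis is then moved back to position n
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3, 4))
U = rng.standard_normal((5, 3))      # U ∈ R^(J×I) acting on the second mode (axis 1)
print(mode_product(X, U, 1).shape)   # (2, 5, 4)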

The Tucker operator is defined to simplify and standardize the often clumsy notation of tensor algebra used for tensor decompositions [30]. Letting X ∈ R^(I×J×K), the Tucker operator is defined as

$$\mathcal{X} = [\![\mathcal{G};\, A, B, C]\!] = \mathcal{G} \times_1 A \times_2 B \times_3 C. \qquad (3)$$

The decomposition of X in (3) will be useful for the Tucker decomposition. G ∈ R^(R×P×L) is the core tensor, where R ≤ I, P ≤ J, L ≤ K, and A ∈ R^(I×R), B ∈ R^(J×P) and C ∈ R^(K×L) are the factor matrices.

For the PARAFAC decomposition, the Tucker operator is redefined as [30]

$$\mathcal{X} = [\![\mathcal{I};\, A, B, C]\!] = \mathcal{I} \times_1 A \times_2 B \times_3 C, \qquad (4)$$

where A ∈ R^(I×R), B ∈ R^(J×R) and C ∈ R^(K×R); R is also referred to as the tensor rank [6]. In the following, (4) will be referred to as [[A, B, C]] to make the distinction between the Tucker and the PARAFAC decompositions clearer.

The inner product between tensors X, Y ∈ R^(I_1×I_2×...×I_N) can be expressed using vectorization [7]

$$\langle \mathcal{X}, \mathcal{Y} \rangle = \langle X_{(n)}, Y_{(n)} \rangle = \langle \mathrm{vec}(X_{(n)}), \mathrm{vec}(Y_{(n)}) \rangle = \mathrm{vec}(X_{(n)})^T \mathrm{vec}(Y_{(n)}) = \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} x_{i_1 i_2 \dots i_N}\, y_{i_1 i_2 \dots i_N}.$$

The squared Frobenius norm of the difference between two tensors is [30]

$$\|\mathcal{X} - \mathcal{Y}\|_F^2 = \|\mathcal{X}\|_F^2 + \|\mathcal{Y}\|_F^2 - 2\,\langle \mathcal{X}, \mathcal{Y} \rangle, \qquad (5)$$

where ‖X‖²_F = ⟨X, X⟩. Equality (5) also holds for matrices and vectors, as these are special cases of tensors.

3 Theoretical background

3.1 The Tucker decomposition

The Tucker decomposition decomposes a tensor into a core tensor multiplied by a projection matrix for each mode [34]. It is referred to as N-mode PCA in the sense that it is a change of basis through which a tensor can be compressed [7, 54]. The representation of a tensor of order three, as in (3), is an exact equality; it is only a change of basis. However, the aim is generally to express the data sufficiently with a smaller amount of data than that contained in the original tensor [51]. In Fig. 2 the Tucker decomposition is illustrated, with the dashed lines representing compression of X.

Figure 2: Tucker decomposition of X ∈ R^(I×J×K).

The Tucker decomposition is mathematically represented as (3) when the dimensions of the core are the same as those of the original tensor, as shown by the solid lines in Fig. 2. For the general case of a tensor of order N, Y ∈ R^(I_1×I_2×...×I_N), (3) is expressed as

$$\mathcal{Y} \approx [\![\mathcal{G};\, A^{(1)}, A^{(2)}, \dots, A^{(N)}]\!] = \mathcal{G} \prod_{n=1}^{N} \times_n A^{(n)}. \qquad (6)$$

In estimation it will be useful to express (6) in matricized form [4, 24]

$$Y_{(n)} \approx A^{(n)} G_{(n)} \left( A^{(N)} \otimes \dots \otimes A^{(n+1)} \otimes A^{(n-1)} \otimes \dots \otimes A^{(1)} \right)^T. \qquad (7)$$


For notational convenience, the sequential Kronecker product in (7) will be referenced by

$$\bigotimes_{i \neq n} A^{(i)} = A^{(N)} \otimes \dots \otimes A^{(n+1)} \otimes A^{(n-1)} \otimes \dots \otimes A^{(1)}.$$

Thus (7) can be expressed as

$$Y_{(n)} \approx A^{(n)} G_{(n)} \Big( \bigotimes_{i \neq n} A^{(i)} \Big)^{T}.$$

For the third order tensor, as shown in Fig. 2, the matricized representation of (3) in mode 1 is expressed as

$$X_{(1)} \approx A G_{(1)} (C \otimes B)^T. \qquad (8)$$

X is then reconstructed by refolding X_(1) in (8).
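To illustrate (3) and (8), the sketch below (illustrative only; helper names are not from the thesis) builds a small tensor from a random core and factor matrices via successive n-mode products and verifies that its mode-1 unfolding equals A G_(1) (C ⊗ B)^T, using an unfolding convention consistent with that ordering of the Kronecker product.

import numpy as np

def unfold(T, n):
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order='F')

def mode_product(T, U, n):
    return np.moveaxis(np.tensordot(U, T, axes=(1, n)), 0, n)

rng = np.random.default_rng(1)
I, J, K = 5, 4, 3
R, P, L = 3, 3, 2
G = rng.standard_normal((R, P, L))
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, P))
C = rng.standard_normal((K, L))

X = mode_product(mode_product(mode_product(G, A, 0), B, 1), C, 2)     # [[G; A, B, C]]
print(np.allclose(unfold(X, 0), A @ unfold(G, 0) @ np.kron(C, B).T))  # True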

In line with previous works, the factor matrices will be fitted to be orthonormal [4, 7, 25, 27]. The Tucker decomposition is often referred to as independent Tucker decomposition if the factor matrices are orthonormal with full column rank [25].

3.2 The PARAFAC decomposition

In (4) the PARAFAC decomposition was notationally specified as the Tucker decomposition with a super-diagonal core tensor. The PARAFAC can in that sense be seen as a Tucker decomposition with restrictions imposed on the core [25]. However, it is generally specified as a sum of rank-one tensors, where a rank-one tensor of order N is the outer product of N vectors [7]. For a tensor of order three, X ∈ R^(I×J×K), with factor matrices A ∈ R^(I×R), B ∈ R^(J×R) and C ∈ R^(K×R), the PARAFAC decomposition is defined as [11, 48]

$$\mathcal{X} = [\![A, B, C]\!] = \sum_{r=1}^{R} a_{:r} \circ b_{:r} \circ c_{:r}. \qquad (9)$$

The formulation in (9) can also be expressed elementwise as

$$x_{ijk} = \sum_{r=1}^{R} a_{ir} b_{jr} c_{kr}.$$

R in (9) is the tensor rank of X, denoted rank(X), i.e. the minimum number of rank-one tensors needed to generate X [31]. Estimating the tensor rank itself is seldom of interest in applied work; rather, the aim is to find a sufficient number of rank-one tensors to approximate the tensor. In Fig. 3 the PARAFAC decomposition in (9) is illustrated with P < R rank-one tensors.


Figure 3: PARAFAC decomposition of X ∈ RI×J ×K with P < rank(X ).

The PARAFAC decomposition in (9) is readily extended to a tensor of order N. For Y ∈ R^(I_1×I_2×...×I_N),

$$\mathcal{Y} = [\![A^{(1)}, A^{(2)}, \dots, A^{(N)}]\!] = \sum_{r=1}^{R} a^{(1)}_{:r} \circ a^{(2)}_{:r} \circ \dots \circ a^{(N)}_{:r}. \qquad (10)$$

As with the Tucker decomposition, (10) can be expressed in matricized form [30]

$$Y_{(n)} = A^{(n)} I_{(n)} \left( A^{(N)} \odot \dots \odot A^{(n+1)} \odot A^{(n-1)} \odot \dots \odot A^{(1)} \right)^T. \qquad (11)$$

For notational convenience, the sequence of Khatri-Rao products in (11) will be expressed as

$$\bigodot_{i \neq n} A^{(i)} = A^{(N)} \odot \dots \odot A^{(n+1)} \odot A^{(n-1)} \odot \dots \odot A^{(1)}.$$

Thus (11) can be expressed as

$$Y_{(n)} = A^{(n)} \Big( \bigodot_{i \neq n} A^{(i)} \Big)^{T}.$$

For the third order tensor in Fig. 3, the mode-1 unfolded tensor can be expressed as

$$X_{(1)} \approx A\, (C \odot B)^T, \qquad (12)$$

upon which the tensor is reconstructed by refolding X_(1).
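As an illustration of (9) and (12), the sketch below (not from the thesis) builds a third order tensor of rank R from random factor matrices and checks that its mode-1 unfolding equals A(C ⊙ B)^T, using the same unfolding convention as in the Tucker sketch above.

import numpy as np

def unfold(T, n):
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order='F')

def khatri_rao(A, B):
    # columnwise Kronecker product of matrices with the same number of columns
    return np.column_stack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])])

rng = np.random.default_rng(2)
I, J, K, R = 5, 4, 3, 2
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

X = np.einsum('ir,jr,kr->ijk', A, B, C)                   # sum of R rank-one tensors
print(np.allclose(unfold(X, 0), A @ khatri_rao(C, B).T))  # True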

3.3 Relationship between a tensor and its Tucker core

For any tensor X ∈ R^(I_1×I_2×...×I_N) with independent Tucker decomposition X = [[G; A^(1), ..., A^(N)]], the result rank(X) = rank(G) has been established [25]. Letting G = [[B^(1), ..., B^(N)]] and rank(X) = R, the result can be stated in terms of norms,

$$\Big\| \mathcal{X} - \sum_{r=1}^{R} a^{(1)}_{:r} \circ \dots \circ a^{(N)}_{:r} \Big\|_F^2 = \Big\| \mathcal{G} - \sum_{r=1}^{R} b^{(1)}_{:r} \circ \dots \circ b^{(N)}_{:r} \Big\|_F^2 = 0. \qquad (13)$$

It is, however, more difficult to establish the relationship in (13) when using a tensor rank less than rank(X), which is what is done in most applied work. It is also not clear in the literature how the degree of compression from the Tucker decomposition could affect the equality in (13).

4 Methodology

4.1 Higher order orthogonal iteration

The optimization problem of fitting the Tucker decomposition in (6) for the tensor X ∈ R^(I×J×K) can be expressed as [7, 30]

$$\min_{\mathcal{G}, A, B, C} \; \big\| \mathcal{X} - [\![\mathcal{G};\, A, B, C]\!] \big\|_F^2 \qquad (14)$$

subject to G ∈ R^(M×N×P) and A ∈ R^(I×M), B ∈ R^(J×N), C ∈ R^(K×P) orthonormal. The optimization problem in (14) can be matricized as in (8),

$$\min_{G_{(1)}, A, B, C} \; \big\| X_{(1)} - A G_{(1)} (C \otimes B)^T \big\|_F^2 \qquad (15)$$

subject to G_(1) ∈ R^(M×NP) and A ∈ R^(I×M), B ∈ R^(J×N), C ∈ R^(K×P) orthonormal. For a tensor of order N, one can simply substitute A in (15) with A^(1) and C ⊗ B with ⊗_{i≠n} A^(i), and adjust the criteria accordingly. In Algo. 1 the higher order orthogonal iteration (HOOI) is outlined, which gives the core and factor matrices that minimize (14). This is the algorithm outlined by Bader and Kolda [7], with the modification that the update step is matricized as Y_(n) ← X_(n) (⊗_{i≠n} A^(i)). By definition of the singular value decomposition, the left singular vectors are an orthonormal basis for the column space [55], ensuring the columnwise orthonormality of the factor matrices. The initialization step in Algo. 1 is either done randomly or using the leading singular vectors of X_(n). The latter approach is referred to as higher order singular value decomposition [33].


Algorithm 1: HOOI

input: X ∈ R^(I_1×I_2×...×I_N), epochs, StopCrit
output: G ∈ R^(J_1×J_2×...×J_N) with J_n ≤ I_n for all n, and orthonormal A^(n) ∈ R^(I_n×J_n), n = 1, ..., N

for n ← 1 to N do
    initialize A^(n) ∈ R^(I_n×J_n) at random, or as the J_n leading left singular vectors of X_(n)
for e ← 1 to epochs do
    for n ← 1 to N do
        Y_(n) ← X_(n) (⊗_{i≠n} A^(i))
        A^(n) ← J_n leading left singular vectors of Y_(n)
    ∆ ← ‖X_(n) − Y_(n)‖²_F
    if ∆ ≤ StopCrit then stop
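A minimal NumPy sketch of the HOOI procedure in Algo. 1, assuming random orthonormal initialization and a fixed number of epochs instead of a stopping criterion (an illustration, not the exact code used in the thesis):

import numpy as np

def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')

def multi_mode_product(X, mats, skip=None):
    # multiply X by each matrix in mats along its mode, optionally skipping one mode
    for n, M in enumerate(mats):
        if n != skip:
            X = np.moveaxis(np.tensordot(M, X, axes=(1, n)), 0, n)
    return X

def hooi(X, ranks, epochs=50, seed=0):
    """Return the Tucker core G and orthonormal factor matrices A^(n)."""
    rng = np.random.default_rng(seed)
    N = X.ndim
    A = [np.linalg.qr(rng.standard_normal((X.shape[n], ranks[n])))[0] for n in range(N)]
    for _ in range(epochs):
        for n in range(N):
            # project X onto the current subspaces of all modes except n
            Y = multi_mode_product(X, [M.T for M in A], skip=n)
            U, _, _ = np.linalg.svd(unfold(Y, n), full_matrices=False)
            A[n] = U[:, :ranks[n]]
    G = multi_mode_product(X, [M.T for M in A])   # core: X ×_1 A^(1)T ... ×_N A^(N)T
    return G, A

X = np.random.default_rng(1).standard_normal((20, 20, 20))
G, A = hooi(X, ranks=(18, 18, 18))     # 10% compression in all modes
print(G.shape)                         # (18, 18, 18)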

4.2 Alternating least squares

For the PARAFAC decomposition, an alternating least squares (ALS) type algorithm will be used. In the context of PARAFAC, this means that when updating one factor matrix, the others are assumed to be known [11]. For the tensor X ∈ R^(I×J×K), estimating the factor matrices A, B and C is done by solving

$$\min_{A, B, C} \Big\| \mathcal{X} - \sum_{r=1}^{R} a_{:r} \circ b_{:r} \circ c_{:r} \Big\|_F^2 \qquad (16)$$

subject to A ∈ R^(I×R), B ∈ R^(J×R), C ∈ R^(K×R).

As with the Tucker decomposition, (16) can be expressed in matricized form using (12)

$$\min_{A, B, C} \big\| X_{(1)} - A (C \odot B)^T \big\|_F^2 \qquad (17)$$

subject to A ∈ R^(I×R), B ∈ R^(J×R), C ∈ R^(K×R). The ALS algorithm for solving (17) can be expressed as [11, 46]

$$A \leftarrow \arg\min_{A} \big\| X_{(1)} - A (C \odot B)^T \big\|_F^2$$
$$B \leftarrow \arg\min_{B} \big\| X_{(2)} - B (C \odot A)^T \big\|_F^2$$
$$C \leftarrow \arg\min_{C} \big\| X_{(3)} - C (B \odot A)^T \big\|_F^2,$$

where the update steps are repeated until some stopping criterion is met. For example, in updating A, the factor matrices B and C are assumed to be known and constant. The A that minimizes the update step is X_(1) ((C ⊙ B)^T)^+, where + in superscript denotes the Moore-Penrose inverse [11]. Some simplifications give

$$A = X_{(1)} \big((C \odot B)^T\big)^{+} = X_{(1)} (C \odot B) \big((C \odot B)^T (C \odot B)\big)^{+} = X_{(1)} (C \odot B) \big(C^T C * B^T B\big)^{+}.$$

These results are generalized to the N-dimensional case by Bader and Kolda [7]. A slightly modified version is outlined in Algo. 2.

Algorithm 2: ALS

input: X ∈ R^(I_1×I_2×...×I_N), R, epochs, StopCrit
output: A^(n) ∈ R^(I_n×R), n = 1, ..., N

for n ← 1 to N do
    initialize A^(n) ∈ R^(I_n×R) at random, or as the R leading left singular vectors of X_(n)
for e ← 1 to epochs do
    for n ← 1 to N do
        V ← ⊙_{i≠n} A^(i)
        A^(n) ← X_(n) V (V^T V)^+
    ∆ ← ‖X_(n) − A^(n) V^T‖²_F
    if ∆ ≤ StopCrit then stop
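A minimal NumPy sketch of the ALS updates in Algo. 2 for a fixed rank, with random initialization and a fixed number of epochs (an illustration, not the thesis's implementation):

import numpy as np

def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')

def khatri_rao(mats):
    out = mats[0]
    for M in mats[1:]:
        out = np.column_stack([np.kron(out[:, r], M[:, r]) for r in range(out.shape[1])])
    return out

def parafac_als(X, R, epochs=100, seed=0):
    """Fit a rank-R PARAFAC model to X; returns the list of factor matrices."""
    rng = np.random.default_rng(seed)
    N = X.ndim
    A = [rng.standard_normal((X.shape[n], R)) for n in range(N)]
    for _ in range(epochs):
        for n in range(N):
            # V = A^(N) ⊙ ... ⊙ A^(n+1) ⊙ A^(n-1) ⊙ ... ⊙ A^(1)
            V = khatri_rao([A[i] for i in reversed(range(N)) if i != n])
            A[n] = unfold(X, n) @ V @ np.linalg.pinv(V.T @ V)
    return A

X = np.random.default_rng(1).standard_normal((10, 10, 10))
A, B, C = parafac_als(X, R=3)
print(A.shape, B.shape, C.shape)   # (10, 3) (10, 3) (10, 3)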

4.3 Proposed method

The algorithm proposed in this paper is based on Sec. 4.1 and 4.2, and will be referred to as the general two-level decomposition (GTLD). The idea is to utilize the core tensor and treat it as the uncompressed tensor when searching for a valid tensor rank for the PARAFAC decomposition. By using the core tensor, the operations required for ALS are reduced. The method is outlined in Algo. 3.

Algorithm 3: GTLD

input: X ∈ R^(I_1×I_2×...×I_N), R, epochs, StopCrit
output: v ∈ R^R

Estimate the core tensor G using HOOI
for r ← 1 to R do
    fit a rank-r PARAFAC decomposition to G using ALS
    v_r ← training error of the rank-r fit
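A sketch of how GTLD could be composed from the HOOI and ALS sketches given above (a hypothetical composition reusing those two functions; the training error is 1 − RELFIT as defined in Sec. 4.4, and the thesis's stopping rules are not reproduced):

import numpy as np
# reuses the hooi and parafac_als sketches defined in Sec. 4.1 and 4.2 above

def training_error(X, factors):
    # 1 − RELFIT for a third order tensor and its PARAFAC estimate
    A, B, C = factors
    Xhat = np.einsum('ir,jr,kr->ijk', A, B, C)
    return np.sum((X - Xhat) ** 2) / np.sum(X ** 2)

def gtld(X, max_rank, core_ranks, epochs=100):
    """Training error for each candidate rank, computed on the Tucker core of X."""
    G, _ = hooi(X, core_ranks, epochs=epochs)
    return np.array([training_error(G, parafac_als(G, r, epochs=epochs))
                     for r in range(1, max_rank + 1)])

X = np.random.default_rng(3).standard_normal((20, 20, 20))
v = gtld(X, max_rank=10, core_ranks=(18, 18, 18))   # 10% compression in all modes
print(v)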


4.4 Evaluation metric

In deciding a valid tensor rank for the PARAFAC decomposition, the percentage of variance explained by the PARAFAC estimates, referred to as relative fit (RELFIT), can be used [3, 13]. Similar to the use of scree plots in principal component analysis, the variance explained is inspected for breaks as an indication of a good number of components. Beyond this point, better model fit from increased rank is attributed to modelling noise [53]. For the tensor X ∈ R^(I×J×K) with estimated tensor X̃, RELFIT is defined as

$$\mathrm{RELFIT} = 1 - \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} (x_{ijk} - \tilde{x}_{ijk})^2}{\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} x_{ijk}^2}. \qquad (18)$$

With no change in usage or interpretation, 1 − RELFIT will be used and referred to as the training error. The training errors of the uncompressed tensor and its core tensor will mainly be compared in terms of shape, rather than magnitude, as the usage is similar to that of scree plots.

4.5 Stability analysis

Split-half analysis is commonly adopted to assess the stability of the PARAFAC estimates for a given tensor rank. The procedure is to split the tensor along one mode to create two tensors which should be similar. The PARAFAC decomposition is then fit to both halves, and the resulting factor matrices are compared for similarity [18, 53]. As in [18], the Tucker congruence coefficient, ϕ, which measures the correlation between corresponding columns of the factor matrices, will be used. Due to the uniqueness of the PARAFAC solution, a valid rank for the decomposition should give similar estimates in independent halves of the data. Values below 0.85 are considered problematic [18]. Estimates are only considered for modes not used for splitting the tensor [53]. The congruence coefficients will be compared between the uncompressed tensor and its independent Tucker core.
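A sketch of the congruence coefficient and a single split-half comparison, reusing the parafac_als sketch from Sec. 4.2 (illustrative; it ignores sign and permutation matching of the components, which a full analysis would need to handle):

import numpy as np
# reuses the parafac_als sketch from Sec. 4.2 above

def congruence(U, V):
    """Tucker congruence coefficient between corresponding columns of U and V."""
    return np.sum(U * V, axis=0) / (np.linalg.norm(U, axis=0) * np.linalg.norm(V, axis=0))

rng = np.random.default_rng(4)
X = rng.standard_normal((20, 20, 20))
idx = rng.permutation(20)                        # random split along the third mode
half1, half2 = X[:, :, idx[:10]], X[:, :, idx[10:]]
A1 = parafac_als(half1, R=2)[0]                  # mode-1 factor matrix of each half
A2 = parafac_als(half2, R=2)[0]
print(congruence(A1, A2))                        # one coefficient per component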

5 Data

5.1 Simulation

In the data generating process the true tensor rank of the simulated tensor is known. This is done using the definition in (9). Furthermore, complexity in terms of multicollinearity will be considered, using two types of data generation schemes. The first type follows the simulation process outlined in [29], where multicollinearity is defined in terms of the condition number. Following [29], the factor matrices will be drawn from the uniform distribution. The condition number can then be controlled by the lower and upper values of the uniform distribution: an interval with higher values gives a higher condition number than an interval with lower values. For the dataset with a lower degree of multicollinearity (lower condition number), the factor matrices will be sampled from Uniform(0.5, 1.5); the average condition number is around 110. For the dataset with more severe multicollinearity, the factor matrices will be sampled from Uniform(1, 2); the average condition number is around 162.
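A sketch of this first simulation scheme in NumPy (for illustration; the exact condition-number computation in the thesis follows [29] and may differ from the plain per-matrix condition numbers used here):

import numpy as np

rng = np.random.default_rng(5)
dims, R = (20, 20, 20), 25

def simulate_uniform(low, high):
    """One rank-R tensor from uniform factor matrices, plus their mean condition number."""
    factors = [rng.uniform(low, high, size=(d, R)) for d in dims]
    X = np.einsum('ir,jr,kr->ijk', *factors)
    cond = np.mean([np.linalg.cond(F) for F in factors])
    return X, cond

X_low, cond_low = simulate_uniform(0.5, 1.5)    # dataset 1.1 scheme
X_high, cond_high = simulate_uniform(1.0, 2.0)  # dataset 1.2 scheme
print(round(cond_low, 1), round(cond_high, 1))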

For the second generation scheme, the factor matrices will be generated using the multivariate normal distribution (MVND). The multicollinearity will be manipulated through the covariance matrices. First, the case of independent factor matrices will be considered where the covariance matrix will be diagonal. Then two cases of non-diagonal covariance matrices will be considered. The covariance matrix of the case with a lower degree of multicollinearity will be generated from AAT (to ensure that the matrix is positive semi-definite) where A is sampled from uniform(0,1). For the case with more severe multicollinearity, A will be sampled from uniform(1,2).

In all the simulations, the dimensions of the generated tensor will be (20 × 20 × 20), with a true underlying rank of 25. The factor matrices will thus be of dimension (20 × 25). A summary of the simulated datasets is given in Tab. 1.

Table 1: Simulated datasets with generation method for factor matrices.

Dataset   Tensor                        Factor matrices
1.1       X = [[A^(1), A^(2), A^(3)]]   A^(n) sampled from Uniform(0.5, 1.5), n = 1, 2, 3
1.2       X = [[A^(1), A^(2), A^(3)]]   A^(n) sampled from Uniform(1, 2), n = 1, 2, 3
2.1       X = [[A^(1), A^(2), A^(3)]]   A^(n) sampled from MVND(0, Σ), n = 1, 2, 3; Σ = diag(a), a sampled from Uniform(0, 1)
2.2       X = [[A^(1), A^(2), A^(3)]]   A^(n) sampled from MVND(0, Σ), n = 1, 2, 3; Σ = AA^T, A sampled from Uniform(0, 1)
2.3       X = [[A^(1), A^(2), A^(3)]]   A^(n) sampled from MVND(0, Σ), n = 1, 2, 3; Σ = AA^T, A sampled from Uniform(1, 2)

5.2 Real data

5.2.1 Handwritten digits

Similar to the data of [45], handwritten digits will be used. Each image consists of 8 × 8 pixels, with up to 183 samples of each digit [17]. The first four digits in the data are presented in Fig. 4.


Figure 4: Examples of hand written digits.

As in [45], two types of tensors will be considered. The first type is the single digit tensor, which will be referred to as dataset 3.1. For dataset 3.1 digits are grouped together into third order tensors of dimensions (pixel × pixel × observation). In this thesis, only the digit 3 will be considered. Dataset 3.1 is thus all digits labeled 3 in the dataset, of which there are 183 cases.

The next tensor considered is one where all (8 × 8) digits are vectorized to form the column fibers, with the third mode being the digit label, i.e. 0, 1, 2, ..., 9. Dataset 3.2 will thus be a third order tensor of dimensions (stacked pixels × observation × digit label). The first 150 observations of each digit will be used.

5.2.2 Image data

RGB images are inherently three dimensional, with a (pixel × pixel) matrix for each colour channel. The data used is displayed in Fig. 5. The full (150 × 122 × 3) tensor is displayed in Fig. 5a and the intensity of each colour channel is displayed in Fig. 5b. The red background shows clearly in the red channel in Fig. 5b as being brighter (more intense) than its green and blue counterparts.


(a) Full RGB image. (b) Intensity of red, blue and green. Figure 5: RGB image data with both the full tensor displayed and each colour channel.

The image data will be referred to as dataset 4.1.

6 Results

In this chapter, the viability of using the core tensor, in place of the uncompressed tensor, will be evaluated. For all datasets, the training error presented in Sec. 4.4 will be evaluated for both the original tensor and its core tensor at different levels of compression. Evaluation will be done graphically, to assess whether the training error of the core tensor is similar to that of the uncompressed tensor. Furthermore, the viability of GTLD (Algo. 3) will be evaluated in terms of time consumption in relation to the time consumption for ALS on the original tensor. Generally, all modes will be compressed to an equal degree.

For the simulated datasets in Tab. 1, 100 datasets of each will be simulated to evaluate the stability of the process. For each dataset in Tab. 1, the average difference in training errors of the uncompressed tensor and its core will be assessed.


6.1 Simulations

In Fig. 6 the results for dataset 1.1 are presented for different compression levels.

(a) 40% compression in all modes. (b) 30% compression in all modes.

(c) 20% compression in all modes. (d) 10% compression in all modes.

Figure 6: Comparison of the training errors from the PARAFAC decomposition on the original data and its core for dataset 1.1. Blue dashed and red solid lines are reproducible. Shaded lines are estimations from 99 datasets with the same simulation scheme.

For compression levels 40%, Fig. 6a, and 30%, Fig. 6b, the training errors are not similar in terms of shape. For compression levels 20%, Fig. 6c, and 10%, Fig. 6d, the training errors of the tensors and their cores are close up to tensor rank 11. The time for estimating the training error over tensor ranks 1 to 30 for the uncompressed data and its core tensor is presented in Tab. 2. Estimation is faster for GTLD for all compression rates, and the estimation process is faster with a higher degree of compression.

In Fig. 7 the results for dataset 1.2 are presented for different compression levels.

(a) 40% compression in all modes. (b) 30% compression in all modes.

(c) 20% compression in all modes. (d) 10% compression in all modes.

Figure 7: Comparison of the training errors from the PARAFAC decomposition on the original data and its core for dataset 1.2. Blue dashed and red solid lines are reproducible. Shaded lines are estimations from 99 datasets with the same simulation scheme.

Here it would seem that any compression level is valid, as the training error of the core tensors maintains the shape of the training error of the uncompressed tensors. A compression level of 40%, as seen in Fig. 7a, leads to an average reduction in time consumption of nearly 50% based on Tab. 2. With a compression level of 10%, Fig. 7d, the training errors are nearly identical up to tensor rank 11. Time consumption using a 10% compression rate is less than using 40%.

The average differences of the training errors for datasets 1.1 and 1.2 are presented in Fig. 8. For dataset 1.1, Fig. 8a, the average difference in training error is within an interval of −0.002 and 0.001. The same result holds for dataset 1.2, as shown in Fig. 8b. For both datasets, there is a large dip at tensor rank 7. For dataset 1.2, the average difference in training error stabilizes around 0 toward the true tensor rank; for dataset 1.1 it diverges negatively above tensor rank 20 and shows no sign of stabilization.


(a) Dataset 1.1 (b) Dataset 1.2

Figure 8: Difference in training error of the uncompressed data and its core tensor for datasets 1.1 and 1.2. The red line is the average of the 100 simulated datasets.

The results for dataset 2.1 are presented in Fig. 9. For compression rates 20%, Fig. 9c, and 10%, Fig. 9d, the training errors are close up to a tensor rank of 8. Surprisingly, the time consumption of GTLD is negatively related to compression, as seen in Tab. 2; using 40% compression in all modes consumes more time than simply using the uncompressed tensor.


(a) 40% compression in all modes. (b) 30% compression in all modes.

(c) 20% compression in all modes. (d) 10% compression in all modes.

Figure 9: Comparison of the training errors from the PARAFAC decomposition on the original data and its core for dataset 2.1. Blue dashed and red solid lines are reproducible. Shaded lines are estimations from 99 datasets with the same simulation scheme.

The results for dataset 2.2 are presented in Fig. 10. For all levels of compression, the training errors become more stable towards the true tensor rank. However, the training errors show oscillation close to the true tensor rank. For compression levels 40%, Fig. 10a, and 30%, Fig. 10b, the training error of the core tensor only provides a valid estimate up to tensor rank 8. For compression levels 20%, Fig. 10c, and 10%, Fig. 10d, the training errors of the core tensors are close to the training errors of the uncompressed tensors. For both 20% and 10% compression, based on Tab. 2, the time consumption of GTLD is around 70% of that of fitting the PARAFAC decomposition on the uncompressed tensor. It should also be noted that time consumption for dataset 2.2 is much larger in comparison with that of datasets 2.1 and 2.3.


(a) 40% compression in all modes. (b) 30% compression in all modes.

(c) 20% compression in all modes. (d) 10% compression in all modes.

Figure 10: Comparison of the training errors from the PARAFAC decomposition on the original data and its core for dataset 2.2. Blue dashed and red solid lines are reproducible. Shaded lines are estimations from 99 datasets with the same simulation scheme.

The results for dataset 2.3 are presented in Fig. 11, and are similar to the results for dataset 2.2. One difference is that the solutions are more stable around the true tensor rank. It should also be noted that all the training errors in Fig. 11 are of lower magnitude in comparison with the training errors of Fig. 10. Based on Tab. 2, a compression rate of 10% leads to a 42% decrease in time consumption. It should be noted that time consumption using 40% compression rates is larger than using 30%.


(a) 40% compression in all modes. (b) 30% compression in all modes.

(c) 20% compression in all modes. (d) 10% compression in all modes.

Figure 11: Comparison of the training errors from the PARAFAC decomposition on the original data and its core for dataset 2.3. Blue dashed and red solid lines are reproducible. Shaded lines are estimations from 99 datasets with the same simulation scheme.

The average differences of the training errors for datasets 2.1, 2.2 and 2.3 are presented in Fig. 12. In comparison with datasets 1.1 and 1.2, the average differences in Fig. 12 follow the horizontal 0 line more closely, and the difference is more stable at lower tensor ranks. However, the difference in training error becomes more unstable with increasing tensor rank. For dataset 2.2, Fig. 12b, the oscillation of the training errors near the true tensor rank, as shown in Fig. 10, shows clearly. Results for dataset 2.1, Fig. 12a, and dataset 2.3, Fig. 12c, are similar in terms of variation over samples and a slight positive deviation of the average difference from the horizontal 0 line. For all datasets in Fig. 12, the results are stable up to a tensor rank of 10.


(a) Dataset 2.1. (b) Dataset 2.2.

(c) Dataset 2.3.

Figure 12: Differences in the training error of the uncompressed data and its core tensor for datasets 2.1, 2.2 and 2.3. The red line is the average of the 100 simulated datasets.

6.2 Real data

As the training errors were closer with less compression for all simulated datasets, only 10% and 20% compression will be considered for the real data. In Fig. 13 the results for dataset 3.1 are presented. In Fig. 13a, the training errors of the tensors show a similar curvature to those of their cores for compression rates 10% and 20%. The slope of the training error gets steeper with increasing compression rate. The differences in the training error are quite stable, as shown in Fig. 13b, apart from a sharp drop where the training error of the core tensors hits 0. The estimation time for the uncompressed tensor is 25.09 seconds. For the core tensors, the estimation time is 11.95 seconds with 10% compression and 7.77 seconds with 20%.


Table 2: Time to estimate PARAFAC for the original tensor and the core tensor for ranks 1 to 30, in seconds, averaged over 100 runs.

Data  Dimensions G      Time, X   Time, G
1.1   (12 × 12 × 12)    4.159     2.293
      (14 × 14 × 14)              2.881
      (16 × 16 × 16)              3.376
      (18 × 18 × 18)              3.540
1.2   (12 × 12 × 12)    2.377     1.361
      (14 × 14 × 14)              1.666
      (16 × 16 × 16)              0.925
      (18 × 18 × 18)              1.073
2.1   (12 × 12 × 12)    3.117     4.944
      (14 × 14 × 14)              2.521
      (16 × 16 × 16)              1.881
      (18 × 18 × 18)              1.674
2.2   (12 × 12 × 12)    13.681    8.033
      (14 × 14 × 14)              10.649
      (16 × 16 × 16)              9.524
      (18 × 18 × 18)              9.637
2.3   (12 × 12 × 12)    3.468     1.543
      (14 × 14 × 14)              1.813
      (16 × 16 × 16)              2.024
      (18 × 18 × 18)              2.008

(a) Comparison of training errors. (b) Differences in training error. Figure 13: Results for dataset 3.1 with compression in all modes.

The results for dataset 3.2 are presented in Fig. 14. The training error of the core tensors differs mainly in magnitude for both compression rates 20% and 10%. The training errors are very similar in terms of shape. The estimation time for the uncompressed tensor is 73.05 seconds. For the core tensors, the estimation time with 10% and 20% compression is 72.36 and 69.07 seconds respectively. There are thus negligible gains from using the core tensor rather than the uncompressed tensor in this case.

(a) Comparison of training errors. (b) Differences in training error. Figure 14: Results for dataset 3.2 with compression in all modes.

The results for dataset 4.1 are presented in Fig. 15. The training errors are nearly identical up to a tensor rank of 7, as shown in both Fig. 15a and Fig. 15b. For both compression rates, 10% and 20% in the pixel modes, the training error of the core tensor shoots up slightly. The difference in training error stabilizes around 0 at tensor rank 20 and above, as shown in Fig. 15b. The estimation time for the uncompressed tensor is 31.7 seconds. For the core tensor with 10% compression, the estimation time is 28.32 seconds. With 20% compression, the estimation time for the core tensor is 24.63 seconds.

(a) Comparison of training error. (b) Difference in training error. Figure 15: Results for dataset 4.1 with compression in all modes but the colour channels.

6.3 Stability analysis

Results from the stability analysis for datasets 1.1 and 1.2 are presented in Tab. 3. Splits are considered for the third mode. The congruence estimates for the third mode are subsequently left out of the analysis. Estimates are from 100 splits using random sampling without replacement of indices. Each resulting half-tensor is of dimension 20 × 20 × 10 for the uncompressed data and 18 × 18 × 10 for the core tensor. Most notable is that the congruence estimates of the core tensors generally give values below 0.85, i.e. the estimates from the halves are generally not similar. The estimates obtained from the core tensor are not stable for datasets 1.1 and 1.2. Using the uncompressed tensor, stable solutions for dataset 1.1 are obtained with a tensor rank of up to 2, and for dataset 1.2 one could also consider rank 3. The results for the core tensor in the other simulated datasets are similar.

Table 3: Congruence estimates from split-half analysis. The core tensor is of dimension (18 × 18 × 20). Splits are performed on the second mode. Estimates are from 100 permutations. Standard errors in parentheses.

                        X                     G
Dataset  rank    ϕA        ϕB         ϕA        ϕB
1.1      1       0.992     0.993      0.088     0.996
                 (0.000)   (0.000)    (0.014)   (0.000)
         2       0.915     0.953      0.460     0.519
                 (0.016)   (0.002)    (0.093)   (0.001)
         3       0.130     0.374      0.374     0.005
                 (0.061)   (0.026)    (0.005)   (0.099)
1.2      1       0.997     0.997      -0.036    0.998
                 (0.000)   (0.000)    (0.011)   (0.000)
         2       0.967     0.982      0.529     0.516
                 (0.001)   (0.001)    (0.107)   (0.001)
         3       0.814     0.890      0.312     0.029
                 (0.007)   (0.001)    (0.003)   (0.098)
         4       -0.018    0.128      0.497     0.012
                 (0.046)   (0.036)    (0.001)   (0.099)

The split-half analysis for dataset 3.2 is presented in Tab. 4. The congruence measurements are estimated using the same procedure as for datasets 1.1 and 1.2. The mode considered for splits is the sample mode, which is left uncompressed at 150. The core tensor used is compressed 10% in the digit and pixel modes, with a resulting tensor of dimension 9 × 58 × 150. As with datasets 1.1 and 1.2, the congruence estimates for the core tensor show that the PARAFAC estimates are not stable between halves. For the uncompressed data, ranks 1 and 2 give stable estimates, and the congruence estimate for the first factor matrix is increasing with rank for the uncompressed data. The results hold for the other real datasets.

Table 4: Congruence estimates from split-half analysis for dataset 3.2. The core tensor is of dimension (9 × 58 × 150). Splits are performed on the observations. Estimates are from 100 permutations. Standard errors in parentheses.

                        X                     G
Dataset  rank    ϕA        ϕB         ϕA        ϕB
3.2      1       0.958     0.999      -0.262    0.133
                 (0.000)   (0.000)    (0.003)   (0.000)
         2       0.992     0.995      0.134     0.023
                 (0.000)   (0.000)    (0.001)   (0.000)
         3       0.325     0.107      -0.044    0.042
                 (0.002)   (0.001)    (0.001)   (0.000)
         4       0.845     0.893      0.007     0.289
                 (0.007)   (0.000)    (0.001)   (0.001)
         5       0.949     0.937      -0.229    0.251
                 (0.007)   (0.000)    (0.001)   (0.001)
         6       0.304     0.073      -0.034    -0.020
                 (0.002)   (0.001)    (0.000)   (0.001)

7 Conclusions

In this paper, the validity of using the core tensor from the Tucker decomposition, in place of the uncompressed tensor, when finding a valid tensor rank for the PARAFAC decomposition has been reviewed. Results from simulated and real data show that the core tensor could be a suitable replacement of the original tensor. For lower tensor ranks, the training error of the core tensor is generally close to that of the uncompressed tensor. There is, however, some ambiguity concerning the relationship between compression and time consumption; more compression does not necessarily imply less time consumed in estimation. Based strictly on the training error and time consumption, a 10% compression rate tends to produce the most valid results.

With 10% compression, there were some problematic results for the simulated datasets. For datasets 1.1 and 1.2, with factor matrices generated from the uniform distribution, there was a dip in the average difference of training errors at tensor rank 8. Such a dip could potentially lead to an invalid decision on tensor rank. For dataset 2.2, there was a problem of oscillation of the training errors around the true tensor rank. This pattern is not only observed for the core tensors, so it could possibly be attributed to the high tensor rank rather than to a problem with the proposed method. On the positive side, for datasets with factor matrices generated from the MVND, the average difference of training errors follows a straight line up to tensor rank 12.

The stability of the difference of training errors differs between the simulation schemes. With factor matrices generated from the uniform distribution, the difference is more stable over samples than with factor matrices generated from the MVND. The pattern of stability also proved different. For example, the difference in training error shows increased stability towards the true tensor rank for dataset 1.2, whilst the opposite was observed for dataset 2.2.

For the two tensors with handwritten digits, the results looked promising for lower tensor ranks. For dataset 3.2, however, the time saved in using the core tensor was negligible. For the image data, the result was especially promising with a compression rate of 10%. Up to tensor rank 6, the difference in training errors is very close to the 0 line with little oscillation.

One major drawback for the proposed method is that the estimates proved dissimilar between the halves of the core tensor in the split-half analysis. As stability analysis is a major diagnostics tool for the PARAFAC decomposition, this problem should be addressed. Otherwise the training error from the core tensor needs to be combined with stability analysis on the uncompressed tensor. It seems that the proposed method, GTLD, gives different results for different tensors. Any general validity of the proposed method can therefore not be claimed. Validity would have to be investigated in more specific contexts, such as fluorescence and signal data, with more contextual simulation schemes. It would also be advisable to include the method proposed in a comparison of different algorithms for finding a good tensor rank for the PARAFAC decomposition.


8 References

[1] M.A.O. Vasilescu and D. Terzopoulos. Multilinear analysis of image ensembles: TensorFaces. Computer Vision - ECCV 2002, PT 1, 2350:447–460, 2002.

[2] A.H. Andersen and W.S. Rayens. Structure-seeking multilinear methods for the analysis of fMRI data. Neuroimage, 22:728–739, 2004.

[3] C.M. Andersen and R. Bro. Practical aspects of parafac modeling of fluorescence excitation-emission data. Journal Of Chemometrics, 17:200–215, 2003.

[4] C.A. Andersson and R. Bro. Improving the speed of multi-way algorithms: Part i. tucker3. Chemometrics and Intelligent Laboratory Systems, 42:93–103, 1998.

[5] C.A Andersson and R. Henrion. A general algorithm for obtaining simple structure of core arrays in n-way pca with application to fluorometric data. Computational Statistics & Data Analysis, 31:255–278, 1999.

[6] B.W. Bader and T.G. Kolda. Matlab tensor classes for fast algorithm prototyping. Transactions on Mathematical Software, 32:635–653, 2006.

[7] B.W. Bader and T.G. Kolda. Tensor decompositions and applications. Siam Review, 51:455– 500, 2009.

[8] D. Baunsgaard, L. Munck, and L. Nørgaard. Analysis of the effect of crystal size and color distribution on fluorescence measurements of solid sugar using chemometrics. Applied Spectroscopy, 54:1684–1689, 2000.

[9] C.F. Beckmann and S.M. Smith. Tensorial extensions of independent component analysis for multisubject fmri analysis. Neuroimage, 6:294–311, 2005.

[10] C. Bocci, E. Carlin, and J. Kileel. Hadamard products of linear spaces. Journal of Algebra, 448:595–617, 2016.

[11] R. Bro. Parafac. tutorial and applications. Chemometrics and Intelligent Laboratory Systems, 38:149–171, 1997.

[12] R. Bro. Multi-way Analysis in the Food Industry: Models, Algorithms and Applications. PhD thesis, University of Amsterdam, 1998.

[13] R. Bro and H.A.L. Kiers. A new efficient method for determining the number of components in parafac models. Journal of Chemometrics, 17:274–286, 2003.

[14] X. Cao, X. Wei, Y. Han, and D. Lin. Robust face clustering via tensor decomposition. IEEE Transactions on Cybernetics, 45:2546–2557, 2015.


[15] J.D. Carroll and J.J. Chang. Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart-Young" decomposition. Psychometrika, 35:283–319, 1970.

[16] J.D. Carroll, S. Pruzansky, and J.B. Kruskal. Candelinc: a general approach to multidimensional analysis of many-way arrays with linear constraints on parameters. Psychometrika, 45:3–24, 1980.

[17] Dua Dheeru and Efi Karra Taniskidou. UCI machine learning repository, 2017. http:// archive.ics.uci.edu/ml.

[18] P. Giordani and H.A.L. Kiers. A review of tensor-based methods and their application to hospital care data. Statistics in Medicine, 37:137–156, 2018.

[19] D. Goldfarb and Z. Qin. Robust low-rank tensor recovery: Models and algorithms. SIAM Journal on Matrix Analysis and Applications, 35:225–253, 2014.

[20] W. Guo, I. Kotsia, and I. Patras. Tensor learning for regression. IEEE Trans. Image Process, 21:816–827, 2012.

[21] W. Hackbusch. Tensor Spaces and Numerical Tensor Calculus. Springer Series in Computational Mathematics, 2012.

[22] R.A. Harshman. Foundations of the parafac procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics, 16:1–84, 1970.

[23] F.L. Hitchcock. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematical Physics, 6:164–189, 1927.

[24] P.D. Hoff. Equivariant and scale-free Tucker decomposition models. Bayesian Analysis, 11:627–648, 2016.

[25] B. Jiang, F. Yang, and S. Zhang. Tensor and its tucker core: the invariance relationships. Numerical Linear Algebra with Applications, 24:nla.2086, 2017.

[26] E. Karahan, P.A. Rojas-López, M.L. Bringas-Vega, P.A. Valdés-Hernández, and P.A. Valdés-Sosa. Tensor analysis and fusion of multimodal brain images. IEEE, 103:1531–1559, 2015.

[27] B.N. Khoromskij. Structured rank-(r1, ..., rd) decomposition of function-related tensors in R^d. Computational Methods in Applied Mathematics, 6:194–220, 2006.

[28] H.A.L. Kiers. Tuckals core rotations and constrained tuckals modelling. Statistica Applicata, 4:659–667, 1992.


[30] T.G. Kolda. Multilinear Operators for Higher-Order Decompositions. Tech. Report SAND2006-2081, Sandia National Laboratories, Albuquerque, NM, Livermore, CA, 2006.

[31] J.B. Kruskal. Three-way arrays: rank and uniqueness of trilinear decompositions, with applications to arithmetic complexity and statistics. Linear Algebra Appl., 18:95–138, 1977.

[32] L. De Lathauwer, B. De Moor, and J. Vandewalle. An introduction to independent component analysis. Journal of Chemometrics, 14:123–149, 2000.

[33] L. De Lathauwer, B. De Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21:1253–1278, 2000.

[34] X. Li, M.K. Ng, G. Cong, Q. Wu, and Y. Ye. Mr-ntd: Manifold regularization nonnegative Tucker decomposition for tensor data dimension reduction and representation. IEEE Transactions on Neural Networks and Learning Systems, 28:1787–1800, 2017.

[35] Y. Li and A. Ngom. Classification of clinical gene-sample-time microarray expression data via tensor decomposition methods. Computational Intelligence Methods for Bioinformatics and Biostatistics, 6685:275–286, 2011.

[36] J. Liu, P. Musialski, P. Wonka, and J. Ye. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35:208–220, 2013.

[37] M. Signoretto. Learning with tensors: a framework based on convex optimization and spectral regularization. Machine Learning, 94:303–351, 2014.

[38] L. Moberg, G. Robertsson, and B. Karlberg. Spectrofluorimetric determination of chlorophylls and pheopigments using parallel factor analysis. Talanta, 54:161–170, 2001.

[39] A. Nanopoulos, D. Rafailidis, P. Symeonidis, and Y. Manolopoulos. Musicbox: Personalized music recommendation based on cubic analysis of social tags. IEEE Transactions on Audio, Speech and Language Processing, 18:407–412, 2010.

[40] D. Nion, K.N. Mokios, N.D. Sidiropoulos, and A. Potamianos. Batch and adaptive parafac-based blind separation of convolutive speech mixtures. IEEE Transactions on Audio, Speech, and Language Processing, 18:1193–1207, 2010.

[41] L. Omberg, G.H. Golub, and O. Alter. A tensor higher-order singular value decomposition for integrative analysis of dna microarray data from different studies. Proceedings of the National Academy of Sciences of the United States of America, 104:18371–18376, 2007.

[42] X.D. Qing, Y. Li, J. Wen, X.Z. Shen, C.Y. Li, X.L. Liu, and J. Xie. A new method to determine the number of chemical components of four-way data from mixtures. Microchemical Journal, 135:114–121, 2017.

[43] C.R. Rao and S. Mitra. Generalized Inverse of Matrices and its Applications. Wiley, New York, 1971.

[44] M. Mørup. Applications of tensor (multiway array) factorizations and decompositions in data mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1:24–40, 2011.

[45] B. Savas and L. Eldén. Handwritten digit classification using higher order singular value decomposition. Pattern Recognition, 40:993–1003, 2007.

[46] N.D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E.E. Papalexakis, and C. Faloutsos. Tensor decomposition for signal processing and machine learning. IEEE Transactions on Signal Processing, 65:3551–3582, 2017.

[47] A. Stegeman, J.M.F. Ten Berge, and L. De Lathauwer. Sufficient conditions for uniqueness in candecomp/parafac and indscal with random component matrices. Psychometrika, 71:219–229, 2006.

[48] A. Stegeman and N.D. Sidiropoulos. On Kruskal's uniqueness condition for the candecomp/parafac decomposition. Linear Algebra and its Applications, 420:540–552, 2007.

[49] P. Symeonidis. Clusthosvd: Item recommendation by combining semantically enhanced tag clustering with tensor hosvd. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 46:1240–1251, 2016.

[50] M.J. Tobia, K. Hayashi, G. Ballard, I.H. Gotlib, and C.E. Waugh. Dynamic functional connectivity and individual differences in emotions during social stress. Human Brain Mapping, 38:6185–6205, 2017.

[51] L.R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31:279– 311, 1966.

[52] H. Wang and N. Ahuja. A tensor approximation approach to dimensionality reduction. International Journal of Computer Vision, 76:217–229, 2008.

[53] D. Wayne and A.R. Harshman. An Application of PARAFAC to a Small Sample Problem, Demonstrating Preprocessing, Orthogonality Constraints, and Split-Half Diagnostic Techniques. Praeger, 1984.

[54] D. Xu, S. Yan, L. Zhang, S. Lin, H. Zhang, and T.S. Huang. Reconstruction and recognition of tensor-based objects with concurrent subspaces analysis. IEEE Transactions on Circuits and Systems for Video Technology, 18:36–47, 2008.


[56] X. Zhang, G. Wen, and W. Dai. A tensor decomposition-based anomaly detection algorithm for hyperspectral image. IEEE Transactions on Geoscience and Remote Sensing, 54:5801–5820, 2016.
