
Convolutive Features for Storage and Transmission

Ruairí de Fréin ⟨rdefrein@gmail.com⟩. KTH Royal Institute of Technology, Sweden & Telecommunications Software and Systems Group, Ireland

Abstract

A central concern for many learning algorithms and sensing systems is how to efficiently store what the algorithm/system has learned. Convolutive Non-negative Matrix Factorization (CNMF) finds parts-based convolutive representations of non-negative data. Convolutive extensions of NMF have not considered storage efficiency as a side constraint during the learning procedure. We contribute an algorithm, Storable NMF (SNMF), that fuses ideas from (1) the parts-based learning literature and (2) the integer-sequence compression literature. SNMF enjoys the merits of both techniques: it retains the good-approximation properties of CNMF while also taking into account the size of the symbol set which is used to express the learned convolutive factors and activations. We demonstrate that SNMF yields a compression ratio ranging from 10:1 up to 20:1, which gives rise to a similar bandwidth and storage saving for networked sensors.

Trick: SNMF achieves these improved compression ratios, without incurring a significant loss of accuracy, by embedding an off-the-shelf compression algorithm in the CNMF updates, so that quantization function updates are interleaved with CNMF's update rules.

Introduction

• Non-negative Matrix Factorization (NMF): bi-linear decomposition of multivariate data [1, 2].

• NMF factorizes $V \in \mathbb{R}_+^{M \times N}$ into the product of two matrices $W \in \mathbb{R}_+^{M \times R}$ (basis) and $H \in \mathbb{R}_+^{R \times N}$ (activations), yielding the estimate $\tilde{V} = WH$, with $R < M, N$.

• NMF is a “parts-based” approach [3, 4, 5].

• In the case of audio, NMF is applied to magnitude spectra [6].

Reconstruction objective:

$$D(V\,\|\,\tilde{V}) = \sum_{m,n} V[m,n] \log \frac{V[m,n]}{\tilde{V}[m,n]} - V[m,n] + \tilde{V}[m,n]. \quad (1)$$
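As a point of reference, here is a minimal NumPy sketch of the objective in Eqn. (1) together with the standard Lee-Seung multiplicative updates [1] that minimise it. The function names, the positive initialisation, and the eps guard are our own choices, not from the poster:

```python
import numpy as np

def kl_divergence(V, V_hat, eps=1e-12):
    """Generalised KL divergence of Eqn. (1)."""
    return np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat)

def nmf_kl(V, R, n_iter=100, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates minimising Eqn. (1)."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((M, R)) + 0.1   # strictly positive initialisation
    H = rng.random((R, N)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / W.sum(axis=0)[:, None]
        W *= ((V / (W @ H + eps)) @ H.T) / H.sum(axis=1)[None, :]
    return W, H
```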

Motivation

Learn acoustic features [7] at the edge of a network that can be transferred between (and stored on) different sensors/networked devices.

We exploit Occam’s razor by:

• Learning low-rank factorization;

• Learning parsimonious activation functions by using a convolutive generative model;

• Expressing the factorization using a symbol set with a small cardinality.

Convolutive NMF

• Extends NMF to time-varying objects [7] (capturing features with temporal extent):

$$V \approx \tilde{V} = \sum_{t=0}^{T-1} W_t \, \overset{t\rightarrow}{H}, \quad (2)$$

where $\overset{t\rightarrow}{H}$ shifts the columns of $H$ to the right by $t$ places (zero-filling) and $(\cdot)^{\leftarrow t}$ shifts them to the left.

• Activates a cascade of $T$ basis functions; multiplicative updates:

$$H \leftarrow H \otimes \frac{W_t^{\mathsf{T}} \left[\frac{V}{\tilde{V}}\right]^{\leftarrow t}}{W_t^{\mathsf{T}} 1_{M \times N}}, \qquad W_t \leftarrow W_t \otimes \frac{\frac{V}{\tilde{V}} \left(\overset{t\rightarrow}{H}\right)^{\mathsf{T}}}{1_{M \times N} \left(\overset{t\rightarrow}{H}\right)^{\mathsf{T}}}. \quad (3)$$

• Learns $T+1$ matrices, $W_0, W_1, \ldots, W_{T-1}$ and $H$, where the update applied to $H$ is the average of the updates over all $T$ shifts (see the sketch below).
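A minimal sketch of the convolutive model in Eqn. (2), assuming the $W_t$ are stacked in a single (T, M, R) array; the names shift_right and cnmf_reconstruct, and the array layout, are ours:

```python
import numpy as np

def shift_right(H, t):
    """The t-> operator of Eqn. (2): shift columns of H right by t, zero-filling."""
    return H if t == 0 else np.pad(H, ((0, 0), (t, 0)))[:, :H.shape[1]]

def cnmf_reconstruct(W_stack, H):
    """Eqn. (2): V_tilde = sum_t W_t @ (H shifted right by t).
    W_stack has shape (T, M, R); H has shape (R, N)."""
    return sum(W_t @ shift_right(H, t) for t, W_t in enumerate(W_stack))
```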

Evolving Spectrogram: Signal composed of auditory objects that evolve with time; basis functions (that evolve with time); activations of these functions [8].

This work was supported by an ELEVATE Irish Research Council International Career Development Fellowship co-funded by Marie Curie Actions award: ELEVATEPD/2014/62.

Storable NMF

• Taking a nonlinear approximation (NLA) at convergence to improve the compressibility of the factors introduces error indiscriminately ($\lfloor \cdot \rfloor$ denotes rounding and $\Delta$ is the quantization step-size):

$$\hat{H} \leftarrow \left\lfloor \frac{1}{\Delta} H + \frac{1}{2} \right\rfloor, \qquad \hat{W}_t \leftarrow \left\lfloor \frac{W_t}{\Delta} + \frac{1}{2} \right\rfloor, \quad (4)$$

where $\hat{W}_t[m,r], \hat{H}[r,n] \in \mathbb{I}$ and

$$\mathbb{I} = \left( \bigcup_{m,r,t} \hat{W}_t[m,r] \right) \cup \left( \bigcup_{r,n} \hat{H}[r,n] \right). \quad (5)$$
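A sketch of this naive, single-step-size quantization and of the symbol set of Eqn. (5). The helper names and the list-of-arrays layout for the $\hat{W}_t$ are our assumptions:

```python
import numpy as np

def quantize_global(X, delta):
    """Eqn. (4): round X onto integer multiples of a single step-size delta."""
    return np.floor(X / delta + 0.5).astype(np.int64)

def symbol_set(W_hats, H_hat):
    """Eqn. (5): the symbol set I is the union of all distinct quantized
    entries of every W_hat_t and of H_hat."""
    entries = [W.ravel() for W in W_hats] + [H_hat.ravel()]
    return set(np.concatenate(entries).tolist())
```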

Is there a better way to pick the set?

Adapt the step-size for each row of $H$ [9, 10]: row $\hat{H}[1,:]$ is quantized from $H[1,:]$ with its own step-size $\Delta_h[1]$, row $\hat{H}[2,:]$ from $H[2,:]$ with $\Delta_h[2]$, and so on.

Adaptive Quantization

• Adapt the quantization function's step-size for each column of $W_t$ and each row of $H$ (sketched below):

$$\hat{W}_t[:,r] \leftarrow \left\lfloor \frac{W_t[:,r]}{\Delta_w[r]} + \frac{1}{2} \right\rfloor \ \forall r, t, \qquad \hat{H}[r,:] \leftarrow \left\lfloor \frac{H[r,:]}{\Delta_h[r]} + \frac{1}{2} \right\rfloor \ \forall r. \quad (6)$$

• Initially $\mathbb{I}$ is a subset of $\mathbb{R}_+$; at convergence, it is a subset of the positive integers.

• The cardinality of $\mathbb{I}$ is generally minimized, as $\mathbb{I}$ is indirectly learned by the factorization.
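A sketch of the adaptive quantization of Eqn. (6), again assuming the $W_t$ are stacked in a (T, M, R) array and that dw, dh are length-R positive step-size vectors (names are ours):

```python
import numpy as np

def quantize_adaptive(W_stack, H, dw, dh):
    """Eqn. (6): one step-size dw[r] per column r of each W_t and one
    step-size dh[r] per row r of H."""
    W_hats = np.floor(W_stack / dw[None, None, :] + 0.5).astype(np.int64)
    H_hat = np.floor(H / dh[:, None] + 0.5).astype(np.int64)
    return W_hats, H_hat
```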

Definition 1. SNMF factorizes the matrix $V \in \mathbb{R}_+^{M \times N}$ into the product of element-wise integer matrices $\hat{W}_t \in \mathbb{I}^{M \times R}$ and $\hat{H} \in \mathbb{I}^{R \times N}$ and two diagonal matrices $\mathrm{diag}(\Delta_w)$ and $\mathrm{diag}(\Delta_h)$:

$$\tilde{V} = \sum_{t=0}^{T-1} \hat{W}_t \, \mathrm{diag}(\Delta_w) \, \mathrm{diag}(\Delta_h) \, \overset{t\rightarrow}{\hat{H}}, \quad \text{where } \Delta_w, \Delta_h \in \mathbb{R}_+^{R \times 1}. \quad (7)$$

• The quantization step-sizes $\Delta_w[r], \Delta_h[r] \in \mathbb{R}_+$, though real-valued, are few in number, and are adapted to reduce the cardinality of $\mathbb{I}$. The majority of the stored elements are integers: $\hat{W}_t[m,r], \hat{H}[r,n] \in \mathbb{I}$.
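A sketch of the SNMF reconstruction of Eqn. (7): only the integer factors and the two short real step-size vectors are needed to rebuild the estimate. The helper names are ours:

```python
import numpy as np

def shift_right(H, t):
    """The t-> operator: shift columns right by t, zero-filling."""
    return H if t == 0 else np.pad(H, ((0, 0), (t, 0)))[:, :H.shape[1]]

def snmf_reconstruct(W_hats, H_hat, dw, dh):
    """Eqn. (7): V_tilde = sum_t W_hat_t diag(dw) diag(dh) (shifted H_hat).
    W_hats is a list of (M, R) integer arrays; H_hat is (R, N) integer."""
    H_scaled = np.diag(dw * dh) @ H_hat.astype(float)   # diag(dw) @ diag(dh) @ H_hat
    return sum(W.astype(float) @ shift_right(H_scaled, t)
               for t, W in enumerate(W_hats))
```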

Quantization Step-size Updates: The quantization step-sizes $\Delta_h[r]$ and $\Delta_w[r]$ in the convolutive mixing model (Eqn. 7) produce non-negative updates if they are updated using

$$\Delta_h[r] \leftarrow \Delta_h[r] \, \frac{1_M^{\mathsf{T}} \left[ \frac{V}{\hat{V}} \otimes \left( \sum_{t=0}^{T-1} \hat{W}_t[:,r] \, \Delta_w[r] \, \overset{t\rightarrow}{\hat{H}}[r,:] \right) \right] 1_N}{1_M^{\mathsf{T}} \left[ \sum_{t=0}^{T-1} \hat{W}_t[:,r] \, \Delta_w[r] \, \overset{t\rightarrow}{\hat{H}}[r,:] \right] 1_N}, \quad (8)$$

$$\Delta_w[r] \leftarrow \Delta_w[r] \, \frac{1_M^{\mathsf{T}} \left[ \frac{V}{\hat{V}} \otimes \left( \sum_{t=0}^{T-1} \hat{W}_t[:,r] \, \Delta_h[r] \, \overset{t\rightarrow}{\hat{H}}[r,:] \right) \right] 1_N}{1_M^{\mathsf{T}} \left[ \sum_{t=0}^{T-1} \hat{W}_t[:,r] \, \Delta_h[r] \, \overset{t\rightarrow}{\hat{H}}[r,:] \right] 1_N}. \quad (9)$$
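Under our reading of Eqns. (8)-(9), reconstructed from the poster layout, each step-size is rescaled by the ratio of the data-weighted mass to the unweighted mass of component r's convolutive rank-one term. A sketch; the names update_step_sizes and shift_row are hypothetical, V_hat is the current estimate of Eqn. (7), and dw, dh are float arrays mutated in place:

```python
import numpy as np

def shift_row(h, t):
    """Shift a 1-D row right by t samples, zero-filling on the left."""
    return np.pad(h, (t, 0))[:h.shape[0]]

def update_step_sizes(V, V_hat, W_hats, H_hat, dw, dh, eps=1e-12):
    """One sweep of Eqns. (8)-(9) over all components r."""
    ratio = V / (V_hat + eps)
    for r in range(H_hat.shape[0]):
        # sum_t  W_hat_t[:, r]  (outer)  shift_right(H_hat[r, :], t)
        term = sum(np.outer(W_hats[t][:, r], shift_row(H_hat[r], t))
                   for t in range(len(W_hats)))
        dh[r] *= (ratio * (dw[r] * term)).sum() / ((dw[r] * term).sum() + eps)  # Eqn. (8)
        dw[r] *= (ratio * (dh[r] * term)).sum() / ((dh[r] * term).sum() + eps)  # Eqn. (9)
    return dw, dh
```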

NLA $\lfloor \cdot \rfloor$ & Monotonic Convergence

• Quantization of the elements of $\hat{W}_t$ may cause the KLD to increase;

• Generally, $\Delta_w[r] \ll W_t[:,r]$ (for the important (big) entries);

• We have freedom in how we apply element-wise rounding, and we desire monotonic convergence. Heuristic: we randomly substitute in a number of new quantization functions until the KLD is minimized, choosing between

$$\left\lfloor \frac{W_t[m,r]}{\Delta_w[r]} + \frac{1}{2} \right\rfloor_{m,r,t} \quad \text{and} \quad \left\lfloor \frac{W_t[m,r]}{\Delta_w[r]} - \frac{1}{2} \right\rfloor_{m,r,t} \quad (10)$$

with probability $p$.
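A sketch of one draw of the randomized rounding in Eqn. (10); the function name and seed handling are ours:

```python
import numpy as np

def random_rounding(W_t, dw, p=0.5, seed=None):
    """Eqn. (10): per element, round half-up with probability p and
    half-down otherwise, yielding one candidate quantization of W_t."""
    rng = np.random.default_rng(seed)
    scaled = W_t / dw[None, :]
    up = np.floor(scaled + 0.5)
    down = np.floor(scaled - 0.5)
    return np.where(rng.random(W_t.shape) < p, up, down).astype(np.int64)
```

In the heuristic above, several such candidates would be drawn and the one yielding the smallest KLD retained.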

SNMF parsing-and-encoding scheme

• Concatenate the transposed columns of each $\hat{W}_t$ to generate a sequence

$$\mathcal{W}_t = [\hat{W}_t[:,1]^{\mathsf{T}} \,|\, \hat{W}_t[:,2]^{\mathsf{T}} \,|\, \ldots \,|\, \hat{W}_t[:,R]^{\mathsf{T}}]. \quad (11)$$

• Concatenate the rows of $\hat{H}$ to generate

$$\mathcal{H} = [\hat{H}[1,:] \,|\, \hat{H}[2,:] \,|\, \ldots \,|\, \hat{H}[R,:]]. \quad (12)$$

• Generate the sequence of integers

$$\mathcal{S} = M \,|\, R \,|\, N \,|\, \mathcal{W} \,|\, \mathcal{H} \,|\, \Delta_w \,|\, \Delta_h. \quad (13)$$

• This works well because gzip adapts to changes in the integer sequence: different columns and rows of $\hat{W}$ & $\hat{H}$ have different statistics.
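A sketch of the serialization of Eqns. (11)-(13) followed by gzip. One caveat: we also store T so that the individual $\hat{W}_t$ can be split apart on decode, an addition the poster's header does not list; the byte layout is otherwise our own choice:

```python
import gzip
import numpy as np

def encode(W_hats, H_hat, dw, dh):
    """Serialise S = M|R|N|W|H|dw|dh (Eqn. 13) and gzip the byte stream."""
    T, (M, R), N = len(W_hats), W_hats[0].shape, H_hat.shape[1]
    header = np.array([M, R, N, T], dtype=np.int64)
    W_seq = np.concatenate([W.T.ravel() for W in W_hats])  # Eqn. (11): column-wise
    H_seq = H_hat.ravel()                                  # Eqn. (12): row-wise
    payload = b"".join([header.tobytes(),
                        W_seq.astype(np.int64).tobytes(),
                        H_seq.astype(np.int64).tobytes(),
                        np.asarray(dw, dtype=np.float64).tobytes(),
                        np.asarray(dh, dtype=np.float64).tobytes()])
    return gzip.compress(payload)
```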

Worked example (gzip's LZ77-style parsing): the sequence 1 1 3 1 1 3 1 2 3 1 2 1 1 1 3 is scanned with a look-ahead buffer against a growing dictionary of already-parsed symbols, emitting one (offset, length, next-symbol) triple per step:

1 1 3 1 1 3 1 2 3 1 2 1 1 1 3 ↦ (0, 0, 1), (1, 1, 3), (3, 4, 2), (3, 3, 1), (1, 2, 3).
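A toy LZ77 parser (gzip's DEFLATE stage is built on the same idea) that reproduces the triples of the worked example above; this is an illustration of the principle, not gzip's actual implementation:

```python
def lz77_parse(seq):
    """Parse an integer sequence into (offset, length, next-symbol) triples."""
    out, i = [], 0
    while i < len(seq):
        best = (0, 0)                       # (offset, match length)
        for j in range(i):                  # scan the dictionary (past output)
            k = 0
            while i + k < len(seq) - 1 and seq[j + k] == seq[i + k]:
                k += 1                      # overlapping matches are allowed
            if k > 0 and k >= best[1]:      # prefer the nearest longest match
                best = (i - j, k)
        out.append((best[0], best[1], seq[i + best[1]]))
        i += best[1] + 1
    return out

print(lz77_parse([1, 1, 3, 1, 1, 3, 1, 2, 3, 1, 2, 1, 1, 1, 3]))
# -> [(0, 0, 1), (1, 1, 3), (3, 4, 2), (3, 3, 1), (1, 2, 3)]
```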

© 2015 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes, or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. For more details: Ruairí de Fréin, "Learning Convolutive Features for Storage and Transmission between Networked Sensors", 2015 International Joint Conference on Neural Networks, to appear.

Results

Hypothesis: Given that (C)NMF is non-unique and non-exact, but typically yields a satisfactory decomposition, we have the freedom to choose an arbitrary satisfactory lossy factorization with respect to the distortion constraint of choice. Can SNMF learn a factorization which is (1) within 1-3 dB of the expected CNMF Signal-to-Noise Ratio (SNR) and (2) amenable to efficient coding?

SNMF Applied to Synthetic Data-set:

We report average performance over 100 trials. Each trial was run for 100 iterations, with R = 2, an FFT size of 1024 samples, and an overlap of 512 samples. SNR is improved by increasing R.

• Noise-Burst Data-set: Performance comparison NMF vs SNMF (T = 1).

Algorithm                                  | NMF         | SNMF (T = 1)
No. unique elements of W / Ŵ               | 1026        | 138
No. unique elements of H / Ĥ               | 874         | 19
File-size (gzipped / not gzipped), bytes   | 17704/35839 | 1659/35890
Compression ratio                          |             | 10:1
SNR (dB)                                   | 6.58        | 6.6

• Evolving Spectrogram: Performance comparison CNMF vs SNMF (T = 32).

Algorithm                                  | CNMF          | SNMF (T = 32)
No. unique elements of W / Ŵ               | 32832         | 155
No. unique elements of H / Ĥ               | 436           | 21
File-size (gzipped / not gzipped), bytes   | 300239/625609 | 13841/674142
Compression ratio                          |               | 20:1
SNR (dB)                                   | 12.5          | 11.56

Figure (Noise-Burst Data): Histograms of the elements of SNMF's (T = 1) Ŵ (above) vs NMF's W (below).

• The number of bins of the histograms associated with SNMF (T = 1) is smaller than for NMF.

• The bins of NMF's basis-function histogram typically contain only one element each.

SNMF Applied to TIMIT Database:

Algorithm                     | CNMF (T = 1) | SNMF (T = 1) | CNMF (T = 5) | SNMF (T = 5)
No. unique elements W / Ŵ     | 10260        | 1133         | 51299        | 374
No. unique elements H / Ĥ     | 1820         | 518          | 1818         | 54
File-size gzipped, bytes      | 112278       | 11442        | 489705       | 46389
File-size not gzipped, bytes  | 232444       | 226119       | 1016321      | 1035544
Compression ratio             |              | 9.8:1        |              | 10.6:1
SNR (dB)                      | 11.61        | 11.19        | 25.4         | 24.14

Algorithm                     | CNMF (T = 3) | SNMF (T = 3) | CNMF (T = 7) | SNMF (T = 7)
No. unique elements W / Ŵ     | 30780        | 341          | 71820        | 421
No. unique elements H / Ĥ     | 1820         | 86           | 1801         | 47
File-size gzipped, bytes      | 305111       | 30243        | 682557       | 63990
File-size not gzipped, bytes  | 632236       | 633672       | 1417852      | 1438136
Compression ratio             |              | 10.1:1       |              | 10.67:1
SNR (dB)                      | 15.63        | 15.83        | 29.28        | 26.5

• Effect of T on CNMF versus SNMF (1 ≤ T ≤ 7);

• Each trial was run for 500 iterations; R = 20 is fixed;

• The symbol set I of SNMF is smaller than that of CNMF;

• The compressibility of SNMF is greater than that of CNMF;

• Compressibility is not affected by an increase in T;

• The error introduced by SNMF is reasonable relative to CNMF.

Conclusions

Element-wise integer constraints are introduced and compression is improved. SNMF, combined with a universal lossless coding scheme, is portable and efficiently stored. The error introduced by integerizing the factors is folded into the CNMF updates. A 10:1 or 20:1 compression ratio of the results of a (C)NMF is possible if the factors are restricted to a finite integer symbol set. This incurs a distortion cost of only 1-3 dB.

References

[1] D.D. Lee and H.S. Seung, "Algorithms for Nonnegative Matrix Factorization," NIPS, vol. 13, pp. 556-62, 2001.

[2] R. de Fréin, K. Drakakis, and S. Rickard, "Portfolio diversification using subspace factorisations," in IEEE Inf. Sc. Sys. 42nd Conf., 2008, pp. 1075-80.

[3] D. Donoho and V. Stodden, "When does Nonnegative Matrix Factorization give a correct decomposition into parts?," NIPS, vol. 16, 2004.

[4] R. de Fréin, "Formal concept analysis via atomic priming," in Formal Concept Analysis, P. Cellier, F. Distel, and B. Ganter, Eds., vol. 7880 of Lecture Notes in Computer Science, pp. 92-108. Springer Berlin Heidelberg, 2013.

[5] R. de Fréin, "Ghostbusters: A parts-based NMF algorithm," in 24th IET Irish Signals and Systems Conference (ISSC 2013), Jun. 2013, pp. 1-8.

[6] R.K. Potter, G.A. Kopp, and H.C. Green, "Visible Speech," 1947.

[7] P. Smaragdis, "Nonnegative matrix factor deconvolution; extraction of multiple sound sources from monophonic inputs," Int. Conf. Independent Component Analysis and Blind Signal Separation, pp. 494-9, 2004.

[8] P.D. O'Grady and B.A. Pearlmutter, "Discovering speech phones using convolutive non-negative matrix factorisation with a sparseness constraint," Neurocomputing, 2008, pp. 88-101.

[9] R. de Fréin, "Learning and Storing the Parts of Objects: IMF," in IEEE Int. Workshop on Machine Learning for Signal Processing, Sep. 2014, pp. 1-6.

[10] R. de Fréin, "Quantized nonnegative matrix factorization," in 19th International Conference on Digital Signal Processing (DSP), Aug. 2014, pp. 377-382.

