
Training Algorithm for Extra Reduced Size

Lattice–Ladder Multilayer Perceptrons

Dalius Navakauskas

Division of Automatic Control

Department of Electrical Engineering

Linköpings universitet, SE-581 83 Linköping, Sweden

WWW: http://www.control.isy.liu.se

E-mail: dalius@isy.liu.se

March 5, 2003

AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPING

Report no.: LiTH-ISY-R-2499

Submitted to the journal Informatica, ISSN 0868-4952

Technical reports from the Control & Communication group in Linköping are available at http://www.control.isy.liu.se/publications.


Training Algorithm for Extra Reduced Size

Lattice–Ladder Multilayer Perceptrons

Dalius Navakauskas

Radioelectronics Department, Electronics Faculty

Vilnius Gediminas Technical University

Naugarduko 41, 2600 Vilnius, Lithuania

E-mail: dalius@el.vtu.lt

March 5, 2003

Abstract

A quick gradient training algorithm for a specific neural network structure called an extra reduced size lattice–ladder multilayer perceptron is introduced. The presented derivation of the algorithm utilizes the simplest way, recently found by the author, of exactly computing the gradients for the rotation parameters of a lattice–ladder filter. The developed neural network training algorithm is optimal in terms of the minimal number of constants, multiplication and addition operations, while the regularity of the structure is also preserved.

Keywords: lattice–ladder filter, lattice–ladder multilayer perceptron, adaptation, gradient adaptive lattice algorithms.

1 Introduction

During the last decade a lot of new structures for artificial neural networks (ANN) were introduced, fulfilling the need for models for nonlinear processing of time-varying signals [6, 7, 16, 17]. One of the many, and perhaps the most straightforward, ways to insert temporal behaviour into ANNs is to use digital filters in place of the synapses of a multilayer perceptron (MLP). Following that way, the time-delay neural network [18], FIR/IIR MLPs [1, 20], the gamma MLP [8] and a cascade neural network [4], to name a few ANN architectures, were developed.

A lattice–ladder realization of IIR filters incorporated as MLP synapses forms the structure of the lattice–ladder multilayer perceptron (LLMLP), first introduced in [2] and followed by several simplified versions proposed by the author [9, 11]. The LLMLP is an appealing structure for advanced signal processing; however, even a moderate implementation of its training is hindered by the fact that a lot of storage and computational power must be allocated [10].


Well known neural network training algorithms such as backpropagation and its modifications, conjugate gradients, quasi-Newton, Levenberg-Marquardt, etc., or their adaptive counterparts like temporal backpropagation [19], the IIR MLP training algorithm [3], the recursive Levenberg-Marquardt algorithm [13], etc., are essentially based on the use of gradients (also called sensitivity functions), i.e., partial derivatives of the cost function with respect to the current weights. Here we are going to show how exploiting the computation of gradients that is specific to the lattice–ladder filter (LLF) leads to an efficient realization of the overall family of LLMLP training algorithms. While doing so, we present the training algorithm for an extra reduced size LLMLP (XLLMLP) that is quickest in terms of the number of constants, multiplication and addition operations.

2 Towards simplified computations

In order not to obscure the main ideas, we will first work only with one LLF and afterwards, in Section 4, generalize and utilize the results in the development of a training algorithm for a specific structure, the XLLMLP. Although the final LLF training algorithm is fairly simple, many intermediate steps involved in the derivation could be confusing. Thus, here we will present previous results on the calculation of LLF gradients and introduce the simplifications we have chosen, and only in the next Section 3 actually derive all simplified expressions.

Consider one lattice–ladder filter (see Figure 1) used in the XLLMLP structure, with its computations expressed by

$$
\begin{bmatrix} f_{j-1}(n) \\ b_j(n) \end{bmatrix} =
\begin{bmatrix} \cos\Theta_j & -\sin\Theta_j \\ \sin\Theta_j & \cos\Theta_j \end{bmatrix}
\begin{bmatrix} f_j(n) \\ z\, b_{j-1}(n) \end{bmatrix}, \quad j = 1, 2, \dots, M; \qquad (1a)
$$

$$
s_{\text{out}}(n) = \sum_{j=0}^{M} v_j\, b_j(n), \qquad (1b)
$$

with boundary conditions

$$
b_0(n) = f_0(n); \quad f_M(n) = s_{\text{in}}(n). \qquad (1c)
$$

Here we used the following notation: $s_{\text{in}}(n)$ and $s_{\text{out}}(n)$ are the signals at the input and output of the $M$th order LLF, $f_j(n)$ and $b_j(n)$ are the forward and backward signals flowing in the $j$th section of the LLF, $\Theta_j$ and $v_j$ are the lattice and ladder coefficients respectively, while $z$ is a delay operator such that $z b_j(n) \equiv b_j(n-1)$.
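As a point of reference for what follows, here is a minimal NumPy sketch of one time step of the LLF computations (1a)–(1c). The function name, the argument layout and the in-place state update are illustrative choices only; the only memory carried between time instants is the vector of delayed backward signals, which is exactly what the operator $z$ in (1a) acts on.

    import numpy as np

    def llf_forward(s_in, theta, v, b_state):
        """One time step of an M-th order lattice-ladder filter, eq. (1).

        s_in    -- current input sample s_in(n)
        theta   -- lattice rotation angles Theta_1..Theta_M (length M)
        v       -- ladder coefficients v_0..v_M (length M + 1)
        b_state -- delayed backward signals b_0(n-1)..b_{M-1}(n-1) (length M),
                   i.e. the filter memory; updated in place
        Returns (s_out, f, b) with the forward/backward signals of all sections.
        """
        M = len(theta)
        f = np.zeros(M + 1)          # f_0(n) .. f_M(n)
        b = np.zeros(M + 1)          # b_0(n) .. b_M(n)
        f[M] = s_in                  # boundary condition f_M(n) = s_in(n), eq. (1c)

        # lattice recursion (1a), section index j = M, ..., 1
        for j in range(M, 0, -1):
            c, s = np.cos(theta[j - 1]), np.sin(theta[j - 1])
            zb = b_state[j - 1]                  # z b_{j-1}(n) = b_{j-1}(n-1)
            f[j - 1] = c * f[j] - s * zb
            b[j]     = s * f[j] + c * zb
        b[0] = f[0]                              # boundary condition (1c)

        b_state[:] = b[:M]                       # store b_0(n)..b_{M-1}(n) for the next step
        s_out = float(np.dot(v, b))              # ladder part (1b)
        return s_out, f, b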

It could be shown (see, for example, [5]) that the gradient expressions for the calculation of the LLF coefficients require $M$ recursions, yielding a training algorithm with total complexity proportional to $M^2$. One possible way to simplify the calculation of the LLF gradients was presented in [15].

We assume that the concept of flowgraph transposition is already known (if not, consult, for example, [14, pages 291–293]). Applying the flowgraph transposition rules to the LLF equations (1a) and (1b), we obtain the LLF transpose realization. The resulting system gives rise to the following recurrent relation

$$
\begin{bmatrix} g_j(n) \\ t_{j-1}(n) \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & z \end{bmatrix}
\begin{bmatrix} \cos\Theta_j & \sin\Theta_j \\ -\sin\Theta_j & \cos\Theta_j \end{bmatrix}
\begin{bmatrix} g_{j-1}(n) \\ t_j(n) \end{bmatrix} +
\begin{bmatrix} 0 \\ v_{j-1} \end{bmatrix},
\quad j = M, \dots, 2, 1, \qquad (2a)
$$

with boundary conditions

$$
t_M(n) = v_M, \quad g_0(n) = t_0(n), \quad g_M(n) = s_{\text{out}}(n). \qquad (2b)
$$

Here $g_j(n)$ and $t_j(n)$ denote the forward and backward signals flowing in the $j$th section of the transposed LLF.

Figure 1: A lattice–ladder filter of order $M$. The input signal $s_{\text{in}}(n)$ is processed according to (1) using the lattice $\Theta_j$ and ladder $v_j$ coefficients, and as a result the output signal $s_{\text{out}}(n)$ is obtained. Some forward $f_j(n)$ and backward $b_j(n)$ signals flowing inside the LLF are shown, while $z$ indicates a delay operator.

After simple re-arrangements (see, for example, [10, pages 51–54]) it could be shown that the filtered regressor components can alternatively be expressed as

$$
\nabla\Theta_j(n) = \frac{f_j(n)\, t_j(n) - b_j(n)\, g_j(n)}{\cos\Theta_j}\; e(n). \qquad (3)
$$

Here $e(n)$ is the LLF error.
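To make the two recursions concrete, the following minimal NumPy sketch evaluates the transposed realization (2) for one time instant and then applies (3). It assumes the forward and backward signals $f_j(n)$, $b_j(n)$ and the error $e(n)$ are already available from the forward filter; the function name, the argument layout and the explicit handling of the one-step-delayed quantities follow one consistent reading of (2a) and are not prescribed by the text.

    import numpy as np

    def llf_gradients(f, b, e, theta, v, g_prev, t_prev):
        """Lattice-coefficient gradients of one LLF via its transposed realization.

        f, b           -- forward/backward signals f_0(n)..f_M(n), b_0(n)..b_M(n)
        e              -- filter output error e(n)
        theta, v       -- lattice angles Theta_1..Theta_M and ladder weights v_0..v_M
        g_prev, t_prev -- transposed-filter signals g_j(n-1), t_j(n-1) (length M + 1)
        Returns (grad_theta, g, t); grad_theta[j-1] corresponds to the gradient (3)
        for Theta_j, and (g, t) become g_prev, t_prev at the next time step.
        """
        M = len(theta)
        g = np.zeros(M + 1)
        t = np.zeros(M + 1)

        t[M] = v[M]                                     # boundary condition (2b)
        for j in range(M, 0, -1):                       # t_{j-1}(n): delayed inputs, eq. (2a)
            t[j - 1] = (-np.sin(theta[j - 1]) * g_prev[j - 1]
                        + np.cos(theta[j - 1]) * t_prev[j] + v[j - 1])
        g[0] = t[0]                                     # boundary condition (2b)
        for j in range(1, M + 1):                       # g_j(n): no delay in this row of (2a)
            g[j] = np.cos(theta[j - 1]) * g[j - 1] + np.sin(theta[j - 1]) * t[j]

        grad_theta = (f[1:] * t[1:] - b[1:] * g[1:]) / np.cos(theta) * e   # eq. (3)
        return grad_theta, g, t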

The main idea given by J. A. Rodriguez-Fonollosa and E. Masgrau [15] towards simplifying the gradient computations for the lattice parameters is to find a recurrence relation that realizes the mapping

$$
\begin{bmatrix} f_{j-1}(n)\, t_{j-1}(n) \\ b_{j-1}(n)\, g_{j-1}(n) \end{bmatrix}
\longrightarrow
\begin{bmatrix} f_j(n)\, t_j(n) \\ b_j(n)\, g_j(n) \end{bmatrix}, \qquad (4)
$$

in such a way that all the necessary transfer functions may be obtained from one single filter, resulting in an algorithm with total complexity proportional to $M$.


3 Derivation of simplified calculation of LLF gradients

There are in total 14 possible and implementable ways of realizing mapping (4). The way that is optimal in the sense of the minimal number of constants, addition and delay operations involved in the calculation of the LLF gradients, and also in the regularity of the regressor lattice structure, was reported in [12]. It forms the basis for the new training algorithm to be developed in the next section. Thus, let us show in a step-by-step fashion the re-arrangements involved in the derivation of the optimal realization of mapping (4).

Accordingly, we are seeking a simple realization of the following optimal augmented mapping

$$
\begin{bmatrix} f_{j-1}(n)g_{j-1}(n) \\ b_{j-1}(n)g_{j-1}(n) \\ f_j(n)t_j(n) \\ b_j(n)t_j(n) \end{bmatrix}
\longrightarrow
\begin{bmatrix} f_j(n)g_j(n) \\ b_j(n)g_j(n) \\ f_{j-1}(n)t_{j-1}(n) \\ b_{j-1}(n)t_{j-1}(n) \end{bmatrix}, \qquad (5)
$$

while the overall system describing a single section of the regressor lattice is expressed by

$$
\begin{bmatrix} f_{j-1}(n)g_{j-1}(n) \\ b_j(n)g_{j-1}(n) \end{bmatrix} =
\begin{bmatrix} \cos\Theta_j & -\sin\Theta_j \\ \sin\Theta_j & \cos\Theta_j \end{bmatrix}
\begin{bmatrix} f_j(n)g_{j-1}(n) \\ z\, b_{j-1}(n)g_{j-1}(n) \end{bmatrix}; \qquad (6a)
$$

$$
\begin{bmatrix} f_{j-1}(n)t_{j-1}(n) \\ b_j(n)t_{j-1}(n) \end{bmatrix} =
\begin{bmatrix} \cos\Theta_j & -\sin\Theta_j \\ \sin\Theta_j & \cos\Theta_j \end{bmatrix}
\begin{bmatrix} f_j(n)t_{j-1}(n) \\ z\, b_{j-1}(n)t_{j-1}(n) \end{bmatrix}; \qquad (6b)
$$

$$
\begin{bmatrix} f_j(n)g_j(n) \\ f_j(n)t_{j-1}(n) \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & z \end{bmatrix}
\begin{bmatrix} \cos\Theta_j & \sin\Theta_j \\ -\sin\Theta_j & \cos\Theta_j \end{bmatrix}
\begin{bmatrix} f_j(n)g_{j-1}(n) \\ f_j(n)t_j(n) \end{bmatrix} +
\begin{bmatrix} 0 \\ v_{j-1} \end{bmatrix} f_j(n); \qquad (6c)
$$

$$
\begin{bmatrix} b_j(n)g_j(n) \\ b_j(n)t_{j-1}(n) \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & z \end{bmatrix}
\begin{bmatrix} \cos\Theta_j & \sin\Theta_j \\ -\sin\Theta_j & \cos\Theta_j \end{bmatrix}
\begin{bmatrix} b_j(n)g_{j-1}(n) \\ b_j(n)t_j(n) \end{bmatrix} +
\begin{bmatrix} 0 \\ v_{j-1} \end{bmatrix} b_j(n). \qquad (6d)
$$

In order to achieve mapping (5), two information flow directions in (6) must be reversed: now $f_j(n)g_{j-1}(n)$ must be computed based on $f_{j-1}(n)g_{j-1}(n)$, and $b_{j-1}(n)t_{j-1}(n)$ must be computed based on $b_j(n)t_{j-1}(n)$.

For the first re-direction to be fulfilled we take (6a) and re-arrange it as follows

$$
\begin{bmatrix} f_j(n)g_{j-1}(n) \\ b_j(n)g_{j-1}(n) \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & z^{-1} \end{bmatrix}
\begin{bmatrix} \dfrac{1}{\cos\Theta_j} & \dfrac{-\sin\Theta_j}{\cos\Theta_j} \\[2mm] \dfrac{-\sin\Theta_j}{\cos\Theta_j} & \dfrac{1}{\cos\Theta_j} \end{bmatrix}
\begin{bmatrix} f_{j-1}(n)g_{j-1}(n) \\ b_{j-1}(n)g_{j-1}(n) \end{bmatrix}. \qquad (7a)
$$

Similarly, taking (6b) and re-arranging we get

$$
\begin{bmatrix} f_{j-1}(n)t_{j-1}(n) \\ b_{j-1}(n)t_{j-1}(n) \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & z^{-1} \end{bmatrix}
\begin{bmatrix} \dfrac{1}{\cos\Theta_j} & \dfrac{-\sin\Theta_j}{\cos\Theta_j} \\[2mm] \dfrac{-\sin\Theta_j}{\cos\Theta_j} & \dfrac{1}{\cos\Theta_j} \end{bmatrix}
\begin{bmatrix} f_j(n)t_{j-1}(n) \\ b_j(n)t_{j-1}(n) \end{bmatrix}. \qquad (7b)
$$


For convenience we rewrite the unchanged expressions (6c) and (6d) here:

$$
\begin{bmatrix} f_j(n)g_j(n) \\ f_j(n)t_{j-1}(n) \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & z \end{bmatrix}
\begin{bmatrix} \cos\Theta_j & \sin\Theta_j \\ -\sin\Theta_j & \cos\Theta_j \end{bmatrix}
\begin{bmatrix} f_j(n)g_{j-1}(n) \\ f_j(n)t_j(n) \end{bmatrix} +
\begin{bmatrix} 0 \\ v_{j-1} \end{bmatrix} f_j(n); \qquad (7c)
$$

$$
\begin{bmatrix} b_j(n)g_j(n) \\ b_j(n)t_{j-1}(n) \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & z \end{bmatrix}
\begin{bmatrix} \cos\Theta_j & \sin\Theta_j \\ -\sin\Theta_j & \cos\Theta_j \end{bmatrix}
\begin{bmatrix} b_j(n)g_{j-1}(n) \\ b_j(n)t_j(n) \end{bmatrix} +
\begin{bmatrix} 0 \\ v_{j-1} \end{bmatrix} b_j(n). \qquad (7d)
$$

Now, (7) describes a system that has no conflicting information flow directions. However, there are two advance operations to be performed, one in (7a) and one in (7b), making the system non-causal and hence unimplementable. There is, however, more to this computation than first meets the eye.

Notice first that the mapping (5) requires only four expressions to be specified. Thus a simplification of (7) becomes plausible. It could be shown that (7) can be simplified drastically and finally expressed as

$$
\begin{bmatrix} f_j(n)g_j(n) \\ b_j(n)g_j(n) \\ f_{j-1}(n)t_{j-1}(n) \\ b_{j-1}(n)t_{j-1}(n) \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\left( I_4 +
\begin{bmatrix} \sin\Theta_j & 0 & 0 & 0 \\ 0 & \sin\Theta_j & 0 & 0 \\ 0 & 0 & -\sin\Theta_j & 0 \\ 0 & 0 & 0 & -\sin\Theta_j \end{bmatrix}
\begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}
\right)
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & z & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} f_{j-1}(n)g_{j-1}(n) \\ b_{j-1}(n)g_{j-1}(n) \\ f_j(n)t_j(n) \\ b_j(n)t_j(n) \end{bmatrix} +
v_{j-1}
\begin{bmatrix} 0 \\ 0 \\ f_{j-1}(n) \\ b_{j-1}(n) \end{bmatrix}. \qquad (8a)
$$

The simplified system for a single section of the regressor lattice is already causal. Moreover, its implementation requires only 2 delay operators, 3 constants ($\sin\Theta_j$, $-\sin\Theta_j$ and $v_{j-1}$) and 8 addition operations, as is more evident from the pictorial representation of (8), which we skipped to save space. In order to finish the derivation, the boundary conditions arising when $M$ such systems are cascaded must be considered. Based on (1c) and (2b) we get the new boundary conditions

$$
f_0(n)g_0(n) = f_0(n)t_0(n); \quad b_0(n)g_0(n) = b_0(n)t_0(n); \qquad (8b)
$$
$$
f_M(n)t_M(n) = v_M f_M(n); \quad b_M(n)t_M(n) = v_M b_M(n).
$$
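As an illustration of the operation count just stated, the following minimal Python sketch implements one section of the simplified regressor lattice in the factored form of (8a). The class name, the zero initial state and the way the two delay elements are stored are illustrative choices; per time step the section needs only the constants $\sin\Theta_j$ and $v_{j-1}$ (with $-\sin\Theta_j$ realized by subtraction), two delay elements and eight additions, matching the count stated above.

    import numpy as np

    class RegressorLatticeSection:
        """One section of the simplified regressor lattice of eq. (8a)."""

        def __init__(self, theta_j, v_jm1):
            self.s = np.sin(theta_j)   # sin(Theta_j); its negation gives the third constant
            self.v = v_jm1             # v_{j-1}
            self.bg_delay = 0.0        # delay element holding z [b_{j-1}(n) g_{j-1}(n)]
            self.ft_delay = 0.0        # delay element holding z [f_j t_j - sin(Theta_j) * B]

        def step(self, fg_jm1, bg_jm1, ft_j, bt_j, f_jm1, b_jm1):
            """Map the four inputs of (5) (plus f_{j-1}, b_{j-1}) to the four outputs."""
            A = self.bg_delay + ft_j          # shared sum  z[b_{j-1} g_{j-1}] + f_j t_j
            B = fg_jm1 + bt_j                 # shared sum  f_{j-1} g_{j-1} + b_j t_j
            sA, sB = self.s * A, self.s * B

            fg_j   = fg_jm1 + sA                      # f_j(n) g_j(n)
            bg_j   = self.bg_delay + sB               # b_j(n) g_j(n)
            ft_jm1 = self.ft_delay + self.v * f_jm1   # f_{j-1}(n) t_{j-1}(n)
            bt_jm1 = bt_j - sA + self.v * b_jm1       # b_{j-1}(n) t_{j-1}(n)

            # update the two delay elements for the next time instant
            self.bg_delay = bg_jm1
            self.ft_delay = ft_j - sB
            return fg_j, bg_j, ft_jm1, bt_jm1

Cascading $M$ such sections and applying the boundary conditions (8b) then provides the products $f_j(n)t_j(n)$ and $b_j(n)g_j(n)$ that are needed in the gradient formula (3).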

4 Simplified training algorithm for extra reduced size LLMLP

One way of reducing the LLMLP structure is to restrict each neuron to have only one "output" lattice–ladder filter while connecting layers through conventional synaptic coefficients. It yields the extra reduced size lattice–ladder multilayer perceptron (XLLMLP) structure, whose pictorial representation is given in Figure 2 and whose definition follows.

Figure 2: A layer of the XLLMLP. Input signals $s^{l-1}_x(n)$ from the previous layer are processed according to (9), first by a group of lattice–ladder filters LLF$_x$ (see Figure 1 for a more detailed treatment), then by static synapses $w^l_{x,x}$, and finally by neuron activation functions $\Phi^l(\cdot)$, forming the current layer output signals $s^l_x(n)$. The emphasized intermediate signals are the LLF output signals $\tilde{s}^l_x(n)$ and the signals before the activation functions $\hat{s}^l_x(n)$. For clarity, the sizes of the main matrices are shown in brackets at the bottom of the figure: $\{\Theta^l, V^l\}_{[N_{l-1}, M_l]}$, $W^l_{[N_{l-1}, N_l]}$, $\Phi^l(\cdot)_{[N_l]}$.

Definition 1 ([11]) An XLLMLP of size $L$ layers, $\{N_0, N_1, \dots, N_L\}$ nodes and filter orders $\{M_1, M_2, \dots, M_L\}$ is defined by

$$
s^l_h(n) = \Phi^l \Biggl( \underbrace{\sum_{i=1}^{N_{l-1}} w^l_{ih} \underbrace{\sum_{j=0}^{M_l} v^l_{ij}\, b^l_{ij}(n)}_{=\,\tilde{s}^l_i(n)}}_{=\,\hat{s}^l_h(n)} \Biggr), \quad h = 1, \dots, N_l,\; l = 1, \dots, L, \qquad (9a)
$$

when the local flow of information in the lattice part of the filters is defined by

$$
\begin{bmatrix} f^l_{i,j-1}(n) \\ b^l_{ij}(n) \end{bmatrix} =
\begin{bmatrix} \cos\Theta^l_{ij} & -\sin\Theta^l_{ij} \\ \sin\Theta^l_{ij} & \cos\Theta^l_{ij} \end{bmatrix}
\begin{bmatrix} f^l_{ij}(n) \\ z\, b^l_{i,j-1}(n) \end{bmatrix}, \quad j = 1, 2, \dots, M_l, \qquad (9b)
$$

with initial and boundary conditions

$$
b^l_{i0}(n) = f^l_{i0}(n); \quad f^l_{i,M_l}(n) = s^{l-1}_i(n). \qquad (9c)
$$


Here $s^l_h(n)$ is the output signal of the $h$th "output" neuron; $\tilde{s}^l_i(n)$ is the output signal of the filter that is connected to the $i$th "input" neuron; $w^l_{ih}$ represents the single (static) weight connecting two neurons in a layer; $\Theta^l_{ij}$ and $v^l_{ij}$ are the weights of the filter's lattice and ladder parts respectively; $b^l_{ij}(n)$ and $f^l_{ij}(n)$ are the backward and forward prediction errors of the filter; $i$, $j$, $h$ and $l$ index inputs, filter coefficients, outputs and layers respectively.
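For concreteness, the following NumPy sketch evaluates one XLLMLP layer for a single time instant, following (9a)–(9c). It is only an illustration: the argument layout, the shared filter order within the layer and the tanh activation stand in for choices the text leaves open.

    import numpy as np

    def xllmlp_layer_forward(s_prev, theta, v, w, b_state, phi=np.tanh):
        """One time step of a single XLLMLP layer, cf. definition (9).

        s_prev  -- previous-layer outputs, shape (N_in,)
        theta   -- lattice coefficients Theta^l_ij, shape (N_in, M)
        v       -- ladder coefficients v^l_ij, shape (N_in, M + 1)
        w       -- static synaptic weights w^l_ih, shape (N_in, N_out)
        b_state -- delayed backward signals of every LLF, shape (N_in, M); updated in place
        Returns (s_out, s_tilde): layer outputs s^l_h(n) and LLF outputs for each input i.
        """
        N_in, M = theta.shape
        s_tilde = np.zeros(N_in)

        for i in range(N_in):                      # one "output" LLF per input neuron
            f = np.zeros(M + 1)
            b = np.zeros(M + 1)
            f[M] = s_prev[i]                       # filter input, boundary condition (9c)
            for j in range(M, 0, -1):              # lattice recursion (9b)
                c, s = np.cos(theta[i, j - 1]), np.sin(theta[i, j - 1])
                zb = b_state[i, j - 1]             # delayed backward signal
                f[j - 1] = c * f[j] - s * zb
                b[j] = s * f[j] + c * zb
            b[0] = f[0]                            # boundary condition (9c)
            b_state[i, :] = b[:M]                  # memory for the next time step
            s_tilde[i] = float(np.dot(v[i], b))    # ladder sum in (9a)

        s_hat = w.T @ s_tilde                      # static synapses give the pre-activations
        return phi(s_hat), s_tilde                 # activation completes (9a)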

Let us consider the calculation of sensitivity functions using the backpropagation algorithm for the neurons of the $l$th hidden layer of the XLLMLP. It could be shown [11] that the sensitivity functions for the XLLMLP can be expressed by

$$
\nabla w^l_{ih}(n) = \tilde{s}^l_i(n)\, \delta^l_h(n); \qquad (10a)
$$
$$
\nabla v^l_{ij}(n) = b^l_{ij}(n) \sum_{h=1}^{N_l} \delta^l_h(n); \qquad (10b)
$$
$$
\nabla \Theta^l_{ij}(n) = \sum_{r=0}^{M_l} v^l_{ir}(n)\, \frac{\partial}{\partial \Theta^l_{ij}} b^l_{ir}(n) \sum_{h=1}^{N_l} w^l_{ih}\, \delta^l_h(n). \qquad (10c)
$$

Note that these expressions are similar to the standard LLF gradient expressions, except that here we use additional indexes to state the LLF position in the whole XLLMLP architecture (to be precise, we show the sensitivity functions for the $j$th coefficients of the LLF connecting the $i$th input neuron with the $h$th output neuron in the $l$th layer of the LLMLP) and replace the output error term $e(n)$ by the generalized local instantaneous error $\delta^l_h(n) = \partial E(n)/\partial \hat{s}^l_h(n)$, which can be explicitly expressed by

$$
\delta^l_h(n) =
\begin{cases}
-e^L_h(n)\, \Phi'^{\,L}\{\hat{s}^L_h(n)\}, & l = L, \\[1mm]
\Phi'^{\,l}\{\hat{s}^l_h(n)\} \displaystyle\sum_{j=0}^{M_{l+1}} v^{l+1}_{hj}\, \gamma^{l+1}_{hj}(n) \sum_{p=1}^{N_{l+1}} w^{l+1}_{hp}\, \delta^{l+1}_p(n), & l \neq L,
\end{cases} \qquad (10d)
$$

with $\gamma^{l+1}_{hj}(n)$ and its companion $\varphi^{l+1}_{hj}(n)$ given by

$$
\begin{bmatrix} \varphi^{l+1}_{h,j-1}(n) \\ \gamma^{l+1}_{hj}(n) \end{bmatrix} =
\begin{bmatrix} \cos\Theta^{l+1}_{hj} & -\sin\Theta^{l+1}_{hj} \\ \sin\Theta^{l+1}_{hj} & \cos\Theta^{l+1}_{hj} \end{bmatrix}
\begin{bmatrix} \varphi^{l+1}_{hj}(n) \\ z\, \gamma^{l+1}_{h,j-1}(n) \end{bmatrix}. \qquad (10e)
$$

Aiming to present the simplified training of the XLLMLP, we replace (10c) with

$$
\nabla \Theta^l_{ij}(n) = \underbrace{\frac{f^l_{ij}(n)\, t^l_{ij}(n) - b^l_{ij}(n)\, g^l_{ij}(n)}{\cos\Theta^l_{ij}}}_{=\, \widehat{\nabla}\Theta^l_{ij}(n)} \sum_{h=1}^{N_l} w^l_{ih}\, \delta^l_h(n), \qquad (11)
$$


where the order-recursive computation is done in the manner of (8) by

$$
\begin{bmatrix} f^l_{ij}(n) g^l_{ij}(n) \\ b^l_{ij}(n) g^l_{ij}(n) \\ f^l_{i,j-1}(n) t^l_{i,j-1}(n) \\ b^l_{i,j-1}(n) t^l_{i,j-1}(n) \end{bmatrix} =
\begin{bmatrix}
1 & z\sin\Theta^l_{ij} & \sin\Theta^l_{ij} & 0 \\
\sin\Theta^l_{ij} & z & 0 & \sin\Theta^l_{ij} \\
-z\sin\Theta^l_{ij} & 0 & z & -z\sin\Theta^l_{ij} \\
0 & -z\sin\Theta^l_{ij} & -\sin\Theta^l_{ij} & 1
\end{bmatrix}
\begin{bmatrix} f^l_{i,j-1}(n) g^l_{i,j-1}(n) \\ b^l_{i,j-1}(n) g^l_{i,j-1}(n) \\ f^l_{ij}(n) t^l_{ij}(n) \\ b^l_{ij}(n) t^l_{ij}(n) \end{bmatrix} +
v^l_{i,j-1}
\begin{bmatrix} 0 \\ 0 \\ f^l_{i,j-1}(n) \\ b^l_{i,j-1}(n) \end{bmatrix}, \qquad (12a)
$$

with boundary conditions

$$
f^l_{i0}(n) g^l_{i0}(n) = f^l_{i0}(n) t^l_{i0}(n); \quad b^l_{i0}(n) g^l_{i0}(n) = b^l_{i0}(n) t^l_{i0}(n); \qquad (12b)
$$
$$
f^l_{iM_l}(n) t^l_{iM_l}(n) = v^l_{iM_l} f^l_{iM_l}(n); \quad b^l_{iM_l}(n) t^l_{iM_l}(n) = v^l_{iM_l} b^l_{iM_l}(n). \qquad (12c)
$$

The full XLLMLP training algorithm is presented at the end of the article. Here we only summarize its main steps:

1. XLLMLP recall. The given input pattern $s^0_i(n)$ is presented to the input layer (see line 1 of the algorithm). Afterwards it is processed through the XLLMLP in a layer-by-layer fashion (lines 2–14): first by the corresponding lattice–ladder filters (lines 3–9), then by the static synapses (lines 10–13). In that way the XLLMLP output pattern $s^L_h(n)$ is obtained.

2. Node errors. Using the results of the previous calculations, the errors of the output layer neurons are determined (line 15). Then, again in a layer-by-layer fashion, however this time in the backward direction, these errors are processed through the complete XLLMLP (lines 16–23). In that way the errors $\delta^l_h(n)$ of the hidden layer neurons are calculated.

3. Gradient terms. The simplified calculation of the gradient terms for the lattice weight updates is done as in (12). Let us here be more specific. Working in a layer-by-layer fashion and taking the filters one by one (lines 24–38; note, however, that the processing order here is insignificant), the algorithm proceeds as follows: at line 25 the initial conditions are assigned as in (12c); in lines 26–29 the two lower equations from (12a) are evaluated for all sections of the filter; at line 30 the left boundary conditions are fulfilled as in (12b); for all sections of the filter (lines 31–37), lines 32–35 evaluate the remaining upper equations from (12a) and line 36 also evaluates $\widehat{\nabla}\Theta^l_{ij}(n)$ from (11).

4. Weight updates. The dynamic synapses (LLFs) are updated at lines 41 and 42, realizing expressions (10b) and (11), while the static ones are updated at line 44, realizing (10a) and taking into account their previous values, the training parameter $\mu$, the node errors and the gradient terms.

5. Stability test. The new parameter values of the lattice filters are checked against the stability requirement (see lines 46–48); if some of them do not satisfy it, the old parameter values are substituted at line 47. (A minimal sketch of the weight update and stability test steps for one layer is given after this list.)
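Assuming that the per-sample quantities of steps 1–3 are already available, the sketch below carries out steps 4 and 5 for a single layer with NumPy. The array shapes and names mirror the symbols of the text but are otherwise illustrative choices.

    import numpy as np

    def update_layer_weights(theta, v, w, b, s_tilde, grad_theta_hat, delta, mu):
        """Weight updates (10a), (10b), (11) and the stability test for one layer.

        theta, v       -- lattice and ladder coefficients, shapes (N_in, M), (N_in, M + 1)
        w              -- static weights w^l_ih, shape (N_in, N_out)
        b              -- backward signals b^l_ij(n), shape (N_in, M + 1)
        s_tilde        -- LLF output signals, shape (N_in,)
        grad_theta_hat -- gradient terms from (11), shape (N_in, M)
        delta          -- local errors delta^l_h(n), shape (N_out,)
        mu             -- training step size
        Returns the updated (theta, v, w).
        """
        sum_delta = delta.sum()                    # sum over h of delta^l_h(n)
        w_delta = w @ delta                        # sum over h of w^l_ih delta^l_h(n), per input i

        v_new = v + mu * b * sum_delta                               # ladder update, cf. (10b)
        theta_new = theta + mu * grad_theta_hat * w_delta[:, None]   # lattice update, cf. (11)
        w_new = w + mu * np.outer(s_tilde, delta)                    # static update, cf. (10a)

        # stability test (algorithm lines 46-48): keep each rotation angle in (-pi/2, pi/2),
        # otherwise fall back to its previous value
        unstable = np.abs(theta_new) > np.pi / 2
        theta_new[unstable] = theta[unstable]
        return theta_new, v_new, w_new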


5 Conclusions

In this paper we dealt with the problem of the computational efficiency of lattice–ladder multilayer perceptron training algorithms that are based on the computation of gradients, for example, backpropagation, conjugate gradients or Levenberg-Marquardt. We explored the calculations that are most computationally demanding and specific to the lattice-ladder filter: the computation of gradients for the lattice (rotation) parameters.

The way that is optimal in the sense of the minimal number of constants, addition and delay operations involved in the aforementioned computations for a single lattice-ladder filter, and also in the regularity of the regressor structure, was found recently by the author. Based on it, a quick training algorithm for the extra reduced size lattice–ladder multilayer perceptron was derived here.

Not surprisingly, the presented algorithm requires approximately $M$ times fewer computations (where $M$ is the order of the filter), while it follows the exact gradient path, because the derivations do not involve any approximations, when the coefficients of the filters are assumed to be stationary. More importantly, the implementation of a single section of the regressor lattice incorporated in the algorithm requires only 2 delay elements, 3 constants and 8 addition operations.

An experimental study of the proposed algorithm was not considered, because the computations of the gradients are exact and any possible comparison would inherently be biased by the chosen way of implementation.

Acknowledgements

Research was supported by The Royal Swedish Academy of Sciences and The Swedish Institute New Visby project Ref. No. 2473/2002(381/T81).

References

[1] A. D. Back and A. C. Tsoi. FIR and IIR synapses, a new neural network architecture for time series modeling. Neural Computation, 3:375–385, 1991.

[2] A. D. Back and A. C. Tsoi. An adaptive lattice architecture for dynamic multilayer perceptrons. Neural Computation, 4:922–931, 1992.

[3] A. D. Back and A. C. Tsoi. A simplified gradient algorithm for IIR synapse multilayer perceptrons. Neural Computation, 5:456–462, 1993.

[4] A. D. Back and A. C. Tsoi. A cascade neural network model with nonlinear poles and zeros. In Proceedings of the 1996 International Conference on Neural Information Processing, volume 1, pages 486–491. IEEE Press, 1996.

[5] S. Haykin. Adaptive Filter Theory. Prentice-Hall International, Inc., 3rd edition, 1996.

[6] S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice-Hall, Upper Saddle River, NJ, 2nd edition, 1999.

[7] A. Juditsky, H. Hjalmarsson, A. Benveniste, B. Delyon, L. Ljung, J. Sjöberg, and Q. Zhang. Nonlinear black-box modeling in system identification: Mathematical foundations. Automatica, 31:1725–1750, 1995.

[8] S. Lawrence, A. D. Back, A. C. Tsoi, and C. L. Giles. The Gamma MLP - using multiple temporal resolutions for improved classification. In Neural Networks for Signal Processing 7, pages 256–265. IEEE Press, 1997.

[9] D. Navakauskas. A reduced size lattice-ladder neural network. In Signal Processing Society Workshop on Neural Networks for Signal Processing, pages 313–322, Cambridge, England, 1998. IEEE Press.

[10] D. Navakauskas. Artificial Neural Network for the Restoration of Noise Distorted Songs Audio Records. Doctoral dissertation, Vilnius Gediminas Technical University, No. 434, September 1999.

[11] D. Navakauskas. Reducing implementation input of lattice-ladder multilayer perceptrons. In Proceedings of the 15th European Conference on Circuit Theory and Design, volume 3, pages 297–300, Espoo, Finland, 2001. Helsinki University of Technology.

[12] D. Navakauskas. Speeding up the training of lattice-ladder multilayer perceptrons. Technical Report LiTH-ISY-R-2417, Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden, 2002.

[13] L. S. H. Ngia and J. Sjöberg. Efficient training of neural nets for nonlinear adaptive filtering using a recursive Levenberg-Marquardt algorithm. IEEE Transactions on Signal Processing, 48(7):1915–1927, 2000.

[14] P. A. Regalia. Adaptive IIR Filtering in Signal Processing and Control. Marcel Dekker, Inc., 1995.

[15] J. A. Rodriguez-Fonollosa and E. Masgrau. Simplified gradient calculation in adaptive IIR lattice filters. IEEE Transactions on Signal Processing, 39(7):1702–1705, 1991.

[16] J. Sjöberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H. Hjalmarsson, and A. Juditsky. Nonlinear black-box modeling in system identification: A unified overview. Automatica, 31:1691–1724, 1995.

[17] A. C. Tsoi and A. D. Back. Discrete time recurrent neural network architectures: A unifying review. Neurocomputing, 15:183–223, 1997.

[18] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang. Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(3):328–339, 1989.

[19] E. A. Wan. Temporal backpropagation for FIR neural networks. In Proceedings of the International Joint Conference on Neural Networks, pages 575–580. IEEE Press, 1990.

[20] E. A. Wan. Finite Impulse Response Neural Networks with Applications in Time Series Prediction. Doctoral dissertation, Department of Electrical Engineering, Stanford University, USA, November 1993.


XLLMLP Training Algorithm

Available at time $n$:
New input pattern: $s^0_i(n)$, $i = 1, 2, \dots, N_0$;
Desired output pattern: $d^L_h(n)$, $h = 1, 2, \dots, N_L$;
Static coefficients: $w^l_{ih}(n)$, $h = 1, 2, \dots, N_l$;
Ladder coefficients: $v^l_{ij}(n)$, $j = 0, 1, \dots, M_l$;
Lattice coefficients: $\Theta^l_{ij}(n)$, $j = 1, 2, \dots, M_l$;
Backward & forward signals: $b^l_{ij}(n)$, $f^l_{ij}(n)$, $j = 0, 1, \dots, M_l$;
Gradients w.r.t. lattice weights: $\nabla\Theta^l_{ij}(n-1)$, $j = 1, 2, \dots, M_l$;
Regressor lattice signals: $bg^l_{ij}(n-1)$, $ft^l_{ij}(n-1)$, $fg^l_{ij}$, $bt^l_{ij}$, $j = 0, \dots, M_l$;
Temporal variable: $temp$;
Backward & forward gradients: $\varphi^l_{ij}(n-1)$, $\gamma^l_{ij}(n-1)$, $j = 0, 1, \dots, M_l$.

1. XLLMLP Recall.
1  Let $f^1_{i,M_1}(n) = s^0_i(n)$.
2  for $l = 1, 2, \dots, L$ do
3    for $i = 1, 2, \dots, N_{l-1}$ do
4      for $j = M_l, \dots, 2, 1$ do
5        $\begin{bmatrix} f^l_{i,j-1}(n) \\ b^l_{ij}(n) \end{bmatrix} = \begin{bmatrix} \cos\Theta^l_{ij}(n) & -\sin\Theta^l_{ij}(n) \\ \sin\Theta^l_{ij}(n) & \cos\Theta^l_{ij}(n) \end{bmatrix} \begin{bmatrix} f^l_{ij}(n) \\ b^l_{i,j-1}(n-1) \end{bmatrix}$.
6      end for $j$
7      $b^l_{i0}(n) = f^l_{i0}(n)$;
8      $\tilde{s}^l_i(n) = \sum_{j=0}^{M_l} v^l_{ij}(n)\, b^l_{ij}(n)$.
9    end for $i$
10   for $h = 1, 2, \dots, N_l$ do
11     $\hat{s}^l_h(n) = \sum_{i=1}^{N_{l-1}} w^l_{ih}\, \tilde{s}^l_i(n)$;
12     $s^l_h(n) = \Phi^l\{\hat{s}^l_h(n)\}$.
13   end for $h$
14 end for $l$

2. Node Errors.
15 $e^L_h(n) = d^L_h(n) - s^L_h(n)$, $h = 1, 2, \dots, N_L$.
16 for $l = L-1, L-2, \dots, 1$ and $h = 1, 2, \dots, N_l$ do
17   Let $\gamma^l_{h,M_l}(n) = 0$.
18   for $j = M_l, \dots, 2, 1$ do
19     $\begin{bmatrix} \varphi^l_{h,j-1}(n) \\ \gamma^l_{hj}(n) \end{bmatrix} = \begin{bmatrix} \cos\Theta^l_{hj}(n) & -\sin\Theta^l_{hj}(n) \\ \sin\Theta^l_{hj}(n) & \cos\Theta^l_{hj}(n) \end{bmatrix} \begin{bmatrix} \varphi^l_{hj}(n) \\ \gamma^l_{h,j-1}(n-1) \end{bmatrix}$.
20   end for $j$
21   $\varphi^l_{h0}(n) = \gamma^l_{h0}(n)$.
22   $\delta^l_h(n) = \begin{cases} -e^L_h(n)\, \Phi'^{\,L}\{\hat{s}^L_h(n)\}, & l = L, \\ \Phi'^{\,l}\{\hat{s}^l_h(n)\} \sum_{j=0}^{M_{l+1}} v^{l+1}_{hj}\, \gamma^{l+1}_{hj}(n) \sum_{p=1}^{N_{l+1}} w^{l+1}_{hp}\, \delta^{l+1}_p(n), & l \neq L. \end{cases}$
23 end for $h$, $l$

3. Gradient Terms.
24 for $l = 1, 2, \dots, L$ and $i = 1, 2, \dots, N_{l-1}$ do
25   Let $bt^l_{i,M_l} = v^l_{i,M_l}(n)\, b^l_{i,M_l}(n)$, $ft^l_{i,M_l} = s^{l-1}_i(n)\, v^l_{i,M_l}(n)$.
26   for $j = M_l, \dots, 2, 1$ do
27     $bt^l_{i,j-1} = bt^l_{ij} - \bigl[ ft^l_{ij}(n) + bg^l_{i,j-1}(n-1) \bigr] \sin\Theta^l_{ij}(n) + v^l_{i,j-1}(n)\, b^l_{i,j-1}(n)$;
28     $ft^l_{i,j-1}(n) = ft^l_{ij}(n-1) + v^l_{i,j-1}(n)\, f^l_{i,j-1}(n)$;
29   end for $j$
30   Let $temp = bt^l_{i0}$, $fg^l_{i0} = ft^l_{i0}(n-1)$.
31   for $j = 1, 2, \dots, M_l$ do
32     $ft^l_{i,j-1}(n) = ft^l_{ij}(n-1) - \bigl[ fg^l_{i,j-1} + bt^l_{ij} \bigr] \sin\Theta^l_{ij}(n)$;
33     $fg^l_{ij} = fg^l_{i,j-1} + \bigl[ bg^l_{i,j-1}(n-1) + ft^l_{ij}(n) \bigr] \sin\Theta^l_{ij}(n)$;
34     $bg^l_{i,j-1}(n) = temp$;
35     $temp = bg^l_{i,j-1}(n-1) + \bigl[ fg^l_{i,j-1} + bt^l_{ij} \bigr] \sin\Theta^l_{ij}(n)$;
36     $\widehat{\nabla}\Theta^l_{ij}(n) = \bigl[ ft^l_{ij}(n) - temp \bigr] / \cos\Theta^l_{ij}(n)$.
37   end for $j$
38 end for $i$, $l$

4. Weight Updates.
39 for $l = 1, 2, \dots, L$ and $i = 1, 2, \dots, N_{l-1}$ do
40   for $j = 1, 2, \dots, M_l$ do
41     $v^l_{ij}(n+1) = v^l_{ij}(n) + \mu\, b^l_{ij}(n) \sum_{h=1}^{N_l} \delta^l_h(n)$;
42     $\Theta^l_{ij}(n+1) = \Theta^l_{ij}(n) + \mu\, \widehat{\nabla}\Theta^l_{ij}(n) \sum_{h=1}^{N_l} w^l_{ih}\, \delta^l_h(n)$.
43   end for $j$
44   for $h = 1, 2, \dots, N_l$ do $w^l_{ih}(n+1) = w^l_{ih}(n) + \mu\, \delta^l_h(n)\, \tilde{s}^l_i(n)$.
45 end for $i$, $l$

5. Stability Test.
46 for $l = 1, 2, \dots, L$ and $i = 1, 2, \dots, N_{l-1}$ and $j = 1, 2, \dots, M_l$ do
47   if $|\Theta^l_{ij}(n+1)| > \pi/2$ then $\Theta^l_{ij}(n+1) = \Theta^l_{ij}(n)$.
48 end for $j$, $i$, $l$
