
Speech Signal Extraction: A Multichannel Approach


Nedelko Grbić

Ronneby, November 1999

Department of Telecommunications and Signal Processing

University of Karlskrona/Ronneby

S-372 25 Ronneby, Sweden

ISBN: 91-630-8841-X

Published 1999

Printed by Lund University, Reprocentralen, Lund 1999

Preface

This Licentiate thesis summarizes my work in the field of speech signal extraction. The work is mainly aimed at speech enhancement in communication systems such as conference telephony and handsfree mobile telephony. The work has been carried out at the Department of Telecommunications and Signal Processing at the University of Karlskrona/Ronneby in collaboration with Ericsson Mobile Communications. The thesis consists of three parts:

N. Grbić, J. Nordberg, S. Nordholm, “Subband Acoustic Echo Cancelling using LMS and RLS,” Research Report 1999:5, ISSN: 1103-1581, University of Karlskrona/Ronneby.

N. Grbić, M. Dahl, I. Claesson, “Acoustic Echo Cancelling and Noise Suppression with Microphone Arrays,” Research Report 1999:4, ISSN: 1103-1581, University of Karlskrona/Ronneby.

N. Grbić, X. J. Tao, S. Nordholm, I. Claesson, “Blind Speech Signal Separation using Overcomplete Subband Representation,” submitted to IEEE Transactions on Speech and Audio Processing, Nov. 1999.

Parts of the papers have been presented as:

M. Dahl, I. Claesson, S. Nordholm, N. Grbić, “Adaptive Microphone Array System for Speech Enhancement,” in Proc. COST 254 Second Workshop, Toulouse, France, July 1997.

N. Grbić, M. Dahl, I. Claesson, “Neural Network Based Adaptive Microphone Array System for Speech Enhancement,” 1998 IEEE World Congress on Computational Intelligence, Anchorage, Alaska, USA, May 1998.

Acknowledgments

I would like to thank all colleagues at the Department of Telecommunications and Signal Processing for the nice atmosphere they have created for my research studies. I would especially like to thank my supervisors and dear friends, Prof. Sven Nordholm and Prof. Ingvar Claesson, for valuable mentorship and guidance in my research as well as in my studies. I wish to thank my dear mentor, closest colleague and dear friend Dr. Xiao-Jiao Tao for intense collaboration in my research studies.

My thanks also go to M.Sc. Timothy Samuels and Dr. Abbas Mohammed for their help and careful proofreading of the manuscript.

I am indebted to my family and all my friends for their support during my studies.

Finally, I express my gratitude to my beloved fiancée Marina for her understanding and comfort during my studies.

Nedelko Grbić

Ronneby, January 1999

Contents

Introduction

Part

A. Subband Acoustic Echo Cancelling using LMS and RLS

B. Acoustic Echo Cancelling and Noise Suppression with Microphone Arrays

C. Blind Speech Signal Separation using Overcomplete Subband Representation

Introduction

In speech signal extraction the aim is to extract human speech in a physical environment by using microphones. In any real-world environment there are many disturbance sources that cause unwanted sound, which in turn may degrade the comprehension of the wanted speech at the microphones. These disturbances vary with the environment. They may, for instance, consist of indoor ventilators, computer fans and other disturbing noise sources. One way to reduce the environmental noise is to passively cover each source with sound-absorbing material. There are, however, several drawbacks with this approach. First, there are often several minor sources that individually cause small speech degradations but, when added together, can cause severe degradation. Secondly, not all sources of disturbance originate from physical devices; humans speaking in the background can cause disturbing noise. Obviously, it can be inconvenient to cover all disturbing sources.

There are other approaches that can be used for extracting the desired speech. The microphone signals are converted from analog to digital form and passed through digital filters. By using digital signal processing techniques, one can extract the speech signal from the disturbing environment through appropriate design of these filters. The fundamental principle that makes this extraction possible is that the physical properties and the location of the speech source differ from those of most noise sources. This implies that both spatial and temporal information may be used in the extraction process. Depending on the nature of the surrounding noise environment and the interfering sources, different approaches can extract speech successfully. In this thesis, the formulation of the problem is divided into three different perspectives into which many real-world applications can be classified.

• One major disturbing and known source is to be cancelled from the desired speech.

• Several disturbing and unknown sources are present but information about the spatial location of the desired speech is known.

• Several disturbing and unknown sources are present and no information about the spatial location of the desired speech is available.

The thesis consists of three parts, each of which deals with the problem from one of these perspectives.

Part A

The first case appears in a low-noise handsfree situation. The user of a handsfree set leads a conversation with a person at the far end of a communication link. The user hears the other person from a loudspeaker, and a microphone is used to gather the user’s speech. Since both the loudspeaker and the microphone are often located close to the user, the microphone will also sense the speech originating from the loudspeaker. This causes the communication system to send some of the far-end speech back to the far-end user, where it appears as an echo. Speech signal extraction in this context is often referred to as “acoustic echo cancelling”. The disturbing signal is known but has passed through an unknown channel. In order to subtract the disturbance from the microphone signal, the echo canceller should perform an accurate and adaptive channel estimation. The difficulty of the estimation process depends highly on the room characteristics. The size and shape of the room and the material of the walls are such factors, while rearranged furniture and people moving in the room change these characteristics and in turn call for a re-estimation.

The first part of this thesis deals with this estimation problem. Two different situations are evaluated: a car cabin and a conference room. The focus is on evaluating a delayless scheme in which the room estimation is performed in frequency subbands and the echo cancelling is performed in the time domain. It is shown that for a large room, such as the conference room, the resulting echo cancellation is improved compared with conventional time-domain techniques, in terms of both the amount of cancellation and the speed of channel estimation. It is important for the echo canceller to estimate the channel accurately in a short time, both to adapt initially and to track variations in room conditions.

Part B

The second case also takes place in the handsfree situation, but when the surrounding noise is of a complex structure and each noise source cannot be discerned individually. The handsfree situation in an automobile is such an example. The noise is made up of several fundamentally different sources, for example wind and tire friction, and fan and engine noise travelling over mechanical structures. One way to perform speech signal extraction in this situation is to allow for spatial selectivity, i.e. directional hearing. This can be accomplished by using several microphones separated in space. The principle follows from the fact that signals from different locations will impinge on the microphones at different time instants. Since digital filters can delay signals arbitrarily, one can design filters in such a way that only signals from the desired direction pass the system. By steering the microphone array towards the person speaking, one can perform speech signal extraction in situations where several noise sources are present. Temporal information can simultaneously be taken into consideration, thereby allowing for discrimination of noise sources from the same direction as the speech source, provided they have a different spectral content.

The second part of this thesis deals with the problem of extracting a single speaker in an automobile in the application of handsfree communication. This is done by placing several microphones along a line in front of the driver. Since the driver keeps the same position, or only alters his position slightly, one may use this information to steer the hearing towards the correct position. Once the direction is known, there are many ways to find optimal filters, but they require the microphones to be positioned with high accuracy or carefully calibrated. This leads to high equipment cost, and it also causes the filters to rely solely on spatial diversity. An alternative approach is evaluated in this thesis. The core of the operation is that a signal with human speech properties, and additional known disturbances, are emitted from their respective positions and recorded. This recorded information is then used in an adaptive manner to extract the speech from the physical environment and simultaneously suppress all the unwanted noise sources. Recording signals in the real environment captures information about microphone placement and channel propagation properties and provides more flexibility when installing a system.

Part C

The third case is the most general one, encountered in a situation where many sources are present and some (or all) of the sources are to be extracted. The positions of the sources are considered unknown, and the principle assumes no access to the properties of disturbing or desired sources. The extraction relies on information-theoretic measures of the microphone signals. The problem is undetermined in the sense that exact restoration of the original sources cannot be accomplished. This follows from the fact that both the sources and the channels they have passed through are unknown. This, in turn, leads to indeterminacy of scale and permutation: if the speaker lowers his voice while the channel provides correspondingly less attenuation, exactly the same signal appears at the microphone, and if two sources trade places while the microphones also trade places, the same output signals appear. Nevertheless, the sources can still be extracted from the environment, but without restoration of the exact scale and permutation. The problem can be viewed as an inverse multichannel estimation process and it is often referred to as “blind signal separation”.

The third part of this thesis deals with the problem of two persons speaking simultaneously in a conference room, where two microphones receive the speech signals. The extraction is performed in the time domain with linear filters, whereas the inverse channel estimation is performed in the frequency domain. Both artificial and real-world scenarios are evaluated and compared. The extension of the principle to more sources and microphones is straightforward.

Subband Acoustic Echo Cancelling using LMS and RLS

Nedelko Grbić, Jörgen Nordberg, Sven Nordholm

University of Karlskrona/Ronneby, Department of Signal Processing, Sweden

Abstract

The increasing use of modern hands free communication systems such as video conferencing, computer communications, and vehicle mounted cellular telephones brings the demand for high-quality acoustic echo cancellation into focus. In these applications the echo path which has to be identified typically has a long time duration, on the order of 100 ms. For this identification the filter will be long.

This report evaluates the Normalized Least Mean Square (NLMS) and the Weighted Recursive Least Square (WRLS) algorithms for acoustic echo cancelling using a delayless subband scheme. Subband signal processing has been shown to be efficient both when it comes to convergence rate and level of echo suppression.

The evaluation is performed for real speech signals sampled from a conversation using a hands free set mounted in an automobile, and a conversation using conference telephony equipment in a conference room. A comparison of subband and fullband algorithms is made both with respect to the computational cost and level of echo suppression.

Results show that when the impulse response is very long, i.e. in such environments as conference rooms, the subband approach is beneficial. In a car environment the size of enclosure and damping means that the response is quite short and a conventional echo canceller could perform as well as a subband echo canceller. In the study, finite word length effects have not been considered.

The LMS algorithm can perform as well as the RLS algorithm when implemented in the subband scheme and using an energy detector. The computational cost is reduced substantially for the RLS algorithm when implemented in subbands, while keeping most of its performance.


1 Introduction

In modern hands free communication systems such as hands free car phones, loudspeaker phones and video conferencing systems, it is necessary to perform an acoustic echo cancellation of the far-end speaker [2, 3, 4]. The echo cancellation system is made adaptive in order to track variations in the acoustic channel. The filter length of the acoustic canceller is typically 500-1500 FIR taps for normal sampling frequencies. Long filters imply a large computational burden and slow convergence rate. The slow convergence rate is especially obvious in signals with a large spectral dynamic range such as speech signals. A subband echo canceller [5, 6] gives several advantages when compared to a fullband echo canceller, such as:

1. The computational burden is essentially reduced by the number of subbands due to decimation.

2. A faster convergence since the spectral dynamic range in each subband will be less.

3. A signal controlled adaptation can be performed in each subband individually, hence enhanced performance.

4. A well separated structure for parallel implementation.

This paper evaluates a version of a delayless subband adaptive filter presented by Morgan and Thi [6]. The evaluation is performed for speech signals where the suppression and the convergence are compared for the Normalized Least Mean Square (NLMS) algorithm and the Weighted Recursive Least Square (WRLS) algorithm. The evaluation also includes the use of a simple energy detector in the subbands.

2 System Overview

An acoustic echo canceller, see Figure 1, identifies the channel between the loudspeaker and the hands free microphone. This identified impulse response is then employed to achieve a suppression of the echo. One of the fundamental characteristics of this channel is the bulk delay. A typical distance between loudspeaker and microphone is 1 m. This separation corresponds to a 3 ms delay, and with an 8-12 kHz sample frequency this corresponds to about 20-30 samples. However, an FIR filter with 50 taps will only characterize the direct wave and give a suppression of about 5-10 dB. In order to achieve the suppression goal, which is 30-40 dB, filter lengths of 500-2000 FIR taps become necessary. The filter should also be able to track variations in the acoustic environment. An appealing approach is to use a multirate technique, since such a technique reduces the computational burden and also gives a faster convergence rate. The latter is due to the reduction of spectral dynamic range in each subband. Since the identification of the acoustic path must be done on the basis of speech signals, the spectral range plays an important role in the final performance. A major drawback is the delay which is introduced by the filter bank. This delay can, however, be circumvented by using a modified structure for the subband adaptive filter [6]. In conventional subband structures the delay introduced by the filter bank acts on the signals as well as on the adaptation; in this modified structure only the adaptation is affected by this delay.
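The delay and filter-length figures quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the speed of sound (343 m/s) is an assumed value not given in the report, and the function names are hypothetical.

```python
# Assumed speed of sound in air at room temperature (not stated in the report).
SPEED_OF_SOUND_M_S = 343.0


def bulk_delay_samples(distance_m, sample_rate_hz, c=SPEED_OF_SOUND_M_S):
    """Samples needed for the direct sound to travel distance_m."""
    return distance_m / c * sample_rate_hz


def taps_for_duration(duration_s, sample_rate_hz):
    """FIR taps needed to span an impulse response lasting duration_s."""
    return int(round(duration_s * sample_rate_hz))


if __name__ == "__main__":
    for fs in (8000, 12000):
        delay = bulk_delay_samples(1.0, fs)   # 1 m loudspeaker-microphone spacing
        taps = taps_for_duration(0.1, fs)     # echo path on the order of 100 ms
        print(f"fs={fs} Hz: bulk delay about {delay:.0f} samples, {taps} taps")
```

With these assumptions a 1 m spacing gives a delay just under 3 ms, which is of the same order as the 20-30 samples quoted, and a 100 ms echo path needs roughly a thousand taps, in line with the filter lengths mentioned in the text.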

2.1 The Delayless Subband Adaptive Filter

The delayless attribute of this technique comes from the fact that the new adaptive weights are computed in subbands and then transformed to an equivalent fullband filter by means of an inverse FFT, see Figure 1. The filter works in real time on the loudspeaker signal. The coefficients are calculated separately in each band. They can be calculated either by employing the error signal e(k) (closed loop case) or the microphone input signal d(k) (open loop case). When the signal d(k) is used, a local error signal in each band is created. In this case the calculations do not need to be performed in real time. This approach will, however, give less final suppression since the algorithm is working blindly with respect to the real error signal. The fullband signal is divided into several subband signals by using a polyphase FFT technique [7].

2.2 Polyphase FFT Filter Bank

A set of M filters is said to be a uniform DFT filter bank if they are related as

$$H_l(z) = H_0(zW^l) = \sum_{n=-\infty}^{\infty} h_0(n)\,(zW^l)^{-n}, \tag{1}$$

where $W = e^{-j2\pi/M}$ and $l \in [0, M-1]$. The polyphase decomposition can be used to implement such a filter bank in an efficient manner [7]. The number of filters in the filter bank is M, thus the passband frequency of the prototype filter should be set to $\frac{1}{2M}$. Since only fullband filters with real coefficients are considered, it is enough to calculate $\frac{M}{2}+1$ complex subband signals. In order to reduce aliasing, the signals in the filter bank are decimated by a factor of only $\frac{M}{2}$. The polyphase decomposition of the DFT filter bank is performed accordingly. The resulting filters after decimation will have passbands centered at dc for even subbands, while passbands for odd subbands will be centered at $\frac{1}{2}$, see Figure 2.
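As a small numerical illustration of Eq. (1), the sketch below builds a prototype lowpass filter and modulates it to obtain the filter of band l. The windowed-sinc prototype, the filter length and the function names are assumptions made for this example only; the report does not state how its prototype filter was designed.

```python
import numpy as np


def prototype_lowpass(num_taps, M):
    """Windowed-sinc lowpass with cutoff 1/(2M) cycles/sample (illustrative design)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = np.sinc(n / M) / M            # ideal lowpass, cutoff 1/(2M)
    return h * np.hamming(num_taps)


def modulated_filter(h0, l, M):
    """Impulse response of H_l(z) = H_0(z W^l) with W = exp(-j 2 pi / M)."""
    n = np.arange(len(h0))
    # (z W^l)^(-n) = W^(-l n) z^(-n), so h_l(n) = h_0(n) * exp(+j 2 pi l n / M)
    return h0 * np.exp(2j * np.pi * l * n / M)


def is_shifted_copy(h0, hl, l, M, nfft=1024):
    """Check that the response of band l is the prototype response moved to l/M."""
    H0 = np.fft.fft(h0, nfft)
    Hl = np.fft.fft(hl, nfft)
    return np.allclose(Hl, np.roll(H0, l * nfft // M))


if __name__ == "__main__":
    M = 16
    h0 = prototype_lowpass(4 * M, M)
    print(is_shifted_copy(h0, modulated_filter(h0, 3, M), 3, M))
```

Each band filter is therefore the same lowpass shape moved to the center frequency l/M, which is what makes a single prototype filter sufficient for the whole bank.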

The prototype filter $H_0(z)$ is polyphase decomposed as

$$H_0(z) = \sum_{n=-\infty}^{\infty} h_0(n)\,z^{-n} = \sum_{m=0}^{M/2-1} z^{-m} \sum_{n=-\infty}^{\infty} h_0\!\left(n\tfrac{M}{2}+m\right) z^{-nM/2}. \tag{2}$$

Figure 1: Delayless subband acoustic echo canceller; position A open loop configuration and position B closed loop configuration.

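Combining Eq. (1) and Eq. (2), an arbitrary filter in the filter bank becomes

$$H_l(z) = \sum_{n=-\infty}^{\infty} h_0(n)\,(W^l z)^{-n} = \sum_{m=0}^{M/2-1} (W^l z)^{-m} \sum_{n=-\infty}^{\infty} h_0\!\left(n\tfrac{M}{2}+m\right) (W^l z)^{-nM/2}, \tag{3}$$

where

$$W^{-lnM/2} = \left(e^{j\pi l}\right)^{n} = \begin{cases} (-1)^n, & l \text{ odd} \\ 1, & l \text{ even.} \end{cases} \tag{4}$$

Eq. (4) indicates that odd and even subbands are treated slightly differently.

For odd l, Eq. (3) yields

$$H_l(z) = \sum_{m=0}^{M/2-1} (W^l z)^{-m} \sum_{n=-\infty}^{\infty} h_0\!\left(n\tfrac{M}{2}+m\right) (-1)^n z^{-nM/2}. \tag{5}$$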

Figure 2: Filter bank response for even and odd subbands after decimation.

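Defining $E'_m(z)$ as

$$E'_m(z) = \sum_{n=-\infty}^{\infty} h_0\!\left(n\tfrac{M}{2}+m\right) (-1)^n z^{-n}, \tag{6}$$

Eq. (5) can be rewritten as

$$H_l(z) = \sum_{m=0}^{M/2-1} (W^l z)^{-m} E'_m\!\left(z^{M/2}\right). \tag{7}$$

For even l, Eq. (3) yields

$$H_l(z) = \sum_{m=0}^{M/2-1} (W^l z)^{-m} \sum_{n=-\infty}^{\infty} h_0\!\left(n\tfrac{M}{2}+m\right) z^{-nM/2}. \tag{8}$$

Defining $E_m(z)$ as

$$E_m(z) = \sum_{n=-\infty}^{\infty} h_0\!\left(n\tfrac{M}{2}+m\right) z^{-n}, \tag{9}$$

Eq. (8) can be rewritten as

$$H_l(z) = \sum_{m=0}^{M/2-1} (W^l z)^{-m} E_m\!\left(z^{M/2}\right). \tag{10}$$

This means that the polyphase filter bank can be divided into two filter structures: one for even subbands and one for odd subbands, see Figure 3.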
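The practical value of the polyphase form is that filtering and decimation can be done with D = M/2 short filters running at the low rate. The following sketch checks this numerically for the prototype decomposition of Eq. (2) only (band l = 0, real signals); it does not implement the modulated even/odd structures of Figure 3, and the function names are hypothetical.

```python
import numpy as np


def polyphase_components(h0, D):
    """Return the D polyphase components e_m(n) = h0(n*D + m)."""
    pad = (-len(h0)) % D
    h = np.concatenate([h0, np.zeros(pad)])
    return [h[m::D] for m in range(D)]


def decimate_direct(x, h0, D):
    """Filter at the full rate, then keep every D-th output sample."""
    return np.convolve(x, h0)[::D]


def decimate_polyphase(x, h0, D):
    """Same output, computed with D short filters at the decimated rate."""
    n_out = -(-(len(x) + len(h0) - 1) // D)          # ceiling division
    y = np.zeros(n_out)
    for m, e_m in enumerate(polyphase_components(h0, D)):
        x_m = np.concatenate([np.zeros(m), x])[::D]  # x(n*D - m): delay by m, decimate
        y_m = np.convolve(x_m, e_m)[:n_out]
        y[:len(y_m)] += y_m
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(50)
    h0 = rng.standard_normal(23)
    print(np.allclose(decimate_direct(x, h0, 4), decimate_polyphase(x, h0, 4)))
```

Both routes give the same decimated signal, but the polyphase version never computes the output samples that the decimator would throw away.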

Figure 3: A filter bank design with polyphase FFT where even and odd subbands are calculated separately.

2.3 Transformation of subband filter coefficients to fullband filter coefficients

If the fullband filter has L taps, the filter length in each subband will be $L/D$, where $D = M/2$. An $L/D$-point FFT is calculated from the adaptive weights in each subband. These subband weights are subsequently stacked to form an $L/2$-element array, $[1, 2, \ldots, L/2]$. The array is then completed by setting the element indexed $L/2 + 1$ to zero and using the complex conjugate of elements $[2, 3, \ldots, L/2]$ in reverse order. Finally, the new L-element array is transformed by an L-point inverse FFT to obtain the fullband filter weights.

The rule for this transformation in the FFT domain can be described as follows. Denote the fullband filter FFT bins as $H_p(k)$ and the i:th subband filter FFT bins as $H_{s_i}(n)$, where $i = \{0, 1, 2, \ldots, M/2\}$, $n = \{1, 2, \ldots, L/D\}$ and $k = \{1, 2, \ldots, L\}$. By observing Figure 2, the relation between the fullband and the subband frequency mapping can be determined. Since the FFT is used, the transformation rule becomes a stacking procedure according to the following:

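$$
H_p(k) =
\begin{cases}
H_{s_0}\!\left(k \bmod \tfrac{L}{2M}\right), & 1 \le k \le \tfrac{L}{2M} \\[4pt]
H_{s_i}\!\left(k \bmod \tfrac{L}{2M}\right), & i \text{ odd},\ (2i-1)\tfrac{L}{2M}+1 \le k \le (2i+1)\tfrac{L}{2M} \\[4pt]
H_{s_i}\!\left(k \bmod \tfrac{L}{2M} + 3\tfrac{L}{2M}\right), & i \text{ even},\ (2i-1)\tfrac{L}{2M}+1 \le k \le \tfrac{2iL}{2M} \\[4pt]
H_{s_i}\!\left(k \bmod \tfrac{L}{2M}\right), & i \text{ even},\ \tfrac{2iL}{2M} < k \le (2i+1)\tfrac{L}{2M} \\[4pt]
H_{s_{M/2}}\!\left(k \bmod \tfrac{L}{2M}\right), & (M-1)\tfrac{L}{2M}+1 \le k \le \tfrac{L}{2}
\end{cases}
$$

where i is an index determined by $i = \left\lfloor \frac{kM}{L} + \frac{1}{2} - \frac{1}{2M} \right\rfloor$. Floor means the largest integer not exceeding the argument.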

Now, since the fullband FIR filter is real valued and the FFT operator is defined by discretized frequencies in the range $[0, 2\pi]$, the conjugate is taken in reverse order to determine the mirror part of $H_p(k)$ as

$$H_p(k) = \operatorname{conj}\{H_p(L-k+2)\}, \quad \text{for } \tfrac{L}{2}+2 \le k \le L,$$

and

$$H_p\!\left(\tfrac{L}{2}+1\right) = 0.$$

The fullband time-domain representation is determined by $h_p(n) = \operatorname{IFFT}\{H_p(k)\}$.
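The final step, completing the stacked half spectrum by conjugate symmetry and applying the inverse FFT, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses 0-based NumPy indexing (the text uses 1-based indices), it takes the already stacked L/2 bins as input rather than performing the subband stacking itself, and the function name is hypothetical.

```python
import numpy as np


def fullband_from_half_spectrum(half_bins):
    """Build a real length-L filter from its first L/2 FFT bins (DC first).

    0-based indexing: bin L/2 (element L/2 + 1 in the text) is set to zero and
    bins L/2+1 .. L-1 are the conjugates of bins L/2-1 .. 1.
    """
    half_bins = np.asarray(half_bins, dtype=complex)
    L = 2 * len(half_bins)
    Hp = np.zeros(L, dtype=complex)
    Hp[: L // 2] = half_bins
    Hp[L // 2 + 1:] = np.conj(half_bins[1:][::-1])
    # The imaginary part vanishes (up to rounding) when the DC bin is real.
    return np.fft.ifft(Hp).real


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    half = rng.standard_normal(8) + 1j * rng.standard_normal(8)
    half[0] = half[0].real
    hp = fullband_from_half_spectrum(half)
    print(hp.shape, np.allclose(np.fft.fft(hp)[:8], half))
```

Enforcing the symmetry explicitly guarantees that the filter applied to the loudspeaker signal has real coefficients, whatever the subband algorithms produced.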

3 Structure Evaluation

The delayless subband echo canceller is evaluated by using the Normalized Least Mean Square (NLMS) algorithm and the Weighted Recursive Least Square (WRLS) algorithm in the subbands according to Figure 1. The suppression ratio is evaluated for the acoustic response in a situation using a car-mounted mobile hands free set and for the response in a conference room environment. The performances are compared with the conventional fullband implementation for the same two environments.

3.1 Least Mean Square versus Recursive Least Square

It is well known that the NLMS algorithm has low complexity, but slower convergence and higher excess mean square error when compared to the WRLS algorithm [8]. These characteristics are not always the case when dealing with speech signals as presented in the simulations. The NLMS performance can in some cases be boosted by introducing an energy detector, as described in Section 3.3.

The NLMS algorithm will be referred to as LMS and the WRLS will be referred to as RLS in the following.
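For readers unfamiliar with the NLMS update, a minimal real-valued sketch of system identification is given below. It is the textbook algorithm rather than the report's implementation: the report runs the adaptation on complex subband signals, and the step size `mu`, the regularization `eps` and the function name are assumptions chosen for illustration.

```python
import numpy as np


def nlms_identify(x, d, num_taps, mu=0.5, eps=1e-8):
    """Adapt FIR weights w so that the filtered x tracks d; returns (w, e)."""
    w = np.zeros(num_taps)
    e = np.zeros(len(x))
    buf = np.zeros(num_taps)              # most recent inputs, newest first
    for k in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[k]
        e[k] = d[k] - w @ buf             # a priori error
        w += mu * e[k] * buf / (buf @ buf + eps)   # normalized gradient step
    return w, e


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h_true = rng.standard_normal(8)       # hypothetical echo path
    x = rng.standard_normal(4000)         # white excitation
    d = np.convolve(x, h_true)[: len(x)]
    w, e = nlms_identify(x, d, num_taps=8)
    print(np.max(np.abs(w - h_true)))
```

With white excitation the weights converge quickly; with speech, whose spectrum is far from flat, convergence is much slower, which is the motivation for the subband approach.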


3.2 Fullband versus Subband

The problem at hand is to identify the acoustic channel in the region of frequencies where the input signal has energy. The behavior of the identification in the frequency regions with no excitation can be arbitrary while still yielding high performance in the cancellation. These regions have, of course, some energy due to finite time effects, but the magnitude is small. The ratio between the highest and the lowest regional magnitude gives a measure of the condition of the problem. A lower ratio gives a better-conditioned problem [9].
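The link between spectral dynamic range and conditioning can be illustrated with a small example. The sketch below compares the eigenvalue spread of the autocorrelation matrix of a white signal with that of a strongly colored first-order autoregressive signal; the AR(1) model, its coefficient and the 10·log10 dB convention are assumptions made for this illustration and are not taken from the report.

```python
import numpy as np


def autocorr_matrix(r):
    """Symmetric Toeplitz matrix built from autocorrelation lags r[0..N-1]."""
    r = np.asarray(r, dtype=float)
    lags = np.abs(np.arange(len(r))[:, None] - np.arange(len(r))[None, :])
    return r[lags]


def eigenvalue_spread_db(R):
    """Ratio of the largest to the smallest eigenvalue, in dB (10*log10)."""
    ev = np.linalg.eigvalsh(R)            # ascending order
    return 10.0 * np.log10(ev[-1] / ev[0])


if __name__ == "__main__":
    N = 32
    white = np.zeros(N)
    white[0] = 1.0                        # flat spectrum
    colored = 0.95 ** np.arange(N)        # AR(1) process, lowpass spectrum
    print(eigenvalue_spread_db(autocorr_matrix(white)))
    print(eigenvalue_spread_db(autocorr_matrix(colored)))
```

The white signal gives a spread of 0 dB, while the colored signal gives a much larger spread, mirroring the way the wide spectral dynamic range of speech makes the fullband problem poorly conditioned.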

In the fullband realization of the identification using RLS, the solution is very unstable because the ratio between the maximal and the minimal singular value of the correlation matrix is very high [10]. Figure 4 shows the singular values of the estimated autocorrelation matrix, $R_{xx}$, of the input signal to the echo canceller in the car hands free situation. The solid line shows the maximum accuracy for determining the inverse of this matrix for the method used. The inverse is calculated as a pseudo inverse where the singular values of magnitude below this accuracy are discarded. This value is chosen such that the instability of the inversion will not be increased due to quantization.

Figure 4: Eigenvalue plot of the estimated autocorrelation matrix using the fullband scheme (eigenvalue magnitude in dB versus eigenvalue number in ascending order); the dashed line shows the minimal eigenvalue possible for the pseudo inverse.

This result shows that the fullband identification problem is ill-conditioned. Regarding the problem instead as several subband identification problems will result in several individually well-conditioned problems. Figure 5 shows the singular values of the first 9 subband estimated correlation matrices for the 16-subband implementation. Since the input signal is real valued, there is no additional information in the last 7 subbands. The correlation matrices are estimated for the whole speech sequence for which the evaluation is made. It shows that the ratio between the largest and the smallest singular values has been reduced. It should be noted that the singular values have been estimated over a 4 second sequence. However, at a certain time instant the spectral content in the signal can be such that the correlation matrix estimate is singular due to the weighting, which in turn will lead to an unstable RLS algorithm implementation.

Figure 5: Eigenvalue plots of the estimated autocorrelation matrices using the subband scheme (eigenvalue magnitude in dB versus eigenvalue number, one panel per subband, SB #0 to SB #8); the solid line shows the minimal eigenvalue possible for the pseudo inverse. The number of subbands is 16 and the first 9 are shown.

3.3 The use of energy detectors in the subbands

An energy detector (ED) can be introduced in order to stop updating the algorithm in those subbands where the excitation is poor. In this way the performance in the fullband identification problem is kept high, because the filter weights are left unaltered. Since the condition number in the fullband identification problem depends on the input signal’s spectral content, which is time-varying, the energy detector acts as a limiter of the instantaneous worst-case condition. The use of an energy detector is therefore crucial when the input has the character of speech signals. The use of the ED gives more additional identification accuracy for the LMS than for the RLS. This comes from the fact that the RLS by itself equalizes the spectral ratios, as long as they do not exceed the dynamic range of the processor.

The introduction of energy detectors gives better fullband accuracy and also increases the convergence rate of the total system, as shown in Section 3.4.
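One way to picture the energy detector is as a gate in front of the weight update of each subband. The sketch below is a hypothetical illustration: the report does not specify how its detector measures energy or how the threshold is chosen, so the short-time power estimate, the `threshold` parameter and the function names are all assumptions.

```python
import numpy as np


def energy_gate(x_block, threshold):
    """True when the short-time power of x_block exceeds threshold."""
    x_block = np.asarray(x_block)
    return float(np.mean(np.abs(x_block) ** 2)) > threshold


def gated_nlms_step(w, buf, d_k, threshold, mu=0.5, eps=1e-8):
    """One complex NLMS step that leaves w untouched when excitation is poor."""
    e_k = d_k - np.dot(w, buf)            # subband error for output y = w^T x
    if not energy_gate(buf, threshold):
        return w, e_k                     # too little energy: keep the weights
    norm = np.vdot(buf, buf).real + eps
    return w + mu * e_k * np.conj(buf) / norm, e_k


if __name__ == "__main__":
    w = np.zeros(4, dtype=complex)
    quiet = 1e-4 * np.ones(4, dtype=complex)
    loud = np.ones(4, dtype=complex)
    w_quiet, _ = gated_nlms_step(w, quiet, 2.0, threshold=1e-3)
    w_loud, _ = gated_nlms_step(w, loud, 2.0, threshold=1e-3, mu=1.0)
    print(np.allclose(w_quiet, w), np.dot(w_loud, loud))
```

Freezing the weights in poorly excited subbands avoids letting near-silent input drive the normalized update, which is where the condition of the identification problem is worst.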


3.4 The subband identification problem

Even though the main objective of the whole system is to make an accurate fullband identification of the acoustic channel, the subband identifications are also sensitive to high spectral range. Figure 6 shows the individual subband condition numbers. It can be seen that the condition is poor mainly in the low frequency subbands. For the speech signal evaluated, the excitation is low in this range. Therefore the subbands affected by the energy detector are the ones which have high condition numbers locally. Thus, by adapting the algorithms only in those subbands where excitation exists, the fullband identification as well as the subband identification becomes more stable and in most cases more accurate.

Figure 6: Magnitude of the condition numbers (in dB, Euclidean norm) of the subband autocorrelation matrices versus subband number, shown for 16, 32, 64 and 128 subbands.

4 Performance Evaluation

4.1 Evaluation preliminaries

All results are based on a four second sequence of real speech sampled in a real environment. Two environments are evaluated here: a hands free mobile telephone set mounted inside a car cabin, and a conference telephony set placed in a typical conference room. The algorithms are compared for the fullband and the subband structures, both by computational cost, in number of flops per sample of input data, and by suppression ratio. The suppression ratios presented in the following sections are calculated as (using the notation of Figure 1)

$$\text{Average Suppression [dB]} = \frac{10}{N} \sum_{i=1}^{N} \log_{10} \frac{|d(i)|^2}{|e(i)|^2},$$

where N denotes the number of samples in the sequence over the period of time where speech exists. The suppression figures presented in the appendix show the suppression over time as short-time (80 ms) power estimates.
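The average suppression measure is straightforward to compute from the microphone signal d and the error signal e. The sketch below assumes real-valued arrays of equal length with no zero samples (a zero in either signal would make the logarithm undefined); the function name is hypothetical.

```python
import numpy as np


def average_suppression_db(d, e):
    """Mean over samples of 10*log10(|d(i)|^2 / |e(i)|^2)."""
    d = np.asarray(d, dtype=float)
    e = np.asarray(e, dtype=float)
    return float(np.mean(10.0 * np.log10(np.abs(d) ** 2 / np.abs(e) ** 2)))


if __name__ == "__main__":
    d = np.array([1.0, -2.0, 3.0])
    e = d / 10.0                          # residual ten times smaller in amplitude
    print(average_suppression_db(d, e))   # a factor 10 in amplitude is 20 dB
```

Note that this is an average of per-sample dB values rather than the ratio of the total powers, so it weights quiet and loud passages equally.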

The evaluation is performed for two structures: 32 and 64 subbands for the car hands free evaluation, and 256 and 512 subbands for the conference room evaluation. The fullband solutions to the same problems are calculated for comparison. The number of FIR filter parameters for the acoustic identification is 512 for the car hands free situation and 2048 for the conference telephony situation. Since the condition of the fullband identification problem is poor, as described in Section 3, the ordinary RLS does not converge. For comparison, the fullband least square solution is calculated. This calculation is done by omitting the singular values below a certain threshold for the pseudo inverse, as shown by Figure 4. The Wiener-Hopf equation is then solved off-line by using this pseudo inverse of the correlation matrix for the input sequence.
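The off-line least square solution with a truncated pseudo inverse can be sketched as follows. This is an illustration under assumptions: it is written for a real-valued correlation matrix R and cross-correlation vector p, the relative threshold `rel_threshold` is a hypothetical parameter (the report reads its threshold from Figure 4), and the function name is not from the report.

```python
import numpy as np


def truncated_wiener_solution(R, p, rel_threshold=1e-10):
    """Solve R w = p, discarding singular values below rel_threshold * largest."""
    U, s, Vt = np.linalg.svd(R)
    keep = s > rel_threshold * s[0]
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]           # invert only the retained singular values
    return Vt.T @ (s_inv * (U.T @ p))


if __name__ == "__main__":
    R = np.array([[2.0, 1.0], [1.0, 2.0]])
    p = np.array([1.0, 0.0])
    print(truncated_wiener_solution(R, p), np.linalg.solve(R, p))

    R_singular = np.array([[1.0, 1.0], [1.0, 1.0]])
    print(truncated_wiener_solution(R_singular, np.array([2.0, 2.0])))
```

For a well-conditioned matrix the result coincides with the ordinary solution, while for a singular or nearly singular matrix the discarded directions are simply left at zero instead of amplifying numerical noise.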

4.2 The hands free automobile environment

Average suppressions for the subband implementations are shown in Tables 1 and 2 for 32 and 64 subbands, respectively. These results show that significant improvement can be achieved when introducing the energy detector for the LMS algorithm, but it still does not give as good performance as the fullband solution. The computational cost per input sample is shown in Tables 3 and 4. The gain in computational cost when using the subband approach is significant for the RLS algorithm. For the LMS algorithm the break-even point comes at 64 subbands when compared to the fullband computational cost. The use of the energy detector also gives substantial savings of computations for the RLS, while the suppression level is almost the same.

| Average suppression (dB), #SB = 32 | LMS | RLS | Improvement RLS | Figure |
|---|---|---|---|---|
| Fullband | 16.3 | 17.1 *) | 0.8 | 11 |
| Open Loop | 8.6 | 15.2 | 4.6 | 12 |
| Closed Loop | 3.8 | 10.3 | 6.5 | 13 |
| Open Loop with ED | 13.9 | 14.7 | 0.8 | 14 |
| Closed Loop with ED | 14.1 | 14.4 | 0.3 | 15 |
| Average Improvement ED (off/on) | 7.8 | 1.8 | - | - |

Table 1. Average suppression in decibels using a 32 subband implementation.

*) The fullband RLS is calculated off-line by solving the Wiener-Hopf equations.

| Average suppression (dB), #SB = 64 | LMS | RLS | Improvement RLS | Figure |
|---|---|---|---|---|
| Fullband | 16.3 | 17.1 *) | 0.8 | 11 |
| Open Loop | 7.8 | 12.3 | 4.5 | 16 |
| Closed Loop | -6.6 | 11.5 | 18.1 | 17 |
| Open Loop with ED | 13.2 | 11.9 | -1.3 | 18 |
| Closed Loop with ED | 12.9 | 11.2 | -1.7 | 19 |
| Average Improvement ED (off/on) | 12.5 | -0.35 | - | - |

Table 2. Average suppression in decibels using a 64 subband implementation.

*) The fullband RLS is calculated off-line by solving the Wiener-Hopf equations.

Computational cost (flops/sample x 1000), #SB=32     LMS     RLS           Difference
Fullband                                             2.55    1039.90 *)    1659.75
Open Loop                                            4.46    29.86         40.64
Closed Loop                                          4.19    29.61         40.67
Open Loop with ED                                    4.19    11.40         11.53
Closed Loop with ED                                  3.93    11.32         11.83
Average Improvement ED (off/on)                      0.42    29.40         28.98

Table 3. Computational cost per input sample for a 32 subband implementation.
*) The fullband RLS is calculated off-line by solving the Wiener-Hopf equations.

Computational cost (flops/sample x 1000), #SB=64     LMS     RLS           Difference
Fullband                                             2.55    1039.90 *)    1659.75
Open Loop                                            2.73    9.22          10.38
Closed Loop                                          2.60    9.09          10.39
Open Loop with ED                                    2.60    4.48          3.01
Closed Loop with ED                                  2.49    4.44          3.15
Average Improvement ED (off/on)                      0.21    7.52          7.31

Table 4. Computational cost per input sample for a 64 subband implementation.
*) The fullband RLS is calculated off-line by solving the Wiener-Hopf equations.

Figures 7 and 8 show the relative average suppression, calculated as

\[
\text{Average Suppression}\,(x\,\%)\ [\text{dB}] = \frac{10}{N\,(1 - x/100)} \sum_{i = Nx/100 + 1}^{N} \log \frac{|d(i)|^2}{|e(i)|^2}
\]

where N is the number of samples in the sequence. This is the average taken after discarding the first x percent of the sequence, normalized by the suppression achieved by the fullband LMS. The average shows the final values of the suppression after initial convergence. It can be seen that, for the 32 subband implementation, all subband implementations reach almost the same suppression at the end of the sequence. For the 64 subband implementation the closed loop LMS system has still not converged. It should be noted that the accuracy of the suppression levels decreases as x increases, since the average is taken over a shorter sequence.
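As a minimal sketch of this measure, assume that d and e are equal-length arrays holding the desired signal d(i) and the error signal e(i), and that the logarithm is taken to base 10, as the decibel scale implies. The function names and the truncation of Nx/100 to an integer index are illustrative choices. The normalization is interpreted here as the ratio of two decibel averages, which is an assumption that matches the axis ranges of the relative suppression plots.

```python
import numpy as np

def average_suppression_db(d, e, x_percent):
    """Average suppression in dB after discarding the first x percent of the sequence."""
    d = np.asarray(d, dtype=float)
    e = np.asarray(e, dtype=float)
    n = len(d)
    start = int(n * x_percent / 100)          # first retained sample (0-based index)
    ratio = np.abs(d[start:]) ** 2 / np.abs(e[start:]) ** 2
    # Mean of 10*log10(|d|^2 / |e|^2) over the remaining N(1 - x/100) samples
    return 10.0 * np.mean(np.log10(ratio))

def relative_suppression(d, e, d_ref, e_ref, x_percent):
    """Suppression of one structure relative to a reference (e.g. the fullband LMS).

    Assumption: 'normalized' means the ratio of the two dB averages.
    """
    return (average_suppression_db(d, e, x_percent)
            / average_suppression_db(d_ref, e_ref, x_percent))
```

Samples where e(i) or d(i) is exactly zero would need separate handling, for instance by adding a small constant to both powers; that choice is left open here.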

Figure 7: Average suppression after some initial time percentage of sequence. Number of subbands is 32. Car hands free evaluation. (Plot "Suppression after x percent of sequence, #SB = 32": suppression relative to the fullband LMS versus x %. Curves: Fullband LMS, Fullband RLS*), OL.LMS.ed, OL.RLS.ed, CL.LMS.ed, CL.RLS.ed, OL.LMS, OL.RLS, CL.LMS, CL.RLS.)
*) The fullband RLS is calculated off-line by solving the Wiener-Hopf equations.

4.3 The conference telephony environment

Average suppressions for the subband implementations are shown in Tables 5 and 6 for 256 and 512 subbands, respectively. The computational cost per input sample is shown in Tables 7 and 8. In this environment the open loop subband implementation is more efficient than the fullband solution in terms of both suppression ratio and computational cost. There is a trade-off between cost and performance in the choice of the number of subbands.

In the 256 subband realization the computational cost is cut to one half while the suppression level is improved as compared with the fullband LMS realization.

Increasing the number of subbands reduces the performance and gives only slightly less computational cost. The choice of the number of subbands is crucial for the total echo cancelling performance.

The energy detector has a great impact on the suppression performance when the LMS algorithm is used in the subbands. With the energy detector, the suppression level is substantially improved for the LMS algorithm, while it remains the same for the RLS. In this environment the gain in computational cost from using the energy detector is small for both the LMS and the RLS realizations, since the amount of computation needed for the adaptation is small due to the high number of subbands.

Figure 8: Average suppression after some initial time percentage of sequence. Number of subbands is 64. Car hands free evaluation. (Plot "Suppression after x percent of sequence, #SB = 64": suppression relative to the fullband LMS versus x %. Curves: Fullband LMS, Fullband RLS*), OL.LMS.ed, OL.RLS.ed, CL.LMS.ed, CL.RLS.ed, OL.LMS, OL.RLS, CL.LMS, CL.RLS.)
*) The fullband RLS is calculated off-line by solving the Wiener-Hopf equations.

It is interesting to note that, in the closed loop case, the RLS performs poorly in this environment. This deficiency can be explained by the delay that the filter bank introduces on the error signal. The more subbands are used, the more delay is introduced. Since the error signal is delayed relative to the input signal, the direct least squares solution will be misled by the most recent samples of input data. Here, the weighting plays an important role. On the one hand, the weighting should be set so that the channel tracking requirements are met; on the other hand, the introduced delay of the error signal degrades performance when recent information is weighted more heavily. This trade-off depends strongly on the acoustic situation and is difficult to resolve in practice.
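To make the role of the weighting concrete, assume that it takes the form of an exponential forgetting factor λ with 0 < λ < 1, and that the filter bank delays the error signal by D samples. Both symbols are introduced here for illustration only and may differ from the notation used for the weighted RLS earlier in the report. Under these assumptions the share of the total weight that falls on the D most recent samples can be worked out as follows.

```latex
% Exponentially weighted least squares cost at time n (assumed form)
J(n) = \sum_{i=1}^{n} \lambda^{\,n-i}\, |e(i)|^2 , \qquad 0 < \lambda < 1

% Effective memory of the weighting: sum of all weights as n grows
\sum_{k=0}^{\infty} \lambda^{k} = \frac{1}{1-\lambda}

% Share of the total weight carried by the D most recent samples
\frac{\sum_{k=0}^{D-1} \lambda^{k}}{\sum_{k=0}^{\infty} \lambda^{k}} = 1 - \lambda^{D}
```

With the hypothetical values λ = 0.99 and D = 64, the share is 1 − 0.99^64 ≈ 0.47, so nearly half of the weight lies on the samples affected by the delay. Moving λ closer to one reduces this share but lengthens the memory 1/(1 − λ), which slows the tracking of channel changes.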

Figures 9 and 10 show the average suppression relative to the fullband LMS implementation after initial convergence time, for the structures evaluated.

Average Suppression (dB), #SB=256    LMS      RLS        Improvement RLS    Figure
Fullband                             13.2     15.6 *)    2.4                20
Open Loop                            -2.0     16.9       18.9               21
Closed Loop                          -20.9    -9.2       11.7               22
Open Loop with ED                    13.3     16.3       3.0                23
Closed Loop with ED                  9.8      -9.2       -19                24
Average Improvement ED (off/on)      23.0     -0.3       -                  -

Table 5. Average suppression in decibels by using a 256 subband implementation.
*) The fullband RLS is calculated off-line by solving the Wiener-Hopf equations.

Average Suppression (dB), #SB=512    LMS      RLS        Improvement RLS    Figure
Fullband                             13.2     15.6 *)    2.4                20
Open Loop                            -0.6     11.2       11.8               25
Closed Loop                          -29.3    1.6        30.9               26
Open Loop with ED                    9.9      9.8        -0.1               27
Closed Loop with ED                  6.1      1.6        -4.5               28
Average Improvement ED (off/on)      23.0     -0.7       -                  -

Table 6. Average suppression in decibels by using a 512 subband implementation.
*) The fullband RLS is calculated off-line by solving the Wiener-Hopf equations.

Computational cost (flops/sample x 1000), #SB=256    LMS     RLS            Difference
Fullband                                             9.82    16128.02 *)    16118.2
Open Loop                                            5.69    11.88          6.19
Closed Loop                                          5.56    11.75          6.19
Open Loop with ED                                    5.58    7.83           2.25
Closed Loop with ED                                  5.46    7.79           2.33
Average Improvement ED (off/on)                      0.11    4.00           3.89

Table 7. Computational cost per input sample for a 256 subband implementation.
*) The fullband RLS is calculated off-line by solving the Wiener-Hopf equations.

Computational cost (flops/sample x 1000), #SB=512    LMS     RLS            Difference
Fullband                                             9.82    16128.02 *)    16118.2
Open Loop                                            4.81    6.43           1.62
Closed Loop                                          4.74    6.36           1.62
Open Loop with ED                                    4.75    5.24           0.49
Closed Loop with ED                                  4.69    5.22           0.53
Average Improvement ED (off/on)                      0.06    1.17           1.11

Table 8. Computational cost per input sample for a 512 subband implementation.
*) The fullband RLS is calculated off-line by solving the Wiener-Hopf equations.

5 Summary and Conclusions

A comparison between a fullband and a delayless subband adaptive acoustic echo canceller has been carried out. The acoustic echo cancelling problem can be viewed as an identification problem where the identification is made of the acoustic path.

The comparison measures have been the suppression level and the computational cost for the NLMS and the WRLS.

Figure 9: Average suppression after some initial time percentage of sequence. Number of subbands is 256. Conference telephony evaluation. (Plot "Suppression after x percent of sequence, #SB = 256": suppression relative to the fullband LMS versus x %. Curves: Fullband LMS, Fullband RLS*), OL.LMS.ed, OL.RLS.ed, CL.LMS.ed, CL.RLS.ed, OL.LMS, OL.RLS, CL.LMS, CL.RLS.)
*) The fullband RLS is calculated off-line by solving the Wiener-Hopf equations.

The spread of eigenvalues in the correlation matrix is a measure of how well the problem is conditioned. For the fullband identification problem there is a high spread in eigenvalues, and it is therefore an ill-conditioned problem. When the problem is transformed into several subband identification problems, the conditioning improves considerably. The computational savings for the WRLS are high in the subband approach. For the NLMS algorithm the savings in the number of computations are moderate.
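The eigenvalue spread referred to above can be estimated directly from a signal. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the correlation matrix is built from biased autocorrelation estimates, and a one-pole coloured noise stands in for a strongly correlated input such as speech. It does not reproduce the signals or filter banks used in the evaluation.

```python
import numpy as np

def correlation_matrix(x, order):
    """Estimate the order-by-order Toeplitz correlation matrix of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Biased autocorrelation estimates r(0), ..., r(order-1)
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order)])
    # Entry (i, j) of the symmetric Toeplitz matrix is r(|i - j|)
    lags = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    return r[lags]

def eigenvalue_spread(x, order):
    """Ratio of the largest to the smallest eigenvalue of the correlation matrix."""
    eig = np.linalg.eigvalsh(correlation_matrix(x, order))  # ascending order
    return eig[-1] / eig[0]

def ar1_noise(n, pole, rng):
    """White Gaussian noise coloured by a one-pole filter (illustrative input only)."""
    w = rng.standard_normal(n)
    y = np.zeros(n)
    y[0] = w[0]
    for k in range(1, n):
        y[k] = pole * y[k - 1] + w[k]
    return y
```

Calling eigenvalue_spread on white noise gives a value close to one, whereas the coloured input from ar1_noise with a pole near the unit circle gives a spread that is larger by orders of magnitude, which is the kind of ill-conditioning described for the fullband problem.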

Introducing an energy detector brings several benefits. The convergence rate of the NLMS algorithm is improved substantially, and the computational cost of the WRLS is reduced further.

For the open loop implementation, i.e. when the subband algorithms work on local error signals, the convergence rate is in general higher than for the closed loop case.

The fullband NLMS solution is still to be preferred for echo cancelling in an automobile. For echo cancelling problems such as conference telephony, where the echo path is much longer in duration and therefore demands a longer impulse response in the echo canceller, the subband implementation is shown to give better results in both suppression performance and computational load.

Overall, the difference in performance between the NLMS and the WRLS algorithms is small when they are implemented in subbands. This result favors the NLMS algorithm because of its lower complexity. The best structure has proven to be the open loop structure incorporating a simple energy detector.

The advantage of utilizing a subband approach is reinforced when the acoustic path increases in length and complexity.

Figure 10: Average suppression after some initial time percentage of sequence. Number of subbands is 512. Conference telephony evaluation. (Plot "Suppression after x percent of sequence, #SB = 512": suppression relative to the fullband LMS versus x %. Curves: Fullband LMS, Fullband RLS*), OL.LMS.ed, OL.RLS.ed, CL.LMS.ed, CL.RLS.ed, OL.LMS, OL.RLS, CL.LMS, CL.RLS.)
*) The fullband RLS is calculated off-line by solving the Wiener-Hopf equations.


A Figures-Evaluation

Figure 11: Suppression of Speech signal using Fullband Scheme. Car hands free evaluation. (Plot of magnitude response [dB] versus samples; curves: LMS, RLS*).)
*) The fullband RLS is calculated off-line by solving the Wiener-Hopf equations.

Figure 12: Suppression of Speech signal with no Energy detector, Open loop, #SB=32. Car hands free evaluation. (Plot of magnitude response [dB] versus samples; curves: LMS, RLS.)

Figure 13: Suppression of Speech signal with no Energy detector, Closed loop, #SB=32. Car hands free evaluation. (Plot of magnitude response [dB] versus samples; curves: LMS, RLS.)

Figure 14: Suppression of Speech signal with Energy detector, Open loop, #SB=32. Car hands free evaluation. (Plot of magnitude response [dB] versus samples; curves: LMS, RLS.)

Figure 15: Suppression of Speech signal with Energy detector, Closed loop, #SB=32. Car hands free evaluation. (Plot of magnitude response [dB] versus samples; curves: LMS, RLS.)

Figure 16: Suppression of Speech signal with no Energy detector, Open loop, #SB=64. Car hands free evaluation. (Plot of magnitude response [dB] versus samples; curves: LMS, RLS.)

Figure 17: Suppression of Speech signal with no Energy detector, Closed loop, #SB=64. Car hands free evaluation. (Plot of magnitude response [dB] versus samples; curves: LMS, RLS.)

Figure 18: Suppression of Speech signal with Energy detector, Open loop, #SB=64. Car hands free evaluation. (Plot of magnitude response [dB] versus samples; curves: LMS, RLS.)

Figure 19: Suppression of Speech signal with Energy detector, Closed loop, #SB=64. Car hands free evaluation. (Plot of magnitude response [dB] versus samples; curves: LMS, RLS.)

Figure 20: Suppression of Speech signal using Fullband Scheme. Conference telephony evaluation. (Plot of suppression ratio magnitude [dB] versus samples; curves: LMS, RLS*).)
*) The fullband RLS is calculated off-line by solving the Wiener-Hopf equations.

Figure 21: Suppression of Speech signal with no Energy detector, Open loop, #SB=256. Conference telephony evaluation. (Plot of suppression ratio magnitude [dB] versus samples; curves: LMS, RLS.)

Figure 22: Suppression of Speech signal with no Energy detector, Closed loop, #SB=256. Conference telephony evaluation. (Plot of suppression ratio magnitude [dB] versus samples; curves: LMS, RLS.)

Figure 23: Suppression of Speech signal with Energy detector, Open loop, #SB=256. Conference telephony evaluation. (Plot of suppression ratio magnitude [dB] versus samples; curves: LMS, RLS.)

Figure 24: Suppression of Speech signal with Energy detector, Closed loop, #SB=256. Conference telephony evaluation. (Plot of suppression ratio magnitude [dB] versus samples; curves: LMS, RLS.)

Figure 25: Suppression of Speech signal with no Energy detector, Open loop, #SB=512. Conference telephony evaluation. (Plot of suppression ratio magnitude [dB] versus samples; curves: LMS, RLS.)

Figure 26: Suppression of Speech signal with no Energy detector, Closed loop, #SB=512. Conference telephony evaluation. (Plot of suppression ratio magnitude [dB] versus samples; curves: LMS, RLS.)

Figure 27: Suppression of Speech signal with Energy detector, Open loop, #SB=512. Conference telephony evaluation. (Plot of suppression ratio magnitude [dB] versus samples; curves: LMS, RLS.)

Figure 28: Suppression of Speech signal with Energy detector, Closed loop, #SB=512. Conference telephony evaluation. (Plot of suppression ratio magnitude [dB] versus samples; curves: LMS, RLS.)

Acoustic Echo Cancelling and Noise Suppression with Microphone Arrays

ISSN: 1103-1581, University of Karlskrona/Ronneby.
