Robust Echo-Cancellation for Simple VoIP-Applications in Embedded Systems


Department of Electrical Engineering (Institutionen för systemteknik)

Master's thesis

Robust Echo-Cancellation for Simple

VoIP-Applications in Embedded Systems

Master's thesis in Communication Systems carried out at Linköping Institute of Technology

by

Anton Eriksson

LiTH-ISY-EX--15/4886--SE

Linköping 2015

Department of Electrical Engineering, Linköpings universitet

Linköping Institute of Technology, Linköpings universitet


Robust Echo-Cancellation for Simple

VoIP-Applications in Embedded Systems

Master's thesis in Communication Systems

carried out at Linköping Institute of Technology

by

Anton Eriksson

LiTH-ISY-EX--15/4886--SE

Supervisors: Hien Quoc Ngo, ISY, Linköpings universitet

Daniel Nordgren, Syntronic

Examiner: Danyo Danev, ISY, Linköpings universitet


Division, Department

Division of Communication Systems, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden

Date: 2015-10-11

Language: English

Report category: Examensarbete (Master's thesis)

URL for electronic version: http://www.commsys.isy.liu.se, http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-121862

ISBN: –

ISRN: LiTH-ISY-EX--15/4886--SE

Title of series, numbering; ISSN: –

Title: Robust Echo-Cancellation for Simple VoIP-Applications in Embedded Systems (Swedish title: Robust Eko-Cancellering för Simpla VoIP-Applikationer i Inbyggda System)

Author: Anton Eriksson


Abstract

Voice over IP (VoIP) is the group of techniques for delivering voice communications over Internet Protocol (IP) networks. Over the last decades it has mainly served as a possible substitute for the regular Public Switched Telephone Network (PSTN). Recently it has attracted increased interest in areas such as alarm applications and customer service. Acoustic echo is the situation where a distorted version of the sent signal is transmitted back to the sender, due to acoustic feedback between loudspeaker and microphone. Several algorithms already exist to solve this problem. This thesis studies their performance in relation to their computational complexity, in order to indicate which approaches are better suited for implementation in an embedded system, where resources are limited.

During the thesis, a number of algorithms were tested, including variations of the LMS algorithm, other approaches utilizing the correlation between echo and signal, and the RLS algorithm. They were first tested in MATLAB on speech signals recorded at Syntronic and distorted by adding echo. They were then implemented in C and run on speech signals recorded in a simulated VoIP system at Syntronic. The results were evaluated in terms of efficiency and computational complexity.

Sammanfattning (Swedish abstract, in English translation)

Voice over IP (VoIP) is a group of techniques for delivering voice communication over IP networks. Over the last decades the technology has mainly served as a possible substitute for regular PSTN. Recently it has attracted increased interest in areas such as alarm applications and customer service.

Acoustic echo is the situation that arises when a distorted version of the sent signal is sent back to the sender, due to acoustic feedback between loudspeaker and microphone. Several algorithms exist today to solve this problem, and this report presents a study of performance in relation to computational complexity. The aim is to indicate which methods are better suited for implementation in an embedded system, where resources may be limited.

During the thesis work, a number of algorithms were tested, including variations of the LMS algorithm, other methods that exploit the correlation between the signal and the echo, and the RLS algorithm. They were first examined in MATLAB, on speech signals that were recorded at Syntronic and manipulated by adding echo. They were then tested through implementation in C, on speech signals recorded in a simulated VoIP system at Syntronic.


Acknowledgments

I would like to thank the people who have been involved in this thesis work, including examiner Danyo Danev and supervisor Hien Quoc Ngo, and the kind people at Syntronic Software Innovations: Daniel Nordgren and Andreas Forsberg. I would also like to direct special thanks to Professor Mikael Olofsson and Professor Fredrik Gustafsson, who, although not directly involved in the thesis, provided me with helpful guidance. Thanks also go to my opponent Joakim Valberg, who read my report and provided helpful feedback.


Introduction

This document is the report for a Master's thesis in Electrical Engineering. The thesis was carried out on behalf of Syntronic Software Solutions in Linköping and the Division of Communication Systems, Department of Electrical Engineering, at Linköping University. This chapter lays out the background and motivation of the thesis, together with a description of the problem, the approach and the outline of the report.

1.1 Background

Voice over IP (VoIP) is the group of techniques for delivering voice communications over Internet Protocol (IP) networks. VoIP utilizes Audio-over-IP technology, whereby the audio streams are packetized and transmitted over packet-switched networks. This is the main difference from the ordinary Public Switched Telephone Network (PSTN), where the digital information is instead sent over circuit-switched networks in which a continuous link is established when a call is made. VoIP systems employ session control and signalling protocols to control the signalling, set-up and tear-down of calls. One of the most commonly used protocols is the Session Initiation Protocol (SIP).

Over the last decades VoIP has mainly served as a possible substitute for regular PSTN. Recently it has attracted increased interest in areas such as alarm applications and customer service. Incorporating a fully functional VoIP client in an embedded module can provide a cheaper alternative to many of today's short-range communication systems.

1.2 Problem Statement

There are today a number of different applications for IP telephony on the market. Progress in signal processing has been extensive, and echo cancellation is no longer an unsolved problem: many different algorithms can be implemented. Which one to use in a given scenario, however, remains an open question. The problem is to identify the demands and limitations of each application and its hardware. A solution that is optimal on a PC may not be suitable for a processor with a low clock frequency in an embedded system. The goal of the thesis is to provide a study of the trade-off between simplicity and performance of echo-cancellation techniques.

1.3 Scope

This thesis aims to study the performance of a number of echo cancellation algorithms. Since the purpose is to support the choice of an algorithm for VoIP applications in embedded systems, a subset of algorithms was selected. The selection was based on the assumption that algorithms operating in the time domain are simpler and therefore less computationally demanding, so only algorithms of this kind are evaluated.

No actual study of the hardware and the limitations of an embedded system has been made, due to the lack of a suitable testing platform on site. Nor was any implementation done in an existing VoIP application; instead, a simulation environment was established.

1.4 Approach

The approach of the thesis is to implement a number of known algorithms, to investigate the performance of each algorithm by altering different parameters in MATLAB simulations, and finally to test the results in a simulated VoIP environment. The ultimate goal from Syntronic's point of view is a stable and robust echo-cancelling algorithm that is both easy to implement and easy to tune.

1.5 Thesis outline

1. Introduction

2. Theoretical Background

This chapter describes the theory behind the thesis, including a description of the acoustic echo problem and an outline of the possible solutions in terms of available algorithms and approaches.

3. Implementation

This chapter describes the approach of this thesis. It gives a full report of how the work was performed, including a description of the MATLAB work with some code examples.


4. Results

In this chapter, all results from the implementation work are outlined and described. Some plots are given with related explanations.

5. Discussion and Conclusion

This chapter concludes the thesis with a discussion of the results and the conclusions of the report. Some directions for future work are also discussed.


Chapter 2

Theoretical Background

As stated in Chapter 1, many algorithms for cancelling echo in IP telephony have already been implemented and analysed. This chapter presents a number of known methods. Some additional information about the available hardware is also discussed.

2.1 Acoustic Echo Cancellation

Echo in communication is the phenomenon in which a delayed and distorted version of an original sound or other signal is reflected back to the source. The echo effect is desirable in music, but undesirable for speech. The problem mainly arises from the feedback between the loudspeaker and the microphone of a communication unit (acoustic echo). Echo can also be caused by electrical leakage (line echo), but compared with the acoustic feedback this contribution can be considered negligible [1] [2]. The main difficulty when attempting to cancel echo from a signal is the varying characteristics of the acoustic channel and of the signal itself. Objects moving around in the vicinity change the channel between the loudspeaker and the microphone, leading to a time-varying impulse response. Speech itself is also a highly non-stationary process, which makes it difficult to model and therefore difficult to extract from a signal. Acoustic Echo Cancellation will hereafter be abbreviated AEC [2].

2.1.1 Approach

Many authors have proposed adaptive filtering as the solution to non-stationary processes and time-varying channel characteristics. The idea behind echo cancellation is to model the echo of the known far-end signal x(t) and subtract it from the signal d(t) picked up by the microphone. This is done in three steps, using the model in Figure 2.1. First, the impulse response of the Loudspeaker Enclosure Microphone System (LEMS) [3] is continuously estimated. Second, the known far-end signal is filtered with the estimated filter. Third, the result is subtracted from the microphone signal. The approaches differ in the algorithm used to estimate the filter. The solution must be adaptive, i.e., it must be able to respond to changes in the channel and in the signal itself. The performance is always a trade-off between stability and adaptation speed. The computational complexity is of course also an important factor, and it is examined more closely later on.

2.1.2 Model

A conceptual model of the communication system is shown in Figure 2.1.

[Figure 2.1: block diagram with the far-end signal x(t), the LEMS impulse response h, the echo y(t), the near-end signal s(t), the noise n(t) and the microphone signal d(t).]

Figure 2.1. A conceptual model of the communication system.

The input, far-end signal x(t) is sent through the channel of the LEMS, resulting in a modified version y(t) returning to the origin. The relation between x(t) and y(t) is described as

y(t) = (h ∗ x)(t), (2.1)

where ∗ denotes convolution and h is the unknown impulse response of the LEMS. The signal picked up by the microphone now consists of the near-end signal s(t), the echo y(t) and additive noise n(t):

d(t) = s(t) + y(t) + n(t). (2.2)

In order to estimate and digitally compute the impulse response, we must use the discrete-time versions x(n) and y(n) of the signals, as well as a discrete-time impulse response h. The convolution now becomes

$$ y(n) = (h * x)(n) = \sum_{k=0}^{n} h(k)\,x(n-k), \tag{2.3} $$

where n is the discrete time index. If we estimate this relationship with an FIR filter with N coefficients h̃, the output of that filter is

$$ \tilde{y}(n) = \sum_{k=0}^{N-1} \tilde{h}(k)\,x(n-k). \tag{2.4} $$
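The C function below is a minimal sketch of Equation 2.4. It computes the FIR echo estimate ỹ(n) for one time index. The function name fir_output and the convention that samples before x[0] are zero are illustrative assumptions, not taken from the thesis implementation.

```c
#include <stddef.h>

/* Output of an N-tap FIR echo estimate, Equation (2.4):
 * y_hat(n) = sum_{k=0}^{N-1} h_hat[k] * x(n - k).
 * x holds the far-end signal; samples before x[0] are treated as zero,
 * so the sum stops at k = n. */
double fir_output(const double *h_hat, size_t N, const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t k = 0; k < N && k <= n; k++) {
        acc += h_hat[k] * x[n - k];
    }
    return acc;
}
```

Evaluating this once per sample costs N multiplications, which is the baseline cost shared by all the time-domain algorithms in Section 2.2.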

What remains is the adaptive estimation of the filter taps h̃(k). Some different methods for doing this are described in Section 2.2.

2.1.3 Double Talk

One problem that arises in echo cancellation is Double Talk. This occurs when the near-end talker starts to talk at the same time (s(t) ≠ 0), which makes it hard to identify which signal is to be removed [4][5]. The situation is comparable to one where the noise level becomes critically high. Double talk causes problems for some of the algorithms in Section 2.2.

2.2 Algorithms

This section presents a number of conventionally used adaptive filtering algorithms suited for AEC. This thesis aims at finding an algorithm suitable for embedded systems handling I/O streams of 8-bit samples at a sample rate of 8 kHz, so only a limited selection of algorithms is considered. The techniques described in this chapter are mostly from the 21st century, but they are often based on methods discovered as early as the mid-20th century. The selection is further limited to algorithms that perform their calculations in the time domain, most of which have been presented in the last decade. This, for example, excludes the technique presented in [6].

2.2.1 Least Mean Squares

The Least Mean Squares (LMS) algorithms are a family of methods that are arguably the most widespread in AEC over the last couple of decades. This is owing to their simplicity, robustness and ease of implementation. This section discusses some versions of the classical LMS, which is a stochastic-gradient-based adaptive algorithm [1][2]. Figure 2.2 illustrates the addition of the filter block h̃, which takes x(t) as input. The output of h̃ is subtracted from the returned signal, and the resulting error signal is fed back to the filter update. All algorithms in this section are implemented according to Figure 2.2. The output of the system is described as

$$ \tilde{s}(n) = d(n) - \sum_{k=0}^{N-1} \tilde{h}(k)\,x(n-k). \tag{2.5} $$


Figure 2.2. The model including the adaptive filter block

The classical LMS

The classical LMS provides the most basic and easily implemented solution to the echo problem. Let h̃ₙ denote the tap vector of the FIR filter of length N at time instant n, and let xₙ denote the input vector. The algorithm attempts to minimize the MSE loss function J:

$$ J = E\left[e(n)^2\right] = E\left[\left(d(n) - \tilde{h}_n^T x_n\right)^2\right]. \tag{2.6} $$

The gradient vector of the loss function is given by

$$ \nabla J = \frac{\partial J}{\partial \tilde{h}} = -2E\begin{bmatrix} e(n)x(n) \\ e(n)x(n-1) \\ e(n)x(n-2) \\ \vdots \\ e(n)x(n-(N-1)) \end{bmatrix} = -2E\left[e(n)\,x_n\right]. \tag{2.7} $$

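Moving the coefficients in the direction of the negative gradient gives the Steepest Descent Algorithm. LMS replaces the expectation in (2.7) with the instantaneous estimate e(n)xₙ and absorbs the factor 2 into the step size. The coefficients are therefore updated once per sample interval according to (2.8) [4][1]: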

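$$ \begin{aligned} \tilde{h}_{n+1} &= \tilde{h}_n + \mu\, e(n)\, x_n, & e(n) &= d(n) - \tilde{h}_n^T x_n, \\ x_n &= \left(x(n), x(n-1), x(n-2), \ldots, x(n-(N-1))\right)^T, & \tilde{h}_n &= \left(\tilde{h}_{n,0}, \tilde{h}_{n,1}, \tilde{h}_{n,2}, \ldots, \tilde{h}_{n,N-1}\right)^T, \end{aligned} \tag{2.8} $$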

where µ is the user-chosen step size, which determines the convergence speed of the algorithm. The main drawback is the trade-off between adaptation speed and stability. The steady-state MSE grows with the step size, and hence with the adaptation speed, whereas the convergence time increases as the step size decreases [7]. Moreover, when the step size exceeds a certain limit, the system becomes unstable.
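The LMS recursion (2.8) maps directly onto two loops per sample: one for the filter output and one for the tap update. The following C sketch assumes a regressor array ordered newest sample first. The names lms_step and push_sample are hypothetical and not part of the thesis code.

```c
#include <stddef.h>

/* One iteration of the classical LMS echo canceller, Equation (2.8).
 * h:  filter taps h~_n (length N), updated in place to h~_{n+1}
 * xv: regressor x_n = (x(n), x(n-1), ..., x(n-N+1)), newest sample first
 * d:  microphone sample d(n)
 * mu: step size
 * Returns e(n) = d(n) - h~_n^T x_n, which is also the output s~(n), Eq. (2.5). */
double lms_step(double *h, const double *xv, size_t N, double d, double mu)
{
    double y_hat = 0.0;
    for (size_t k = 0; k < N; k++)
        y_hat += h[k] * xv[k];          /* echo estimate with the current taps */
    double e = d - y_hat;
    for (size_t k = 0; k < N; k++)
        h[k] += mu * e * xv[k];         /* stochastic-gradient tap update */
    return e;
}

/* Shifts the newest far-end sample into the regressor (delay line).
 * Requires N >= 1. */
void push_sample(double *xv, size_t N, double x_new)
{
    for (size_t k = N - 1; k > 0; k--)
        xv[k] = xv[k - 1];
    xv[0] = x_new;
}
```

For each sample, a caller would call push_sample with x(n) and then lms_step with d(n). The returned error is the echo-cancelled output s̃(n).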

Normalized LMS (NLMS)

NLMS is a simple modification of the ordinary LMS. It is more robust and less sensitive to rapid changes in the signal. Here the step size is divided by the squared Euclidean norm of the input vector, ||xₙ||² [1], as

$$ \tilde{h}_{n+1} = \tilde{h}_n + \frac{\mu}{\|x_n\|^2 + \lambda}\, e(n)\, x_n. \tag{2.9} $$

The λ in Equation 2.9 is a small constant that ensures stability for small-amplitude input signals by preventing division by a number close to zero [2]. NLMS is more robust than LMS and allows the use of longer filters as well as larger step sizes.
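The C sketch below shows the NLMS update (2.9). It computes the input energy in the same loop as the filter output. The function name nlms_step and the newest-first regressor layout are assumptions for illustration.

```c
#include <stddef.h>

/* One NLMS iteration, Equation (2.9):
 * h~_{n+1} = h~_n + mu / (||x_n||^2 + lambda) * e(n) * x_n.
 * lambda is the small regularization constant from the text.
 * xv holds x_n newest sample first. Returns e(n). */
double nlms_step(double *h, const double *xv, size_t N,
                 double d, double mu, double lambda)
{
    double y_hat = 0.0, energy = 0.0;
    for (size_t k = 0; k < N; k++) {
        y_hat += h[k] * xv[k];
        energy += xv[k] * xv[k];           /* ||x_n||^2 */
    }
    double e = d - y_hat;
    double g = mu / (energy + lambda) * e; /* normalized step times error */
    for (size_t k = 0; k < N; k++)
        h[k] += g * xv[k];
    return e;
}
```

With µ = 1 and λ = 0, a single update drives the a posteriori error for the current sample to zero. This shows how the normalization makes the effective step independent of the signal level. In a real-time implementation, ||xₙ||² could instead be updated recursively by adding x(n)² and subtracting x(n − N)², which saves N multiplications per sample.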

Variable Step size LMS (VSLMS)

VSLMS attempts to counter the trade-off between convergence rate and steady-state error by employing a time-varying step size in the standard LMS weight update recursion. The step size is made large when the algorithm is far from the optimal solution, which increases the adaptation speed. It is decreased when the algorithm is near the optimum, which reduces the MSE and gives better overall performance [7][8]. The conventional VSLMS proposed by [8] updates the step size once per sampling interval according to the following formula:

µ(n + 1) = αµ(n) + γe(n)2. (2.10)

The parameter α is the exponential forgetting factor of the algorithm and should be chosen in the interval (0, 1). γ controls the adaptation of the step size and is usually very small in order to avoid instability. Parameter values that have proven successful in some cases are α = 0.97 and γ = 4.8 × 10⁻⁴ [8].
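The step-size recursion (2.10) adds only a few operations per sample to LMS. The C sketch below also clamps µ(n) to an interval [mu_min, mu_max]. This clamping is an added safeguard and an assumption, because Equation 2.10 itself has no bounds. The function name vslms_mu_update is hypothetical.

```c
/* Step-size recursion of the conventional VSLMS, Equation (2.10):
 * mu(n+1) = alpha * mu(n) + gamma * e(n)^2.
 * The clamp to [mu_min, mu_max] is an assumed safeguard (hypothetical
 * parameters); Eq. (2.10) itself has no bounds. */
double vslms_mu_update(double mu, double e, double alpha, double gamma,
                       double mu_min, double mu_max)
{
    double next = alpha * mu + gamma * e * e;
    if (next > mu_max) next = mu_max;
    if (next < mu_min) next = mu_min;
    return next;
}
```

Consider the values α = 0.97 and γ = 4.8 × 10⁻⁴ from [8]. A burst of large errors, for example from noise, drives µ(n) toward the upper bound. This illustrates the noise sensitivity discussed next.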

The VSLMS algorithm has proven to be much more efficient than the classical LMS and the NLMS. One considerable weakness, however, is its sensitivity to noise. When the noise level becomes too high, the squared error rises rapidly, and so does the step size µ, which makes the system unstable. To tackle this, [7] proposes an extended version of the VSLMS algorithm. It uses an estimate of the autocorrelation of the error, instead of the squared error, in the step-size update:


µ(n + 1) = αµ(n) + γp(n)2,

p(n) = βp(n − 1) + (1 − β)e(n)e(n − 1). (2.11)

The positive constant β is an exponential weighting parameter that controls the averaging in the estimate p(n). For stationary processes, previous values contain valuable information about the adaptation state, so β should be close to 1. For non-stationary processes, β should be smaller, to allow quicker adaptation to new statistics [7].
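The correlation-based step size (2.11) needs two extra state variables, p(n − 1) and e(n − 1). Below is a minimal C sketch. The struct vss_state and the function vss_corr_update are hypothetical names.

```c
/* State for the correlation-based step size of Equation (2.11). */
typedef struct {
    double mu;      /* current step size mu(n) */
    double p;       /* error autocorrelation estimate p(n-1) */
    double e_prev;  /* previous error e(n-1) */
} vss_state;

/* p(n)    = beta * p(n-1) + (1 - beta) * e(n) * e(n-1)
 * mu(n+1) = alpha * mu(n) + gamma * p(n)^2 */
void vss_corr_update(vss_state *st, double e, double alpha,
                     double beta, double gamma)
{
    st->p = beta * st->p + (1.0 - beta) * e * st->e_prev;
    st->mu = alpha * st->mu + gamma * st->p * st->p;
    st->e_prev = e;   /* remember e(n) for the next sample */
}
```

If the noise is white, its contribution to the product e(n)e(n − 1) averages toward zero in p(n). This is why (2.11) reacts less to noise than (2.10).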

2.2.2 Correlation Least Mean Squares

A considerable problem for all of the LMS algorithms above is the Double Talk situation discussed in Section 2.1.3. One quick and easy solution is to introduce speech (double-talk) detection in the transmitter and to freeze the filter tap weights while double talk is present. However, temporarily stopping the tap adaptation is only a passive way of handling double talk. It reduces the adaptation speed, or leads to completely inaccurate adaptation if the echo path changes during this period.

A new algorithm proposed by [3] minimizes the error between the correlations of x(n) and d(n) instead of the actual signal error. The idea is based on the assumption that x(n) and s(n) are uncorrelated.

As before, assuming an impulse response of length N, the echo signal y(n) is

$$ y(n) = (h * x)(n) = \sum_{k=0}^{N-1} h(k)\,x(n-k), \tag{2.12} $$

where h(n) is the impulse response of the LEMS. The correlation φdx(n) between the signals d(n) and x(n), and the autocorrelation φxx(n, k) of x(n), are given by

$$ \phi_{dx}(n) = \sum_{j=0}^{n} d(j)\,x(j), \tag{2.13} $$

$$ \phi_{xx}(n,k) = \sum_{j=0}^{n} x(j)\,x(j-k), \qquad x(j-k) = 0 \text{ for } j < k. \tag{2.14} $$

Since d(n) consists of both y(n) and s(n) (neglecting the noise term), and y(n) is given by (2.12), φdx(n) can be written as

$$ \phi_{dx}(n) = \phi_{sx}(n) + \sum_{j=0}^{n} y(j)\,x(j) = \phi_{sx}(n) + \sum_{j=0}^{n} x(j) \sum_{k=0}^{N-1} h(k)\,x(j-k) = \phi_{sx}(n) + \sum_{k=0}^{N-1} h(k)\,\phi_{xx}(n,k) = \sum_{k=0}^{N-1} h(k)\,\phi_{xx}(n,k), \tag{2.15} $$

where the last step uses the assumption that φsx(n) = 0. In the same manner as in LMS, an adaptive filter with coefficients h̃ is defined. Its input vector is the far-end autocorrelation φxx(n) = (φxx(n, 0), ..., φxx(n, N − 1))ᵀ, and its output φ̃dx(n) = h̃ₙᵀφxx(n) is an estimate of φdx(n). The adaptation is done by minimizing the MSE, again denoted as the cost function J:

$$ J = E\left[\left(\phi_{dx}(n) - \tilde{\phi}_{dx}(n)\right)^2\right]. \tag{2.16} $$

The formula for the adaptation algorithm then becomes

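$$ \begin{aligned} \tilde{h}_{n+1} &= \tilde{h}_n + \frac{2\mu}{1 + \|\phi_{xx}(n)\|^2}\, e(n)\, \phi_{xx}(n), & e(n) &= \phi_{dx}(n) - \tilde{\phi}_{dx}(n), \\ \tilde{h}_n &= \left(\tilde{h}_{n,0}, \tilde{h}_{n,1}, \tilde{h}_{n,2}, \ldots, \tilde{h}_{n,N-1}\right)^T, & \phi_{xx}(n) &= \left(\phi_{xx}(n,0), \phi_{xx}(n,1), \phi_{xx}(n,2), \ldots, \phi_{xx}(n,N-1)\right)^T. \end{aligned} \tag{2.17} $$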

The method is thus very similar to NLMS, except that the adaptation is performed on the correlation functions rather than on the signals themselves. The correlation functions are estimated by the following recursions:

φxx(n, i) = (1 − α)φxx(n − 1, i) + αx(n)x(n − i),

φdx(n) = (1 − β)φdx(n − 1) + βd(n)x(n), (2.18)

where α and β are positive constants smaller than 1. The estimated filter h̃ is then applied to the far-end input signal as in Equation 2.5. A conceptual picture is given in Figure 2.3 [3][5].
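The C sketch below processes one sample of CLMS. It combines the correlation recursions (2.18), the normalized update (2.17) and the output equation (2.5). The struct clms_state and the function clms_step are hypothetical names. Computing s̃(n) with the taps from before the update is an assumed ordering, chosen to match the LMS convention in (2.8). The caller owns the three arrays of length N ≥ 1 and must zero them before the first call.

```c
#include <stddef.h>

/* Hypothetical state for a CLMS canceller with N taps. */
typedef struct {
    size_t N;
    double *h;      /* filter taps h~ */
    double *xv;     /* regressor (x(n), ..., x(n-N+1)) */
    double *phi_xx; /* phi_xx(n, i), i = 0..N-1 */
    double phi_dx;  /* phi_dx(n) */
} clms_state;

/* Processes one sample pair (x(n), d(n)) and returns s~(n). */
double clms_step(clms_state *st, double x, double d,
                 double mu, double alpha, double beta)
{
    size_t N = st->N;
    /* shift the new far-end sample into the delay line */
    for (size_t k = N - 1; k > 0; k--) st->xv[k] = st->xv[k - 1];
    st->xv[0] = x;

    /* echo-cancelled output with the current taps, Eq. (2.5) */
    double y_hat = 0.0;
    for (size_t k = 0; k < N; k++) y_hat += st->h[k] * st->xv[k];
    double s_hat = d - y_hat;

    /* correlation recursions, Eq. (2.18); xv[i] = x(n - i) */
    double norm2 = 0.0, phi_dx_hat = 0.0;
    for (size_t i = 0; i < N; i++) {
        st->phi_xx[i] = (1.0 - alpha) * st->phi_xx[i] + alpha * x * st->xv[i];
        norm2 += st->phi_xx[i] * st->phi_xx[i];
    }
    st->phi_dx = (1.0 - beta) * st->phi_dx + beta * d * x;
    for (size_t i = 0; i < N; i++) phi_dx_hat += st->h[i] * st->phi_xx[i];

    /* normalized update in the correlation domain, Eq. (2.17) */
    double g = 2.0 * mu / (1.0 + norm2) * (st->phi_dx - phi_dx_hat);
    for (size_t i = 0; i < N; i++) st->h[i] += g * st->phi_xx[i];
    return s_hat;
}
```

Each sample costs a handful of multiplications per tap. This is consistent with the O(N) entry for CLMS in Table 3.1.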

The Expanded CLMS (ECLMS)

ECLMS is an enhanced version of CLMS proposed by [5], designed to achieve better convergence than the original CLMS. The main difference is the cost function J. It is derived from a new definition of the correlation function between the output signal d(n) and the input signal x(n):

$$ \phi_{dx}(n,k) = \sum_{j=0}^{n} d(j)\,x(j-k). \tag{2.19} $$


Figure 2.3. Echo Cancelling using the CLMS algorithm.

The relation between φdx(n, k) and the input autocorrelation φxx(n, k) is then given by

$$ \phi_{dx}(n,k) = \sum_{i=0}^{N-1} h_i(n)\,\phi_{xx}(n,k-i). \tag{2.20} $$

The impulse response coefficients are estimated in the same way as before, by an adaptive filter h̃ₙ.

The cost function is defined as

$$ J = E\left[e^T(n)\,R\,e(n)\right], \tag{2.21} $$

where the diagonal matrix R contains the weight factors rᵢ and the error vector e(n) is given by

$$ R = \begin{bmatrix} r_0 & 0 & \cdots & 0 \\ 0 & r_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r_{N-1} \end{bmatrix}, \tag{2.22} $$

$$ e(n) = \left[e(n,0), e(n,1), \ldots, e(n,N-1)\right]^T, \qquad e(n,k) = \phi_{dx}(n,k) - \tilde{\phi}_{dx}(n,k). \tag{2.23} $$

The gradient is then given as

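$$ \nabla J = \frac{\partial J}{\partial \tilde{h}} = E\left[2 \left(\frac{\partial e(n)}{\partial \tilde{h}}\right)^T R\, e(n)\right]. \tag{2.24} $$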

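The derivative in this expression is given by

$$ \frac{\partial e(n)}{\partial \tilde{h}} = \frac{\partial}{\partial \tilde{h}}\left[\phi_{dx}(n) - \Psi_{xx}^T(n)\,\tilde{h}\right] = -\Psi_{xx}^T(n), \tag{2.25} $$

where φdx(n) and Ψxx(n) are described by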

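$$ \phi_{dx}(n) = \left[\phi_{dx}(n,0), \phi_{dx}(n,1), \phi_{dx}(n,2), \ldots, \phi_{dx}(n,N-1)\right]^T, \tag{2.26} $$

$$ \Psi_{xx}(n) = \begin{bmatrix} \phi_{xx}(n,0) & \phi_{xx}(n,1) & \cdots & \phi_{xx}(n,N-1) \\ \phi_{xx}(n,-1) & \phi_{xx}(n,0) & \cdots & \phi_{xx}(n,N-2) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{xx}(n,-N+1) & \phi_{xx}(n,-N+2) & \cdots & \phi_{xx}(n,0) \end{bmatrix}. \tag{2.27} $$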

By assuming φxx(n, k) = φxx(n, −k) and substituting Equation 2.25 into Equation 2.24, the gradient becomes

∇J = −2E [Ψxx(n)Re(n)] . (2.28)

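Using this gradient, with the expectation replaced by its instantaneous value and the step size normalized as in CLMS, gives the ECLMS update:

$$ \tilde{h}_{n+1} = \tilde{h}_n + \frac{2\mu}{1 + \operatorname{tr}\left[\Psi_{xx}(n)\,R\,\Psi_{xx}(n)\right]}\, \Psi_{xx}(n)\,R\,e(n). \tag{2.29} $$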

Here, tr[·] denotes the trace. Setting r₀ = 1 and all other weight factors to zero gives the ordinary CLMS algorithm [5].

The correlation algorithms have proven to be excellent successors to the LMS family when Double Talk is present, with ECLMS showing better convergence than CLMS [3]. However, the complexity increases considerably, especially for ECLMS, whose update involves N × N matrix-vector products.
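The ECLMS update (2.29) can be sketched in C as below. The sketch assumes the symmetric autocorrelation φxx(n, k) = φxx(n, −k) used in the derivation, so only lags 0 to N − 1 need to be stored. The function eclms_update, the helper lag and the caller-provided scratch array are illustrative assumptions. The correlation estimates φxx(n, k) and φdx(n, k) are assumed to be maintained elsewhere, for example with recursions analogous to (2.18).

```c
#include <stddef.h>

/* Symmetric lag lookup: phi_xx(n, k) = phi_xx(n, -k), as assumed in the text. */
static double lag(const double *phi_xx, size_t a, size_t b)
{
    return phi_xx[a > b ? a - b : b - a];
}

/* One ECLMS update, Equation (2.29).
 * phi_xx: phi_xx(n, k), k = 0..N-1;  phi_dx: phi_dx(n, k), k = 0..N-1
 * r: diagonal of R;  work: caller-provided scratch array of length N.
 * Psi_xx(n) has element (row, col) = phi_xx(n, col - row), Eq. (2.27). */
void eclms_update(double *h, const double *phi_xx, const double *phi_dx,
                  const double *r, double *work, size_t N, double mu)
{
    /* work = R e(n), with e(n, k) = phi_dx(n, k) - sum_i h_i phi_xx(n, k - i) */
    for (size_t k = 0; k < N; k++) {
        double est = 0.0;
        for (size_t i = 0; i < N; i++)
            est += h[i] * lag(phi_xx, k, i);
        work[k] = r[k] * (phi_dx[k] - est);
    }
    /* tr[Psi R Psi] = sum_j r_j sum_k phi_xx(n, k - j)^2 */
    double tr = 0.0;
    for (size_t j = 0; j < N; j++) {
        double s = 0.0;
        for (size_t k = 0; k < N; k++) {
            double p = lag(phi_xx, k, j);
            s += p * p;
        }
        tr += r[j] * s;
    }
    double g = 2.0 * mu / (1.0 + tr);
    /* h <- h + g * Psi R e, using the errors computed with the old taps */
    for (size_t row = 0; row < N; row++) {
        double acc = 0.0;
        for (size_t col = 0; col < N; col++)
            acc += lag(phi_xx, col, row) * work[col];
        h[row] += g * acc;
    }
}
```

The nested loops over N make the O(N²) cost listed for ECLMS in Table 3.1 explicit. Setting r = (1, 0, ..., 0) reproduces the CLMS step (2.17), which is a convenient consistency check.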

2.2.3 Recursive Least Squares (RLS)

The Recursive Least Squares (RLS) algorithm is widely considered to be far too complex for real-time implementation [9]. Even so, [1] and [10] propose it as the most suitable algorithm for echo cancellation, because of its fast convergence and low MSE.

The idea behind the RLS algorithm is the minimization of the cost function

$$ V_n(\tilde{h}) = \sum_{k=1}^{n} \lambda^{n-k}\left(d(k) - \tilde{h}^T x_k\right)^2, \qquad 0 < \lambda \le 1. \tag{2.30} $$

This is the sum of all squared errors during a run, weighted by the parameter λ, which serves as a forgetting factor. Old measurements are thus forgotten exponentially.

Vₙ(h̃) is quadratic in h̃ and is minimized by the solution h̃ₙ of the linear system

$$ f_n = R_n \tilde{h}_n \iff \tilde{h}_n = R_n^{-1} f_n, \tag{2.31} $$

where

$$ R_n = \sum_{k=1}^{n} \lambda^{n-k} x_k x_k^T, \tag{2.32} $$

$$ f_n = \sum_{k=1}^{n} \lambda^{n-k} x_k\, d(k). \tag{2.33} $$

The minimizer is unique, and Rₙ is invertible, provided that Rₙ is positive definite. The quantities Rₙ and fₙ can be calculated recursively as

$$ R_n = \lambda R_{n-1} + x_n x_n^T, \qquad f_n = \lambda f_{n-1} + x_n\, d(n). \tag{2.34} $$

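Defining Pₙ = Rₙ⁻¹ and applying the matrix inversion lemma to (2.34) leads to the tap-weight update formula:

$$ \begin{aligned} \tilde{h}_n &= \tilde{h}_{n-1} + P_n x_n e(n), \qquad e(n) = d(n) - \tilde{h}_{n-1}^T x_n, \\ P_n &= \frac{1}{\lambda}\left(P_{n-1} - \frac{P_{n-1} x_n x_n^T P_{n-1}}{\lambda + x_n^T P_{n-1} x_n}\right). \end{aligned} \tag{2.35} $$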

The forgetting factor λ can also be made time-variant, λ(n), for greater flexibility. Otherwise, common values for λ lie between 0.98 and 0.99, i.e., close to 1.

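The main advantage of RLS over LMS is its larger effective step size during the transient period. In addition, its update direction is better, since it is scaled by the inverse of the input correlation matrix, much like a Newton step. RLS therefore normally converges much faster [11, pp. 350-352].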
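The sketch below implements one RLS iteration in C, following the standard recursion behind (2.35). P is updated with the matrix inversion lemma. The gain Pₙxₙ is obtained as Pₙ₋₁xₙ divided by λ + xₙᵀPₙ₋₁xₙ, which avoids a second matrix-vector product. The function name rls_step, the row-major storage of P and the initialization P₀ = I/δ are assumptions for illustration, not the thesis implementation.

```c
#include <stddef.h>

/* One RLS iteration in the form of Equation (2.35).
 * P is an N x N row-major symmetric matrix holding P_{n-1} on entry and P_n
 * on exit; a common initialization (an assumption, not from the text) is
 * P_0 = I / delta with a small delta > 0.
 * xv: regressor x_n;  pxv: caller-provided scratch array of length N.
 * Returns e(n) = d(n) - h~_{n-1}^T x_n. */
double rls_step(double *h, double *P, const double *xv, double *pxv,
                size_t N, double d, double lambda)
{
    double denom = lambda, y_hat = 0.0;
    for (size_t i = 0; i < N; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < N; j++)
            acc += P[i * N + j] * xv[j];
        pxv[i] = acc;                 /* P_{n-1} x_n */
        denom += xv[i] * acc;         /* lambda + x_n^T P_{n-1} x_n */
        y_hat += h[i] * xv[i];
    }
    double e = d - y_hat;
    /* P_n x_n = P_{n-1} x_n / denom, so the tap update needs no extra product */
    for (size_t i = 0; i < N; i++)
        h[i] += pxv[i] / denom * e;
    /* P_n = (P_{n-1} - P_{n-1} x_n x_n^T P_{n-1} / denom) / lambda */
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            P[i * N + j] = (P[i * N + j] - pxv[i] * pxv[j] / denom) / lambda;
    return e;
}
```

The double loop over P costs on the order of N² operations per sample, which is the source of the O(N²) entry for RLS in Table 3.1. In a long-running implementation with limited precision, P can also drift from symmetry through rounding. This is one practical reason why RLS is harder to deploy than LMS.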

2.3 Hardware

The platform on which the different algorithms are planned to be implemented is a Raspberry Pi. It is a credit-card-sized computer developed by the Raspberry Pi Foundation as a tool for teaching computer science in schools [12].


Chapter 3

Implementation

This chapter describes the implementation and evaluation of the algorithms presented in Chapter 2. Some results from both the MATLAB simulations and the real-time VoIP implementation are also presented.

3.1 MATLAB

MATLAB functions were first written to simulate each of the algorithms, to give an idea of how they work and how well they perform. There is one function per algorithm. Each takes two input vectors as arguments, representing the signals x(n) and d(n) respectively, along with additional tuning parameters such as filter order and step size. Each returns an output vector representing the signal s̃(n), together with the filter coefficients used in the last iteration (see Figure 3.1). The functions all follow the procedure outlined below:

function [stilde, Theta] = myAlgorithm(d, x, order, step)
% Generic structure of the simulation functions. d and x are column
% vectors holding d(n) and x(n). The update line uses the LMS rule (2.8)
% as an example; it is the part that differs between the algorithms.
len = length(d);
Theta = zeros(order, 1);        % filter coefficients h~
stilde = zeros(len, 1);         % echo-cancelled output s~(n)
phi = zeros(order, 1);          % regressor (x(n), ..., x(n-order+1))'
for n = 1:len
    phi = [x(n); phi(1:end-1)]; % shift in the newest far-end sample
    % Generate real-time output based on Theta
    ytilde = Theta' * phi;
    stilde(n) = d(n) - ytilde;
    % Update Theta using the update formula specific to myAlgorithm
    Theta = Theta + step * stilde(n) * phi;
end
end


The test data was recorded in MATLAB using the built-in functions audiorecorder, recordblocking and getaudiodata. These commands were used to record several speech signals of varying length. All signals were recorded with 8-bit precision and an 8 kHz sampling frequency, which is sufficient for speech of reasonable quality.

[Figure 3.1: a MATLAB function block with input vectors x and d, parameters α, β, γ, µ, etc., and output vector s̃.]

Figure 3.1. Block diagram for visualizing MATLAB implementation

3.1.1 Performance

Several performance measures are considered when evaluating the algorithms. In MATLAB, the first evaluation is done simply by listening to the result of each test using the sound function. This gives a quick impression of the overall performance, but it is of course not sufficient for a complete performance comparison. Listening is also the simplest way to verify that the cancellation works in the actual VoIP system.

One measure commonly used for AEC is the Echo Return Loss Enhancement (ERLE). It is the added attenuation of the echo signal, measured in dB. Define an error signal e(n) as the difference between the signals y(n) and ỹ(n); this equals s̃(n) provided no double talk or noise is present. ERLE is then the ratio between the power of the signal d(n) and the power of the error signal e(n):

$$ \mathrm{ERLE} = 10\log_{10}\frac{E\left[d(n)^2\right]}{E\left[e(n)^2\right]}. \tag{3.1} $$

This gives a measure of how much of the returned echo is actually cancelled, averaged over a number of iterations.

Another performance measure is the convergence speed, which shows how quickly the echo canceller adapts to new conditions. It might be the most important one, since the echo return channel is unknown and time-varying. If the echo path changes and the echo canceller cannot adapt quickly enough, new echoes will be heard. One measure of the convergence rate is the following expression proposed by [3]:

$$ D(n) = 10\log_{10}\frac{\sum_{i=0}^{N-1}\left|h_i - \tilde{h}_{n,i}\right|^2}{\sum_{i=0}^{N-1}\left|h_i\right|^2}. \tag{3.2} $$

This is the ratio, in dB, of two quantities. The numerator is the squared norm of the difference between the estimated filter coefficients and the actual impulse response. The denominator is the squared norm of the actual impulse response. Plotting D(n) over the time of a simulation visualizes the convergence speed of the algorithm, i.e., the time it takes to fit the estimated filter coefficients to the impulse response of the channel. This is naturally only possible when the impulse response is known, and hence only in simulations. Another helpful method used in this thesis is to plot the filter coefficients that correspond to the delay of the echo, i.e., the filter taps that are supposed to be non-zero. These are the model parameters that are supposed to converge, and plotting them gives a visualization similar to the one given by (3.2).

3.1.2 Complexity

The trade-off between complexity and performance is studied by evaluating the complexity of each algorithm. This is done by counting the number of multiplications and additions performed in each iteration, that is, for each sample. This reflects the number of operations executed at each time instant and enables a comparison with the actual capacity of a system. Table 3.1 lists the order of the number of operations required by the algorithms in Section 2.2, where N denotes the filter length.

Table 3.1. Complexity of the algorithms

Algorithm    Operations per sample
LMS          O(N)
NLMS         O(N)
VSLMS        O(N)
CLMS         O(N)
ECLMS        O(N²)
RLS          O(N²)


The complexity of course also differs between the versions of the LMS, but only by a few multiplications and additions per sample interval. The table mainly shows that the cost of the LMS family grows linearly with the filter length, whereas that of ECLMS and RLS grows quadratically.
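The orders in Table 3.1 can be turned into rough numbers. Consider a filter long enough to cover the 100 ms delay used in the delay test of Section 3.1.3, i.e. N = 800 taps at f_s = 8 kHz. Assume about 2N multiplications per sample for LMS (filter output plus tap update), and at least on the order of N² for the RLS update of Pₙ. Ignoring constant factors, this gives:

```latex
\begin{aligned}
\text{LMS:}\quad 2N f_s &= 2 \cdot 800 \cdot 8000 = 1.28 \times 10^{7}\ \text{multiplications/s},\\
\text{RLS:}\quad N^2 f_s &= 800^2 \cdot 8000 = 5.12 \times 10^{9}\ \text{multiplications/s}.
\end{aligned}
```

The ratio N/2 = 400 illustrates why RLS is widely considered too complex for real-time use on a low-frequency embedded processor. The LMS family is far more likely to fit within such a budget.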

3.1.3 Simulations

Delay

The simplest test for the echo cancellers is to examine the performance when the returned signal d(t) is a delayed and attenuated version of the original signal x(t). This means the impulse response of the channel is a tap at the position corresponding the delay given in samples, as:

$$ h(d_s f_s) = a. \tag{3.3} $$

Here, $d_s$ is the delay in seconds, $f_s$ is the sample frequency, and a is a positive number smaller than 1 that represents the attenuation of the returned signal. The first test is done by recording a 20-second speech signal, as described at the beginning of Section 3.1. The signal is then delayed by 800 samples, which corresponds to 100 ms at a sample frequency of 8 kHz. Both the delayed signal and the original signal are fed as inputs to the functions, together with the other arguments (filter order, step size, correlation constants, etc.). This test assumes that the returned echo signal is an exact, delayed copy of the original sent signal, which is seldom the case, but it is useful for a first impression of the different techniques in terms of convergence time.
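The delay test can be sketched in C as follows: (3.3) means the echo is simply d(n) = a·x(n − d_s f_s). The helper names delay_in_samples and make_delayed_echo are hypothetical, and the echo is taken to be zero before it first arrives.

```c
#include <stddef.h>

/* Convert a delay in seconds to whole samples: ds * fs, rounded. */
size_t delay_in_samples(double ds, double fs)
{
    return (size_t)(ds * fs + 0.5);
}

/* d(n) = a * x(n - delay): a channel whose impulse response is a single
   tap of height a at index delay (Eq. 3.3). d(n) = 0 before the echo arrives. */
void make_delayed_echo(const float *x, float *d, size_t len,
                       size_t delay, float a)
{
    for (size_t n = 0; n < len; n++)
        d[n] = (n >= delay) ? a * x[n - delay] : 0.0f;
}
```

For the test described above, delay_in_samples(0.1, 8000.0) returns 800.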

Noise

One factor of high importance is the algorithms' ability to handle noise. Since the filter updates are driven primarily by the error between the estimated and the actual return signal, a high level of noise on the return signal can be expected to have a critical impact on the echo cancellation. To test this, the same configuration is used as before, but noise of varying levels is added to the delayed signal d(n). Comparing the result of this simulation with the previous one shows the impact of the noise.

Double Talk

As described in Section 2.1.3, the double-talk situation can cause problems for the echo cancellation. It is very similar to the case of high noise disturbance, since the error between the signals is misleading in this case as well. The simulation is done by recording an additional speech signal s(n) of the same length as x(n) and adding it to d(n). The impact can be evaluated by varying the amount of disturbance added, i.e., by varying the amplification constant a with which s(n) is scaled before it is added to d(n).


Random Impulse Response

By filtering the sent signal with a randomly generated impulse response, it can be studied how the algorithms adapt to various channels. The impulse responses were generated by setting 10 randomly selected filter taps to random numbers in the range [0,1].
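The random-channel test can be sketched as two steps: build a sparse impulse response, then filter the sent signal through it. In this hypothetical version the tap values are drawn from (0, 1] so that a chosen tap is never exactly zero, positions are drawn without repetition, and rand() is assumed to reach every index (len ≤ RAND_MAX + 1).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: impulse response of length len that is zero except
   at `taps` distinct, randomly chosen positions, which get uniform values
   in (0, 1]. Requires taps <= len. */
void random_sparse_ir(float *h, size_t len, size_t taps, unsigned seed)
{
    size_t placed = 0;
    for (size_t k = 0; k < len; k++)
        h[k] = 0.0f;
    srand(seed);
    while (placed < taps) {
        size_t k = (size_t)rand() % len;
        if (h[k] != 0.0f)
            continue;                                  /* already used */
        h[k] = (float)((rand() + 1.0) / ((double)RAND_MAX + 1.0));
        placed++;
    }
}

/* Echo through the channel: d(n) = sum_{k=0}^{hlen-1} h(k) x(n - k),
   with x(n) = 0 for n < 0. */
void fir_filter(const float *x, float *d, size_t len,
                const float *h, size_t hlen)
{
    for (size_t n = 0; n < len; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < hlen && k <= n; k++)
            acc += h[k] * x[n - k];
        d[n] = acc;
    }
}
```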

3.2 Implementation in C

The actual implementation was done in C on an Ubuntu machine to ensure that the software could easily be ported to the Raspberry Pi unit. It was, however, decided relatively early in the thesis to exclude the Raspberry Pi as a test unit, mainly because of the unit's lack of audio I/O, which would have required a lot of extra work just to configure the Raspberry Pi for testing.

3.2.1 Structure

The code was structured as C functions, one per algorithm, each taking one sample from each signal at a time and calculating the output sample. The functions are hence called once per time instant, which is the main difference from the MATLAB functions discussed earlier. All functions require a buffer for the signal x(n), in addition to the various parameters needed for execution, such as µ, α, etc.; all parameters except the signals themselves are passed by reference. See Figure 3.2.

[Block diagram: a C function takes the input samples x and d and the parameters (α, β, γ, µ, etc.), keeps a buffer and the filter coefficients h, and produces the filter output sample ỹ; ỹ is subtracted from d to form the output sample s̃.]

Figure 3.2. Block diagram for visualizing Ubuntu implementation.
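The calling convention above can be illustrated with a hypothetical floating-point counterpart of the LMS_int8 listing in Appendix A: one call per sample, a circular buffer phi holding the latest `order` far-end samples, and parameters passed by pointer. Like the listing, it returns the filter output ỹ(n), and the caller forms s̃(n) = d(n) − ỹ(n). The name LMS_float and its exact signature are assumptions.

```c
#include <stddef.h>

/* Hypothetical float LMS, called once per sample.
   phi is a circular buffer; after the write, phi[(pos - i) mod order]
   holds x(n - i), which theta[i] multiplies. Returns y~(n). */
float LMS_float(float d, float x, float *theta, float *phi,
                size_t *pos, const size_t *order, const float *mu)
{
    size_t i;
    float ytilde = 0.0f;
    float error;

    *pos = (*pos + 1) % *order;   /* advance write position */
    phi[*pos] = x;                /* store newest far-end sample */

    for (i = 0; i < *order; i++)  /* y~(n) = sum_i theta[i] x(n - i) */
        ytilde += theta[i] * phi[(*order + *pos - i) % *order];

    error = d - ytilde;           /* e(n) = d(n) - y~(n) */
    for (i = 0; i < *order; i++)  /* theta[i] += mu e(n) x(n - i) */
        theta[i] += *mu * error * phi[(*order + *pos - i) % *order];

    return ytilde;
}
```

Keeping the state (theta, phi, pos) outside the function lets several cancellers run side by side without global variables, which matches the pass-by-reference design described above.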

3.2.2 Collection of test data

To supply the project with test data, a speech signal was sent between two computers in different rooms using the Unix networking utility netcat (command: nc). The voice was recorded in one room using a headset, and in the other room another headset was placed with its microphone in front of the speaker, thus creating the desired feedback channel. The speech signal was recorded and transmitted as raw data, as 8-bit unsigned integers.

[Diagram: a PC in Room 1 and a PC in Room 2, exchanging the signals x(t) and d(t).]

Figure 3.3. Set-up for collecting test data.

3.2.3 Testing different resolutions

Since the thesis aims to investigate the implementation of echo cancellation in embedded systems, it is relevant to study the impact of altering the resolution of the input to the algorithms. Because the data is transferred over netcat as 8-bit unsigned integers, it is of course interesting to test whether the algorithms can handle calculations at 8-bit resolution, i.e., calculations using only 8-bit integers. Therefore, two versions of all functions were written: one performing the calculations using 32-bit floating-point numbers, and one exclusively handling 8-bit integers (int8). When the calculations are done with floats, the input data is simply scaled from the range [-128,127] (originally [0,255], since the signals are sent as unsigned integers) to the range [-1,1]. An example of the 8-bit integer implementation is found in Appendix A.
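The conversions described above can be sketched as three small helpers. Dividing by 128 is an assumption about the scaling; it maps the raw byte range onto [−1, 127/128]. The rounding and clipping in float_to_u8, used for writing results back as raw bytes, is likewise an illustrative choice.

```c
#include <stdint.h>

/* Raw netcat samples are unsigned bytes (0..255) with mid-scale 128. */
static inline int8_t u8_to_int8(uint8_t u)
{
    return (int8_t)((int)u - 128);             /* 0..255 -> -128..127 */
}

/* -128 -> -1.0, 0 -> 0.0, 127 -> 0.9921875 */
static inline float u8_to_float(uint8_t u)
{
    return (float)((int)u - 128) / 128.0f;
}

/* Inverse of u8_to_float with clipping and rounding to the nearest byte. */
static inline uint8_t float_to_u8(float v)
{
    float s = v * 128.0f + 128.0f;
    if (s < 0.0f)   s = 0.0f;
    if (s > 255.0f) s = 255.0f;
    return (uint8_t)(s + 0.5f);
}
```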


Chapter 4

Results

In this chapter, the results from all MATLAB simulations are presented, together with the results from the experiments on the Ubuntu system.

4.1 MATLAB results

A large number of plots were generated during the thesis, at least one (often more) per algorithm and scenario. This section presents only the most significant ones. In all simulations except those for the more mathematically complex algorithms, the filter order was set to 1024.

4.1.1 LMS

The LMS and NLMS algorithms proved, in spite of their simplicity, to be rather effective. Both under double talk and with a high noise level added, the algorithms performed quite well. Figure 4.1 shows the result of echo cancelling using the LMS algorithm in the case of simple delay, i.e., the signal d(n) is an undistorted and delayed version of the signal x(n). The step size was set to 0.001.

Figure 4.2 shows the convergence function and the estimated impulse response when the LMS is used with a randomly generated impulse response; the step size is again 0.001. Even in this case, the LMS (and the NLMS) appear to be robust and effective algorithms.

4.1.2 VSLMS

The VSLMS algorithm shows little improvement over the ordinary LMS. If the maximum step size is set to a value where the ordinary LMS would be unstable, the VSLMS also becomes unstable. The expected benefit of the VSLMS, less fluctuation and hence a lower steady-state error, did not appear in the simulations. The performance of the VSLMS is shown in Figure 4.3. The maximum step size was set to 0.003 and the noise variance was 0.05. The result indicates that the algorithm is in fact robust also at high noise levels. The results were similar to those of the expanded version.

[Plot panels: x(n) and s̃(n) amplitude versus time (s); D(n) (dB) versus time (s); θ[n] versus tap index n.]

Figure 4.1. Echo cancelling using the LMS algorithm. The first row shows the input signal x(n) and the result s̃(n) of the filtering; the second row displays the convergence measurement function D(n) and the estimated impulse response h̃.

[Plot panels: D(n) (dB) versus time (s); θ[n] versus tap index n.]

Figure 4.2. Convergence measurement function D(n) and the estimated impulse response.

[Plot panels: x(n) and s̃(n) amplitude versus time (s); D(n) (dB) versus time (s); θ[n] versus tap index n.]

Figure 4.3. Performance of the VSLMS algorithm in the presence of noise. Parameters: stepmax: 0.003, stepmin: 0.0001, α: 0.99, γ: 0.001. Noise variance: 0.05.
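If the VSLMS follows the variable step-size rule of Kwong and Johnston [8] (an assumption based on that reference and the parameter names in Figure 4.3), the step size evolves as μ'(n+1) = α·μ(n) + γ·e²(n) and is then clipped to [stepmin, stepmax]. The clipping makes the observation above plausible: large errors drive μ to stepmax, where the filter behaves like an ordinary LMS with that step size, so a stepmax beyond the LMS stability limit is unstable for both. The function name vss_update is hypothetical.

```c
/* Variable step size, assumed Kwong-Johnston form:
   mu'(n+1) = alpha * mu(n) + gamma * e(n)^2, clipped to [mu_min, mu_max]. */
float vss_update(float mu, float e, float alpha, float gamma,
                 float mu_min, float mu_max)
{
    float next = alpha * mu + gamma * e * e;
    if (next > mu_max)
        return mu_max;    /* large errors: behaves like LMS with mu_max */
    if (next < mu_min)
        return mu_min;    /* small errors: step shrinks towards mu_min */
    return next;
}
```

With the Figure 4.3 values (α = 0.99, γ = 0.001), a zero error shrinks the step by 1 % per sample, while an error of magnitude 10 immediately pushes it to stepmax = 0.003.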

4.1.3 CLMS

As described in Section 2.2.2, the CLMS algorithm is not sensitive to double-talk conditions. Figure 4.4 shows the result of the simulation with double talk present. The step size, α and β were set to 0.003, 0.99 and 0.99, respectively.

However, performing the filter updates in the correlation paths appears in this simulation to be rather inefficient in comparison with the direct way of the ordinary LMS. The efficiency in this case is mainly measured by convergence rate, and by looking at the result from the same scenario using the LMS algorithm in Figure 4.5, it is clear that the LMS algorithm shows a better performance in this case.

4.1.4 ECLMS

The ECLMS algorithm, an improved version of the ordinary CLMS, shows excellent performance. In the simplest case, where the returned signal is a delayed version of the sent signal, the echo is cancelled out completely after a few seconds, as seen in Figure 4.6. The same holds for the randomly generated impulse response, as seen in Figure 4.7. Not surprisingly, the algorithm also stands out under double-talk conditions, as shown in Figure 4.8. In the presence of noise, the results were similar. It should be mentioned that even though the filter order was drastically reduced to 64, the simulation of a 20-second speech signal took over 26 seconds, i.e., longer than the signal itself. This indicates that the algorithm may not be suitable for real-time implementation.

[Plot panels: x(n) and s̃(n) amplitude versus time (s); D(n) (dB) versus time (s); θ[n] versus tap index n.]

Figure 4.4. Performance of the CLMS in the presence of double talk. Parameters: step size: 0.003, α: 0.99, β: 0.99.

[Plot panels: x(n) and s̃(n) amplitude versus time (s); D(n) (dB) versus time (s); θ[n] versus tap index n.]

Figure 4.5. Performance of the LMS in the presence of double talk. Step size was set to 0.003.

[Plot panels: x(n) and s̃(n) amplitude versus time (s); D(n) (dB) versus time (s); θ[n] versus tap index n.]

Figure 4.6. Performance of the ECLMS. Parameters: Step size: 0.1, α: 0.9, β: 0.9.

However, MATLAB is an interpreted language, so its computation speed can be much lower than that of a compiled language such as C.

4.1.5 RLS

The RLS algorithm performs perfectly in the simulations without disturbance, even better than the ECLMS. Figure 4.9 shows the performance of the algorithm with a random impulse response, where the echo signal is cancelled out completely. In the presence of noise and double talk, however, the function D(n) indicates rather unstable behaviour. The resulting output signal nevertheless appears free from echo when listened to, although some temporary peaks are visible in the output signal in Figure 4.10. This could indicate a sensitivity to noise. It must also be considered that all these simulations use ideal test data; a perfectly delayed and undistorted echo signal is seldom encountered in practice.

4.2 Results from Ubuntu implementation

The results from the Ubuntu implementation differed from those of the MATLAB simulations. The main reason was probably the use of real test data: the returned signal is now a distorted version of the sent signal, which has a significant effect on the performance. Because of the results for the VSLMS and EVSLMS, and since both are very similar to the ordinary LMS algorithm, this section focuses on the remaining four algorithms.

[Plot panels: x(n) and s̃(n) amplitude versus time (s); D(n) (dB) versus time (s); θ[n] versus tap index n.]

Figure 4.7. Performance of the ECLMS in the situation of randomly generated impulse response. Parameters: Step size: 0.1, α: 0.9, β: 0.9.

[Plot panels: x(n) and s̃(n) amplitude versus time (s); D(n) (dB) versus time (s); θ[n] versus tap index n.]

Figure 4.8. Performance of the ECLMS under condition of double talk. Parameters: Step size: 0.1, α: 0.9, β: 0.9.

[Plot panels: x(n) and s̃(n) amplitude versus time (s); D(n) (dB) versus time (s); θ[n] versus tap index n.]

Figure 4.9. Performance of the RLS in the situation of randomly generated impulse response. Parameters: λ: 0.99.

[Plot panels: x(n) and s̃(n) amplitude versus time (s); D(n) (dB) versus time (s); θ[n] versus tap index n.]

Figure 4.10. Performance of the RLS under condition of double talk. Parameters: λ: 0.99.

[Plot panels: x(n) and s̃(n) amplitude versus time (s); D(n) (dB) versus time (s); θ[n] versus tap index n.]

Figure 4.11. Performance of the RLS under condition of noise. Parameters: λ: 0.99.

4.2.1 Delay

The system set-up explained in Section 3.2.2 produced test data with an extremely long delay, caused by the network and probably also by the netcat service. The returned signal was delayed by more than a second (over 8000 samples), which meant that the algorithms would need extremely long filters to adapt to the long impulse response. This is impractical, so the delay was instead estimated by maximizing the cross-correlation function according to Equation (4.1), where n_d is the delay, C(n_d) is the correlation function and N is the signal length. The offset for the signal is then set to half the delay.

$$ n_d = \arg\max_{n_d} C(n_d) = \arg\max_{n_d} \sum_{n=0}^{N-1} d(n)\, x(n - n_d). \tag{4.1} $$
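A brute-force C sketch of (4.1): evaluate C(k) for every candidate lag up to a chosen maximum and keep the largest. Terms with n − k < 0 are skipped, max_lag is a hypothetical search limit, and the cost is O(N·max_lag) multiply-accumulates, which is acceptable offline; an FFT-based correlation would be faster for lags of several thousand samples.

```c
#include <float.h>
#include <stddef.h>

/* Return the lag k in [0, max_lag] that maximises
   C(k) = sum_{n=k}^{len-1} d(n) x(n - k). */
size_t estimate_delay(const float *d, const float *x,
                      size_t len, size_t max_lag)
{
    size_t best_lag = 0;
    double best = -DBL_MAX;

    for (size_t k = 0; k <= max_lag && k < len; k++) {
        double c = 0.0;
        for (size_t n = k; n < len; n++)
            c += (double)d[n] * x[n - k];   /* cross-correlation at lag k */
        if (c > best) {
            best = c;
            best_lag = k;
        }
    }
    return best_lag;
}
```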

4.2.2 Performing calculations using floating-point numbers

Performing the calculations (filter update, production of the next sample, etc.) using 32-bit floating-point arithmetic roughly corresponds to the MATLAB simulations, except that MATLAB uses double precision (64 bit), so the results should be similar. Figure 4.12 shows the performance of the LMS algorithm.

The black signal represents the sent signal, the blue signal the echoed signal and the red signal the error (residual) signal, in other words the result. The echo signal clearly decays and is, in the end, practically non-existent. The filter order and the step size were set to 256 and 1/96, respectively, and the signals were speech signals of approximately 21 seconds. The ERLE in this simulation was 11.2033 dB.


[Plot: sent signal, echo and output versus time (s).]

Figure 4.12. Ubuntu simulation: LMS.

[Plot: sent signal, echo and output versus time (s).]

[Plot: sent signal, echo and output versus time (s).]

Figure 4.14. Ubuntu simulation: ECLMS.

Figure 4.15. Ubuntu simulation: LMS, ERLE = 20.3596 dB.

Figure 4.13 shows the same simulation using the CLMS algorithm. In this case an ERLE of 9.5948 dB was achieved, the highest ERLE obtained with the CLMS algorithm. The order and step size were set to 256 and 1/48, respectively.

The result of filtering with the ECLMS algorithm can be seen in Figure 4.14, showing the expected fast convergence. The complexity of the algorithm is, however, a disadvantage: the time required to process a 20-second speech signal was still too high for higher filter orders. Figure 4.14 shows the results of a simulation using filter order 48 and step size 1/32. The resulting ERLE was 20.9283 dB.

This can be compared with the result of the LMS algorithm (Figure 4.15) using the same parameters (order 48, step size 1/32), where the ERLE reaches 20.3596 dB.

The RLS algorithm proved to be numerically unstable in these simulations, probably because the matrix R (see Section 2.2.3) was close to singular.
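One standard way to avoid inverting R explicitly is the exponentially weighted RLS recursion, which propagates P(n), an estimate of R⁻¹(n), directly through the matrix inversion lemma, starting from P(0) = δ⁻¹I. The sketch below is this textbook inverse form, not necessarily the formulation of Section 2.2.3, and whether it would have removed the instability seen here was not tested. RLS_N, rls_state, rls_init and rls_update are hypothetical names; double precision keeps the sketch simple, since this form is known to lose symmetry and positive definiteness of P over long runs in low precision.

```c
#include <stddef.h>

#define RLS_N 4   /* hypothetical filter length for the sketch */

typedef struct {
    double theta[RLS_N];      /* filter coefficients */
    double P[RLS_N][RLS_N];   /* running estimate of R^{-1} */
    double lambda;            /* forgetting factor, e.g. 0.99 */
} rls_state;

/* theta = 0, P(0) = I / delta (small delta -> large initial P). */
void rls_init(rls_state *s, double lambda, double delta)
{
    for (size_t i = 0; i < RLS_N; i++) {
        s->theta[i] = 0.0;
        for (size_t j = 0; j < RLS_N; j++)
            s->P[i][j] = (i == j) ? 1.0 / delta : 0.0;
    }
    s->lambda = lambda;
}

/* One iteration with regressor phi = [x(n), x(n-1), ..., x(n-N+1)].
   Returns the a priori error e(n) = d(n) - theta^T phi. */
double rls_update(rls_state *s, const double *phi, double d)
{
    double Pphi[RLS_N], k[RLS_N];
    double denom = s->lambda, y = 0.0;

    for (size_t i = 0; i < RLS_N; i++) {
        Pphi[i] = 0.0;
        for (size_t j = 0; j < RLS_N; j++)
            Pphi[i] += s->P[i][j] * phi[j];
        denom += phi[i] * Pphi[i];          /* lambda + phi^T P phi */
        y += s->theta[i] * phi[i];
    }
    double e = d - y;
    for (size_t i = 0; i < RLS_N; i++) {
        k[i] = Pphi[i] / denom;             /* gain vector */
        s->theta[i] += k[i] * e;
    }
    /* P <- (P - k (P phi)^T) / lambda; P symmetric, so phi^T P = (P phi)^T */
    for (size_t i = 0; i < RLS_N; i++)
        for (size_t j = 0; j < RLS_N; j++)
            s->P[i][j] = (s->P[i][j] - k[i] * Pphi[j]) / s->lambda;
    return e;
}
```

Only a division by the scalar λ + φᵀPφ remains, so no matrix is ever inverted; the price is still O(N²) operations per sample.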

[Plot: sent signal, echo and output versus time (s).]

Figure 4.16. Theoretical results from the RLS algorithm.

Figure 4.17. Ubuntu simulation (int8): LMS.

Figure 4.16 shows a theoretical result from the RLS algorithm in MATLAB, using MATLAB's built-in inversion function on the matrix, which may partly compensate for the near-singularity. The simulation was performed on the same speech signals as the other Ubuntu simulations.

4.2.3 Performing calculations using 8 bit integers

Performing the calculations using only 8-bit integers proved to be quite a challenge. The only two algorithms (apart from possibly the VSLMS versions) that were successfully implemented at this resolution were the LMS and the CLMS. Figure 4.17 shows the results of the simulation using the LMS algorithm, which proved more suitable for lower resolution. The ERLE in this particular simulation was 13.9723 dB, whereas the best result achieved with the CLMS algorithm was 2.8052 dB.
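One plausible contributor to the difficulty is overflow: in the Appendix A listing, the accumulated output and the error are stored back into int8_t variables without saturation. The sketch below is a hypothetical Q7 variant, not the thesis implementation, in which coefficients and samples are int8 values scaled by 128, products are accumulated in a 32-bit integer, and results are saturated to the int8 range instead of wrapping. The step size is restricted to μ = 1/mu_div; the names Q7_ONE, sat8 and lms_q7 are assumptions. Note that an update truncates to zero whenever |e·x| < 128·mu_div, a dead zone that limits how far the coefficients can converge.

```c
#include <stdint.h>

#define Q7_ONE 128   /* 1.0 in Q7 fixed point */

/* Clamp a 32-bit intermediate to the int8 range instead of wrapping. */
static int8_t sat8(int32_t v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Hypothetical Q7 LMS step with a 32-bit accumulator; mu = 1/mu_div.
   Returns the filter output y~(n) in Q7. */
int8_t lms_q7(int8_t d, int8_t x, int8_t *theta, int8_t *phi,
              uint8_t *pos, uint8_t order, int32_t mu_div)
{
    int32_t acc = 0;

    *pos = (uint8_t)((*pos + 1) % order);
    phi[*pos] = x;                           /* newest far-end sample */
    for (uint8_t i = 0; i < order; i++)      /* Q7 * Q7 -> Q14 sum */
        acc += (int32_t)theta[i] * phi[(order + *pos - i) % order];

    int8_t y = sat8(acc / Q7_ONE);           /* Q14 -> Q7, saturated */
    int32_t e = (int32_t)d - y;              /* error kept at 32 bits */

    for (uint8_t i = 0; i < order; i++) {
        int32_t upd = e * phi[(order + *pos - i) % order] / (Q7_ONE * mu_div);
        theta[i] = sat8(theta[i] + upd);     /* saturating update */
    }
    return y;
}
```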


Chapter 5

Discussion and Conclusion

This chapter briefly summarizes the results, discusses the outcome of the study and presents the conclusions of the thesis.

5.1 Discussion

When the theoretical results from the MATLAB simulations are compared with those from the Ubuntu simulations on real test data, it is obvious that algorithms with excellent performance in the theoretical cases do not necessarily perform as well in the real case. This is probably due to the information loss when the speech signal is sent over a channel with a realistic impulse response. The most significant observation in this respect is that the algorithms with high complexity and excellent theoretical results do not live up to expectations in the later experiments. The instability of the RLS on "real" test data could have been studied further, since there are ways of handling matrix singularity that could have been explored. However, the best ERLE achieved with the ECLMS algorithm exceeded that of the LMS by only about half a decibel (20.93 dB compared with 20.36 dB), so the gain from the higher complexity seems small. The calculations of the ECLMS were extremely costly and are unlikely to be feasible on an embedded platform. It should, however, be mentioned that using a filter order as low as 48 could reduce the significance of the choice of algorithm, since the signal was then only delayed by 24 samples, corresponding to 3 milliseconds, which is not even audible; subtracting the unfiltered signal directly would probably suppress the echo.

When the different algorithms were implemented at lower resolution, it was clear that the higher-complexity algorithms were very hard, if not impossible, to implement. The only truly successful implementation was that of the ordinary LMS algorithm, reaching an ERLE of 13.97 dB, compared with only 2.81 dB for the CLMS. The credibility of these simulations can, however, be questioned in the same manner as before.


5.1.1 Conclusion

All these simulations point to the conclusion that the simple approach may be the best in this case. The simple LMS algorithm seems to be the best choice considering its stability, ease of implementation and calculation speed. Its drawbacks are of course the performance and convergence speed, which are greatly improved by some of the other techniques; however, since the higher complexity of those techniques severely limited the usable filter order, their ability to model the impulse response is restricted. The impulse response of a room is typically a couple of hundred milliseconds long, which demands a much higher filter order (at 8 kHz, 200 ms already corresponds to 1600 taps). According to the experiments, this would only be possible with the simpler algorithms. The RLS and ECLMS may deliver better results, but may be better suited for FPGA implementations.

5.1.2 Sources of errors

Some factors that may have affected the outcome of this thesis need to be discussed. The first is the choice of algorithms. The choice was based on the hypothesis that time-domain calculations would be better suited for embedded systems because of their simplicity. This has the disadvantage that the techniques studied are, to some extent, similar. As it turned out, some of the algorithms were also extremely costly. Examining some approaches based on the short-time Fourier transform would certainly have improved the quality of the thesis.

Another source of error is that no real VoIP system was available during the thesis, so no test data produced by a system using audio coding was at hand; the test data was produced by sending raw data. Lossy audio coding could of course add further distortion to the signal, and this would probably also have posed new challenges for the echo cancelling algorithms.

5.2 Future work

Because of the theoretically better results of the more complex methods, it should be examined how these could be implemented more efficiently. One idea is multi-threading, for example letting the filter update run in a separate thread and performing it not for each sample but, say, every 16th sample. This would substantially reduce the computational load, and it should be examined how this affects the performance.
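The reduced update rate can be illustrated without threads: in the hypothetical sketch below, filtering still runs every sample, but the coefficients are updated only every K-th sample. For an LMS filter this lowers the average cost from about 2N + 1 to N + (N + 1)/K multiplications per sample. The threading itself and the effect on tracking of changing echo paths are not shown; the function name and parameters are assumptions.

```c
#include <stddef.h>

/* Hypothetical LMS variant: output every sample, coefficient update only
   every K-th call (tracked by *count). Returns y~(n). */
float lms_sparse_update(float d, float x, float *theta, float *phi,
                        size_t *pos, size_t order, float mu,
                        size_t K, size_t *count)
{
    float ytilde = 0.0f;

    *pos = (*pos + 1) % order;
    phi[*pos] = x;                          /* newest far-end sample */
    for (size_t i = 0; i < order; i++)      /* N multiplications */
        ytilde += theta[i] * phi[(order + *pos - i) % order];

    *count = (*count + 1) % K;
    if (*count == 0) {                      /* N + 1 multiplications, 1/K of the time */
        float step = mu * (d - ytilde);     /* mu * e(n), computed once */
        for (size_t i = 0; i < order; i++)
            theta[i] += step * phi[(order + *pos - i) % order];
    }
    return ytilde;
}
```

With K = 16 the filter receives 16 times fewer updates, so convergence in samples is correspondingly slower, which is the performance trade-off the proposed study would need to quantify.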

One other technique that should be examined further is the use of post filtering, i.e. filtering the output of the AEC filter to suppress the residuals. One way of doing this could be the use of a Wiener filter, as suggested by [2].


Bibliography

[1] I. Homănă, M. D. Topa, and B. S. Kirei, “Echo cancelling using adaptive algorithms,” SIITME 2009 - 15th International Symposium for Design and Technology of Electronics Packages, no. 15, pp. 317–321, 2009.

[2] F. Hallack and M. Petraglia, “Performance comparison of adaptive algorithms applied to acoustic echo cancelling,” 2003 IEEE International Symposium on Industrial Electronics (Cat. No. 03TH8692), vol. 2, pp. 1147–1150, 2003.

[3] M. Asharif, T. Hayashi, and K. Yamashita, “Correlation LMS algorithm and its application to double-talk echo cancelling,” Electronics Letters, vol. 35, no. 3, pp. 194–195, 1999.

[4] F. Gustafsson, Adaptive Algorithms and Change Detection. John Wiley & Sons Ltd, 2000.

[5] M. Asharif, A. Shimabukuro, T. Hayashi, and K. Yamashita, “Expanded CLMS algorithm for double-talk echo cancelling,” IEEE SMC’99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No. 99CH37028), vol. 1, no. 1, pp. 998–1002, 1999.

[6] M. Fukui, S. Shimauchi, Y. Hioka, A. Nakagawa, and Y. Haneda, “Double-talk Robust Acoustic Echo Cancellation for CD-quality Hands-free Videoconferencing System,” vol. 60, no. 3, pp. 468–475, 2014.

[7] T. Aboulnasr and K. Mayyas, “A robust variable step-size LMS-type algorithm: Analysis and simulations,” IEEE Transactions on Signal Processing, vol. 45, no. 3, pp. 631–639, 1997.

[8] R. H. Kwong and E. W. Johnston, “A variable step size LMS algorithm,” IEEE Transactions on Signal Processing, vol. 40, no. 7, pp. 1633–1642, 1992.

[9] A. Deb, A. Kar, and M. Chandra, “A Technical Review on Adaptive Algorithms for Acoustic Echo Cancellation,” Communications and Signal Processing (ICCSP), 2014 International Conference on, pp. 41–45, 2014.

[10] I. Homana, M. Topa, B. S. Kirei, and C. Contan, “Adaptive algorithms for double-talk echo cancelling,” 2010 9th International Symposium on Electronics and Telecommunications, ISETC’10 - Conference Proceedings, no. 15, pp. 349–352, 2010.


[11] F. Gustafsson, L. Ljung, and M. Millnert, Signal Processing. Studentlitteratur AB, Lund, 2010.

[12] Wikipedia, “Raspberry pi— wikipedia, the free encyclopedia,” 2015. [Online; accessed 22-April-2015].


Appendix A

Code Example

A.1 C implementation of the LMS algorithm

#include <stdint.h>

//----- Least Mean Squares, 8-bit integer version -----
// Arithmetic is done in int after integer promotion, but results assigned
// back to int8_t (ytilde, error, theta) are not saturated and may overflow.
int8_t LMS_int8(int8_t d, int8_t x, int8_t *theta, int8_t *phi,
                uint8_t *pos, const uint8_t *order,
                const int8_t *mu, const int8_t *one)
{
    uint8_t i;          // Loop counter
    int8_t error = 0;   // Error e(n) = d(n) - y~(n)
    int8_t ytilde = 0;  // Filter output y~(n)

    *pos = (*pos + 1) % *order;
    phi[*pos] = x;      // Store far-end signal in circular buffer phi

    // Produce output: y~(n) = sum_i theta[i] * x(n - i) / one
    for (i = 0; i < *order; i++) {
        ytilde = ytilde + (theta[i] * phi[(*order + *pos - i) % *order]) / *one;
    }

    // Update the filter: theta[i] += (mu[0] / mu[1]) * e(n) * x(n - i) / one,
    // with integer (truncating) divisions
    error = d - ytilde;
    for (i = 0; i < *order; i++) {
        theta[i] = theta[i] + (mu[0] * error * phi[(*order + *pos - i) % *order]) / mu[1] / *one;
    }

    return ytilde;
}
