Acoustic feedback suppression in audio mixer for PA applications


Mattias Ekström


Department of Physics Linnaeus väg 20

901 87 Umeå

Sweden


Mattias Ekström (maek0025@student.umu.se), June 19, 2017

Master’s thesis, engineering physics, spring 2017, 30 credits

Supervisor: Christian Schüld, Limes Audio


Abstract

When a speaker is addressing an audience, a PA system consisting of a microphone and a loudspeaker is often used. If the microphone picks up too much of the loudspeaker energy, acoustic feedback in the form of an unwanted characteristic howling can occur. Limes Audio is a software company that specializes in improving sound quality in digital communications, mainly conference telephony, and has developed a reference product, the Magneto mixer, to demonstrate the capability of their software TrueVoice. The company now wishes to expand the field of usage for the Magneto mixer to enable it to work as a microphone mixer in PA scenarios, and for this, a feedback suppression feature is needed. This master’s thesis aims at surveying the market and the literature in the field and specifying the requirements for a feedback suppression feature. Three methods for suppressing howling feedback are evaluated through simulations and compared in terms of maximum stable gain (MSG) and subjective listening experience. The method that performed the best based on these criteria was acoustic feedback cancellation with a 5 Hz frequency shift on the loudspeaker signal. This method makes use of an adaptive filter to model the acoustic feedback path and to remove the feedback component from the microphone signal. In the simulations, the method was able to increase the stable gain by approximately 10 dB while maintaining a good sound quality.


Sammanfattning (summary in Swedish, translated)

When a speaker addresses an audience, a PA system consisting of a microphone and a loudspeaker is often used. If the microphone picks up too much of the sound from the loudspeaker, there is an imminent risk of acoustic feedback in the form of a characteristic, unwanted howl. Limes Audio is a company that develops software for improving sound quality in digital communication, mainly in conference telephony. They have developed a demonstration product, the Magneto mixer, which can be used as a conference telephone to demonstrate their software TrueVoice. The company now wishes to develop the Magneto mixer so that it can also work as an audio mixer for PA scenarios, or for conference telephony where internal sound reinforcement in the room is needed, and for this a function for removing any feedback is required. The aim of this thesis is to lay the foundation for such a function in the Magneto mixer by surveying the market and the literature in the field. Three methods for eliminating feedback are evaluated in simulations and compared with respect to maximum stable gain (MSG) and subjective sound quality. The method ”Acoustic feedback cancellation” combined with a 5 Hz frequency shift on the loudspeaker signal gave the highest MSG and the best sound quality. The method uses an adaptive filter to approximate the acoustic feedback path between loudspeaker and microphone, and removes feedback components from the microphone signal. In the simulations, the method was able to increase the maximum stable gain by up to 10 dB while a good sound quality of the speech was maintained.


Acoustic feedback suppression

in audio mixer for PA applications June 19, 2017

List of abbreviations

AEC Acoustic Echo Cancellation
AEQ Automatic Equalization
AFC Acoustic Feedback Cancellation
FFT Fast Fourier Transform
FIR Finite Impulse Response
IIR Infinite Impulse Response
IMSD Interframe Magnitude Slope Deviation
LTI Linear Time-Invariant
MSG Maximum Stable Gain
NFS Notch filter based Feedback Suppression
NLMS Normalized Least Mean Square
PA Public Address
PHPR Peak-to-Harmonic Power Ratio
PNPR Peak-to-Neighbouring Power Ratio
RIR Room Impulse Response


1 Introduction
1.1 Background
1.2 Motivation
1.3 Objective
1.4 Disposition
2 Theory
2.1 Basics of signals and systems
2.1.1 Linear systems
2.1.2 Digital filters
2.2 The feedback phenomenon
2.3 Stability analysis
3 Methods used in feedback suppression
4 Description of algorithms
4.1 Frequency shifting
4.1.1 Analytic signal
4.2 Two-stage notch filtering
4.2.1 Detection stage
4.2.2 Suppression stage
4.3 Acoustic feedback cancellation
4.3.1 NLMS
5 Method for testing
5.1 MATLAB simulation and evaluation
6 Results
6.1 Feedback suppression
6.2 Maximum stable gain
6.2.1 Frequency shifting
6.2.2 Notch filters
6.2.3 Acoustic feedback suppression
6.3 Subjective listening experience
7 Discussion, conclusion and future work
References


1 Introduction

1.1 Background

In any situation where a speaker addresses an audience using a Public Address (PA) system, consisting of a microphone and a loudspeaker, the entire performance is at risk of being ruined by feedback, perceived as ”howling” at a certain frequency. Feedback howling is not only an unpleasant experience for the audience, but also puts the PA equipment at risk of being damaged. Feedback occurs when the microphone picks up too much of the loudspeaker’s energy (see chapter 2), which causes unstable oscillations at problematic frequencies. These oscillations are perceived as the howling that is probably familiar to the reader. Throughout the history of PA systems, feedback has been a recurring phenomenon, and different measures have been taken to prevent it. Since the 1960s, when the first feedback suppression methods were presented [1], [2], novel methods and algorithms have been proposed, and since the dramatic increase in the use of digital computers from the 1980s onwards, more powerful and efficient algorithms have been developed through software implementations in digital signal processors (DSP). Today, many consider the best way to avoid howling feedback to be a careful and well-planned setup of the microphone and loudspeakers, along with an experienced sound technician who optimizes the equalization of the PA system for the specific room and decreases the gain at potentially problematic frequencies [3].

In many applications though, there is a need for a plug-and-play solution without the presence of a sound technician, and for these scenarios, the processes usually performed by a sound technician must be automated, or other measures need to be taken, in order to avoid howling feedback.


1.2 Motivation

Limes Audio AB is a company owned by Google that develops audio solutions for enterprise applications. Their main product, TrueVoice, has been developed to remove echoes, noise and other sonic artefacts in conference telephony and other applications that make use of a communication system with a loudspeaker and microphone situated in the same unit. Limes Audio has designed a reference product called the Magneto mixer, which can be used as a plug-and-play conference mixing unit together with a computer, and has the TrueVoice software embedded. The company now wishes to look into the possibility of expanding the field of usage for the Magneto mixer, from working as a conference telephony mixing unit to also working as a plug-and-play mixer unit in a PA system, and in other teleconferencing scenarios where internal sound reinforcement is necessary. For this, the software in the Magneto mixer needs to be adapted for the PA case, which has a different problem formulation than the teleconference case.

1.3 Objective

For the Magneto mixer to work properly in the PA case, there is a need for a feedback suppression feature. There are two main objectives for this work. The first objective is to survey the literature on the subject as well as the competitors’ solutions to the feedback problem, and provide documentation on the findings. The second objective is to specify the requirements for a feedback suppression feature in the Magneto mixer, to develop MATLAB code demonstrating the performance of some chosen methods, and to evaluate which method Limes Audio should aim at including in their future work of integrating a feedback suppressor in the Magneto mixer.

1.4 Disposition

Chapter 2 describes the mathematical theory of the feedback problem and the conditions required for howling feedback to occur. Chapter 3 briefly describes the available methods on the subject, and provides arguments for my choice of methods for the following chapters. Chapter 4 describes the chosen feedback suppression algorithms in detail, and chapter 5 describes the methods used for testing the implementations and simulating the PA setup in MATLAB. Chapter 6 presents the results from the evaluation procedures, and chapter 7 concludes the report with a discussion of the findings in the work and suggestions for future work.


2 Theory

This chapter describes the theoretical foundation upon which all feedback suppression algorithms are based, starting from the fundamentals in signals and systems. The mathematical formulation of the feedback problem is presented, and the conditions required for howling feedback to occur are explained.

2.1 Basics of signals and systems

2.1.1 Linear systems

A system H is an operator that takes an input x(t) and produces an output y(t):

y(t) = H{x(t)}. (2.1)

H is said to be linear if it satisfies the superposition principle: if several inputs $x_1(t), x_2(t), \ldots, x_i(t)$ produce outputs

$$y_1(t) = H\{x_1(t)\}$$ (2.2)

$$y_2(t) = H\{x_2(t)\}$$ (2.3)

$$\vdots$$ (2.4)

$$y_i(t) = H\{x_i(t)\},$$ (2.5)

then the output upon addition of the inputs, possibly scaled by factors $\alpha_i$, satisfies

$$\alpha_1 y_1(t) + \ldots + \alpha_i y_i(t) = H\{\alpha_1 x_1(t) + \ldots + \alpha_i x_i(t)\}.$$ (2.6)

A system is furthermore said to be time-invariant if a time shift T in the input only results in a corresponding time shift in the output:

$$y(t - T) = H\{x(t - T)\}.$$ (2.7)

A Linear Time-Invariant (LTI) system can be described by its impulse response h(t) in the time domain and by its frequency response H(ω) in the frequency domain. The impulse response is the output from an LTI system being excited with an impulse at time t = 0. In the discrete domain, this impulse is represented by the Kronecker delta impulse

$$d_i = \begin{cases} 0 & \text{if } i \neq 0 \\ 1 & \text{if } i = 0. \end{cases}$$ (2.8)

The corresponding impulse in the continuous domain is the Dirac delta function. If the impulse response is known, one can, for any input x(t), determine the output y(t) of the system with the convolution operator ∗:

y(t) = h(t) ∗ x(t). (2.9)

The frequency response, H(ω) is obtained by computing the Fourier transform of the impulse response h(t), and describes the frequency spectrum of the output of the LTI system, when the input is one of the above described impulse functions:

H(ω) = F{h(t)}, (2.10)

where F is the Fourier transform operator. A property of interest for the convolution operator is the convolution theorem, which states that, upon computing the Fourier transform of both sides of eq. (2.9):

Y (ω) = F{h(t) ∗ x(t)} = F{h(t)}F{x(t)} = H(ω)X(ω), (2.11)

where Y (ω), X(ω) are the Fourier transforms of their corresponding signal [4].
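The convolution theorem in eq. (2.11) can be checked numerically for short discrete signals. The following is a minimal sketch, assuming a hypothetical three-tap impulse response `h` and a four-sample input `x` chosen only for illustration; the DFT is evaluated directly from its definition and both signals are zero-padded to the length of the linear convolution.

```python
import cmath

def convolve(h, x):
    """Linear convolution y[n] = sum_k h[k] x[n-k], the discrete form of eq. (2.9)."""
    y = [0.0] * (len(h) + len(x) - 1)
    for k, hk in enumerate(h):
        for n, xn in enumerate(x):
            y[k + n] += hk * xn
    return y

def dft(x, size):
    """Direct DFT of x after zero-padding to `size` points."""
    padded = list(x) + [0.0] * (size - len(x))
    return [sum(v * cmath.exp(-2j * cmath.pi * k * n / size)
                for n, v in enumerate(padded))
            for k in range(size)]

h = [1.0, 0.5, 0.25]        # hypothetical impulse response
x = [1.0, -2.0, 3.0, 0.5]   # hypothetical input signal

y = convolve(h, x)
size = len(y)

# Left-hand side of eq. (2.11): transform of the convolved signal.
Y = dft(y, size)
# Right-hand side of eq. (2.11): product of the individual transforms.
HX = [a * b for a, b in zip(dft(h, size), dft(x, size))]

max_error = max(abs(a - b) for a, b in zip(Y, HX))
print(y, max_error)
```

The zero-padding matters: without it, the product of the two DFTs would correspond to a circular rather than a linear convolution.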


From eq. (2.11), it can easily be deduced that the total frequency response of a system can be found by dividing the Fourier transform of the output signal by the Fourier transform of the input signal:

$$H(\omega) = \frac{Y(\omega)}{X(\omega)}.$$ (2.12)

For real-valued signals, the corresponding Fourier transforms are complex and Hermitian [4]. From the complex-valued frequency response H(ω), the magnitude response |H(ω)| and the phase response ∠H(ω) can be computed. These quantities describe the magnitude and phase of the frequency components in the output signal from the system.

2.1.2 Digital filters

A digital filter is a system that manipulates an input signal in a desired way to produce a specific output. Examples of these are band pass filters, low pass filters and high pass filters. Digital filters can have either a Finite Impulse Response (FIR) or an Infinite Impulse Response (IIR). As the names suggest, the impulse response of an FIR filter has finite length, whereas that of an IIR filter is infinitely long. Since FIR filters have finite impulse responses, they are always stable, but they can be computationally demanding. IIR filters, on the other hand, can be unstable, but are in general less computationally demanding than FIR filters [4].
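The difference between the two filter types can be illustrated with their impulse responses. The sketch below is only an illustration under assumed coefficients: a four-tap moving average stands in for an FIR filter and a one-pole recursion stands in for an IIR filter, neither of which is taken from this work.

```python
def fir_filter(b, x):
    """FIR filter y[n] = sum_k b[k] x[n-k]; the output depends on inputs only."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]

def one_pole_iir(a, x):
    """IIR filter y[n] = x[n] + a * y[n-1]; the output is fed back into the filter."""
    y, previous = [], 0.0
    for xn in x:
        previous = xn + a * previous
        y.append(previous)
    return y

impulse = [1.0] + [0.0] * 9

# The FIR response is exactly zero once the impulse has left the four taps.
fir_response = fir_filter([0.25, 0.25, 0.25, 0.25], impulse)

# The IIR response never reaches zero; it decays for |a| < 1 ...
iir_response = one_pole_iir(0.5, impulse)

# ... and grows without bound for |a| > 1, i.e. the filter is unstable.
unstable_response = one_pole_iir(1.5, impulse)

print(fir_response)
print(iir_response)
print(unstable_response)
```

The one-pole filter needs a single multiplication per sample to produce an infinitely long response, which hints at why IIR filters tend to be cheaper than FIR filters with comparable behaviour.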

2.2 The feedback phenomenon

In situations where a speaker is addressing an audience located in the same room, a PA system, consisting of a microphone and loudspeakers, is often used. Due to the fact that the microphone and loudspeaker are situated in the same room, there is a significant risk of feedback from the loudspeakers to the microphone, which sometimes can be heard as a characteristic ”howling” of tones with problematic frequencies for the specific enclosure.

Howling occurs when the microphone picks up too much of the loudspeaker energy. It is undesired, since it results in an unpleasant experience for the audience and a risk of damaging the PA equipment.

The scenario can be described by the model shown in fig. 2.1. Throughout the work, we will assume that the source signal u(t) contains speech only; the background noise will not be considered. Furthermore, the speech is assumed to have been sampled to the discrete domain at 16 kHz, which according to the Nyquist sampling theorem means that all signal components up to 8 kHz are sampled without aliasing [4]. The vast majority of human speech is contained within this bandwidth, and therefore it is assumed that the continuous source signal is band limited to 8 kHz and thus can be sampled at 16 kHz and perfectly reconstructed from the samples without aliasing.

Figure 2.1 – A model of the scenario, here including one microphone and one loudspeaker (single-channel system). The source signal u(t) and the feedback signal add up to the microphone signal y(t), which passes through the forward path G to give the loudspeaker signal x(t); x(t) is fed back through the feedback path F.

In fig. 2.1, a speaker produces speech into a microphone, resulting in a source signal u(t). The signal is then processed in the electro-acoustic forward path, here denoted G. This processing includes the amplifier gain and possibly digital audio effects such as compression and equalization. One of the simplest types of processing in the electro-acoustic forward path is a broadband gain, which is simply the ratio of the output signal amplitude to the input signal amplitude. A broadband gain G(t) can be expressed in dB as

$$\text{Gain} = 20 \log_{10}\left(\frac{x(t)}{y(t)}\right) \ \text{[dB]},$$ (2.13)

and is the only processing in the electro-acoustic forward path considered in this work.

The amplified output signal x(t) is then transmitted to the loudspeaker. The output from the loudspeaker propagates through the room in which the PA system is set up, and interacts with the environment in a way described by the acoustic feedback path F . The acoustic feedback path is modelled as a linear system, with input signal x(t).

According to eq. (2.9), we can compute the output from that system, which is the feedback signal going back into the microphone, by convolving the loudspeaker signal with the impulse response of the acoustic feedback path F (t), also denoted the Room Impulse Response (RIR). The signal is fed back into the microphone, forming a closed loop system described by

y(t) = F (t) ∗ x(t) + u(t)

x(t) = G(t) ∗ y(t), (2.14)

where F (t) and G(t) are the impulse responses of the acoustic feedback path and the electro-acoustic forward path, respectively. Upon computing the Fourier transform on both sides of eq. (2.14) and making use of the convolution theorem in eq. (2.11), one obtains:

Y (ω) = F (ω)X(ω) + U (ω) (2.15)

X(ω) = G(ω)Y (ω), (2.16)

where F (ω) and G(ω) are the frequency responses of the corresponding systems, and X(ω), U (ω) and Y (ω) are the frequency contents of their corresponding signal. From this, one can compute the total frequency response from the source u(t) to the output x(t) by using the property described in eq. (2.12):

$$H(\omega) = \frac{X(\omega)}{U(\omega)} = \frac{G(\omega)Y(\omega)}{Y(\omega) - F(\omega)X(\omega)} = \frac{G(\omega)}{1 - F(\omega)G(\omega)}.$$ (2.17)

The term F(ω)G(ω) is referred to as the loop response of the system, and the related magnitude response |F(ω)G(ω)| is denoted the loop gain, whereas the phase response ∠F(ω)G(ω) is denoted the loop phase. The system described by the transfer function in eq. (2.17) is assumed to be a linear, time-dependent, finite order system, as described in section 2.1.1. These assumptions are justified in [3], where the authors argue that the linearity follows from the fact that a sound wave’s interaction with the environment can be considered level independent, meaning that the nature of the reflections does not depend on the sound pressure level. The time-dependency assumption is an obvious one, since the feedback path depends on all movements and changes in the room, including the microphone or loudspeaker changing positions.

Finally, the system can be considered to be of finite order: although RIRs in general are infinitely long, they decay exponentially over time, as shown in fig. 2.2. From this observation, it is reasonable to truncate the RIR at a certain length.

Figure 2.2 – The impulse response of a typical room, truncated at 2001 samples (horizontal axis: sample number; vertical axis: f(t)).

2.3 Stability analysis

Even though the system H(ω) is indeed time varying due to changes in the RIR, it is common practice in the field of feedback suppression to carry out the stability analysis for a time invariant system [3]. This is the reason that the expressions in eqs. (2.15) to (2.17) do not depend on time. The stability analysis originates from the paper ”Regeneration theory” by Harry Nyquist [5], which can be consulted for further reading. For the system described in eq. (2.17), instability requires |F(ω)G(ω)| ≥ 1, that is, a loop gain of at least 0 dB. In order for the signal to diverge due to feedback, the components at the problematic frequencies from each pass through the loop need to add up over time. For this to occur, the frequency components need to be in phase, which requires the loop phase to be a multiple of 2π. This condition for instability is summarized in the Nyquist stability criterion: if there exists an angular frequency ω for which the loop gain is greater than or equal to unity, and for which the loop phase is a multiple of 2π, then the system is unstable:

|F (ω)G(ω)| ≥ 1 (2.18)

∠F (ω)G(ω) = m2π, m ∈ Z. (2.19)

The corresponding frequency f = ω/2π will, if present in the source signal, cause unstable oscillations in the system, perceived as a howling sound. It should be pointed out that the assumption that the system is time-invariant is not necessarily fulfilled. Actually, it is virtually never fulfilled for any given PA scenario. However, under the assumption that the RIR is ”slowly changing” over time, the Nyquist stability criterion applies. It is important to note that this assumption can cause problems when the RIR is rapidly changing, such as when the speaker is holding a portable microphone and is walking around in the room, as explained in chapter 4. For this reason it is important to be aware of this assumption.

Figure 2.3 – The characteristics of a typical room: (a) the magnitude response |F(f)| in dB and (b) the phase response ∠F(f) in rad, both plotted against frequency f from 0 to 8000 Hz.
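The stability criterion can be illustrated with a small time-domain simulation of the closed loop in eq. (2.14). The sketch below rests on strong simplifying assumptions that are not taken from this work: the feedback path F is a single hypothetical reflection (an attenuation and a pure delay), and the forward path G is a broadband gain. In that case the loop gain equals gain × attenuation at every frequency, and the phase condition is met at frequencies that are multiples of fs/delay.

```python
def simulate_loop(gain, attenuation, delay, source):
    """Closed loop y[n] = u[n] + attenuation * x[n - delay], x[n] = gain * y[n]."""
    x = []
    for n, u in enumerate(source):
        # Feedback signal: the loudspeaker output from `delay` samples ago.
        feedback = attenuation * x[n - delay] if n >= delay else 0.0
        y = u + feedback          # microphone signal
        x.append(gain * y)        # loudspeaker signal
    return x

# A single click as a hypothetical source signal.
source = [1.0] + [0.0] * 399

# Loop gain 2.0 * 0.4 = 0.8 < 1: every pass through the loop is weaker.
stable = simulate_loop(gain=2.0, attenuation=0.4, delay=20, source=source)

# Loop gain 3.0 * 0.4 = 1.2 > 1: every pass through the loop is stronger.
unstable = simulate_loop(gain=3.0, attenuation=0.4, delay=20, source=source)

stable_tail = max(abs(v) for v in stable[-20:])
unstable_tail = max(abs(v) for v in unstable[-20:])
print(stable_tail, unstable_tail)
```

In the first case the click dies out, whereas in the second case it grows with every round trip, which is the time-domain counterpart of eqs. (2.18) and (2.19) being fulfilled.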

Any given room with RIR F (t) has a specific value of Maximum Stable Gain (MSG), which can be found from the frequency response F (ω). The initial MSG is computed by finding the peak with the largest magnitude in the frequency response that fulfils the phase condition eq. (2.19), and calculating how far that peak is from 0 dB. Expressed in dB, the initial MSG is

$$\text{MSG} = -20 \log_{10} \max_{\omega \in P} |F(\omega)|, \quad P = \{\omega : \angle F(\omega) = m 2\pi,\ m \in \mathbb{Z}\}.$$ (2.20)
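One way to approximate eq. (2.20) numerically is to evaluate F(ω) on a dense frequency grid and to look for the frequencies where the phase passes through a multiple of 2π, that is, where the imaginary part of F(ω) changes sign while the real part is positive. The sketch below follows that idea; the grid resolution, the function names and the test RIR (a single reflection with amplitude 0.5) are assumptions made for illustration, and this is not the MATLAB code used in this work.

```python
import cmath
import math

def frequency_response(f, omega):
    """F(omega) = sum_n f[n] exp(-i omega n), with omega in rad/sample."""
    return sum(fn * cmath.exp(-1j * omega * n) for n, fn in enumerate(f))

def initial_msg_db(f, grid_points=4096):
    """Grid approximation of eq. (2.20) over the range [0, pi]."""
    omegas = [math.pi * k / grid_points for k in range(grid_points + 1)]
    values = [frequency_response(f, w) for w in omegas]
    peak = 0.0
    for a, b in zip(values, values[1:]):
        # Phase equals a multiple of 2*pi where Im F is zero or changes sign
        # between two grid points while Re F is positive.
        crossing = a.imag == 0.0 or a.imag * b.imag < 0.0
        if crossing and a.real > 0.0:
            peak = max(peak, abs(a), abs(b))
    if peak == 0.0:
        raise ValueError("no grid frequency fulfils the phase condition")
    return -20.0 * math.log10(peak)

# Hypothetical RIR: one reflection of amplitude 0.5 after 8 samples.
rir = [0.0] * 8 + [0.5]
msg = initial_msg_db(rir)
print(msg)
```

For this test RIR the magnitude response is 0.5 at all frequencies, so the result should be close to 20 log10(2) ≈ 6.02 dB. For a measured RIR the estimate depends on the grid resolution, since the magnitude is sampled next to, and not exactly at, the phase crossings.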


The magnitude and phase responses of a typical room, and also one of the room characteristics used in the simulations in this work, are shown in fig. 2.3. The MSG of the RIR shown in fig. 2.3 is 3.087 dB.

The main objective of feedback suppression is to manipulate the total transfer function by introducing additional sub-systems that alter the total frequency response in order to increase the MSG, preferably without distorting the source signal. In the following chapters, we will look into different methods of achieving this.


3 Methods used in feedback suppression

This chapter is a summary of the history and available literature of the field of feedback suppression. The field of acoustic feedback suppression is a well studied subject, and several methods have been proposed to solve the howling problem. There are four main categories of feedback suppression, namely

• Periodic modulation methods

• Gain reduction methods

• Room modelling methods

• Spatial filtering methods (beamforming)

The first methods to address the issue of howling feedback, developed in the 1960s [1], [2], belong to the first category. Implemented with electronic components, these methods manipulate the microphone signal before amplification, either by altering the phase of the signal by a small value φ or by shifting the frequency of the signal by a small ∆f. In [2], an increase in maximum stable gain of 14 dB was reported, but the effects on the sound quality were too severe to be considered acceptable. Frequency shifting is used in some commercial products today. One such method, namely a frequency shift of 5 Hz, is evaluated in this work and is explained in depth in section 4.1.

The second category, gain reduction methods, can be divided into three subcategories, depending on the frequency range in which the gain is reduced. Early works applied a full-band gain reduction upon detecting howling [6]. This method obviously does not increase the maximum stable gain, but merely brings an unstable system back to a stable state. Full-band gain reduction was later refined into Automatic Equalization (AEQ), which divides the input signal into frequency bands and performs feedback detection on every sub-band. If a howling frequency is detected, the gain is reduced only in the sub-band where the critical frequency resides, thus leaving the rest of the signal intact.

The AEQ methods can be described as an attempt to automate the work of an audio engineer, who often works with sub-band equalization to reduce feedback. The AEQ method was further refined into Notch filter based Feedback Suppression (NFS), where notch filters are used to suppress problematic frequencies at which howling has been detected. Notch filters are stop band filters with a very narrow stop band (called a ”notch”), which severely reduces the gain in that particular frequency band and thus removes those frequencies from the signal. These notch filters can be designed to be very narrow, thus only suppressing a very small frequency band of the signal, namely where the howling occurs. It should be mentioned that notch filters can be implemented as both FIR and IIR filters, but making an FIR notch very narrow requires a high filter order, which means that IIR filters are often preferred. To suppress several frequencies in a signal, a number of notch filters, centered at different frequencies, can be applied, either as several filters in series or as one filter designed with two or more ”notches”.

The NFS methods are by far the most used in commercial products today. All NFS methods include a detection phase and a suppression phase [3], and are divided into one-stage NFS methods and two-stage NFS methods. In one-stage methods, detection and suppression are performed in the same step. In [7], the authors use adaptive notch filters in order to detect and suppress howling in the same stage. The paper concludes that the adaptive notch filters used did not produce sufficient feedback suppression in the entire frequency range. The most commonly used methods in the NFS category are so-called two-stage methods, where detection and suppression are separated. The detection stage evaluates the frequency spectra of segments of the signal, often obtained with the Fast Fourier Transform (FFT). Each frequency spectrum is scanned with a peak-picking algorithm to find the frequencies that have the most power, and the frequencies corresponding to these peaks are tested against certain criteria to determine if they are indeed howling frequencies, or just tonal components in the signal. If the detection algorithm finds a howling frequency, the suppression stage receives information about the frequency at which howling occurs, and applies a notch filter at that specific frequency to suppress the howling.


There are several spectral (frequency-based) and temporal (time-based) features that a howling component has, but a tonal component does not. In practice, one of these, or a combination of them, can be used to determine if a peak in the frequency spectrum corresponds to a howling frequency. A number of criteria that can be used to decide whether a signal component is a howling component or a tonal component are evaluated in [8]. A two-stage notch filter based method is evaluated in this work and is explained in depth in section 4.2.

The third category, room modelling methods, sometimes also called Acoustic Feedback Cancellation (AFC), resembles the methods used in Acoustic Echo Cancellation (AEC), a feature used in conference telephony and other applications where a speaker communicates with another speaker at a distant location using a conference telephone. In these cases, the far-end speaker’s voice is output from a loudspeaker and fed back into a microphone, resulting in an echo back to the far-end speaker, if no measures are taken.

A common approach in AEC is to use an adaptive filter F̂ to approximate the RIR F, and filter the output from the loudspeaker with F̂ in order to model the feedback, and remove the approximated feedback component from the microphone signal. If the adaptive filter approximates the RIR perfectly, no feedback component remains in the microphone signal.
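The idea of an adaptive filter F̂ that learns the path F can be sketched with a normalized least mean square (NLMS) update. The example below is an idealized illustration under assumptions that are not taken from this work: the path is a hypothetical four-tap FIR filter, the loudspeaker signal is white noise, no source signal is present at the microphone, and the step size and regularization constant are arbitrary choices.

```python
import random

def nlms_identify(x, d, taps, step=0.5, eps=1e-8):
    """Adapt an FIR estimate so that the filtered x matches d (NLMS update)."""
    w = [0.0] * taps
    errors = []
    for n in range(len(x)):
        # Most recent `taps` loudspeaker samples, newest first.
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        estimate = sum(wk * xk for wk, xk in zip(w, window))
        e = d[n] - estimate   # microphone sample with the estimated feedback removed
        norm = sum(xk * xk for xk in window) + eps
        w = [wk + step * e * xk / norm for wk, xk in zip(w, window)]
        errors.append(e)
    return w, errors

random.seed(0)
true_path = [0.0, 0.6, -0.3, 0.1]   # hypothetical path F

# Loudspeaker signal: white noise (assumption for this sketch).
x = [random.uniform(-1.0, 1.0) for _ in range(3000)]

# Microphone signal containing only the loudspeaker signal filtered by F.
d = [sum(true_path[k] * x[n - k] for k in range(len(true_path)) if n - k >= 0)
     for n in range(len(x))]

estimated_path, residual = nlms_identify(x, d, taps=len(true_path))
print(estimated_path)
```

Under these favourable conditions the estimate converges to the true path and the residual vanishes. The sketch deliberately leaves out the source signal, which is what makes the problem hard in a real closed loop.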

The main difference between the AFC case and the AEC case is that the loudspeaker signal is highly correlated with the microphone signal in AFC, which is not the case in AEC [9]. When there is high correlation between the loudspeaker and microphone signals, which occurs during ”double-talk” scenarios (when the near-end speaker and far-end speaker speak simultaneously), the AEC algorithms are known to perform poorly in adapting the filters. This makes the AEC methods unsuitable for the AFC case, which can be described as the AEC case with constant double-talk. In order to use adaptive filters to remove the unwanted feedback, one needs to use decorrelation methods to decrease the correlation between the loudspeaker and microphone signals [3], [10], [11].

Different methods for decorrelation have been suggested, such as noise injection on the loudspeaker signal, frequency shifting or phase shifting the loudspeaker signal, non-linear processing, introduction of a delay in the forward path and decorrelating pre-filters [9].

The method of using adaptive filters to remove unwanted signal contents, and the AFC method evaluated in this work, are further explained in section 4.3.

The fourth category, which is also known as beamforming, consists of using special microphone and/or loudspeaker arrays in order to reduce the signal transport between the microphone and the loudspeaker, by modifying the directivity patterns of the array to have the null direction in the direction of the other unit. These methods require additional hardware and will for that reason not be considered in this work, which is limited to software implementation.

In [3], the authors conclude that the most promising method in terms of achievable increase in MSG and subjective sound quality is the AFC approach. For this reason, one of these methods will be included in the MATLAB evaluation of methods. Upon surveying the market, it is obvious that the two-stage NFS methods are by far the most common in feedback suppression products. For this reason, one of these methods will also be implemented and evaluated in MATLAB. These methods have the disadvantage of being reactive, in the sense that the howling needs to be detected, and thus is often heard, before it is suppressed.

AFC on the other hand is a proactive suppression method, which removes feedback and echoes continuously, making it slightly more interesting than the NFS approach. As explained above, the AFC methods need a routine for decorrelating the loudspeaker signal from the microphone signal, and a 5 Hz frequency shift was chosen for this, mainly due to its simplicity, but also since frequency shifting is by itself a feedback suppression method, which can then also be included as a stand-alone method for comparison. The algorithms by which these three methods operate are presented in detail in the following chapter.


4 Description of algorithms

In this chapter, the three chosen methods, frequency shifting, notch filter-based feedback suppression and acoustic feedback cancellation, will be described in detail, and the nature of howling will be related to them.

4.1 Frequency shifting

The frequency shifting method, as the name suggests, manipulates the microphone signal by shifting all frequency components by a predetermined value ∆f. By performing this frequency shift, one aims at circumventing the magnitude condition eq. (2.18): the signal components at the critical frequency f_c are not allowed to build up on every pass through the loop, but are instead shifted to frequencies which do not fulfil the magnitude condition eq. (2.18), thus stabilizing the system. A frequency shift can be performed in software by manipulating the so-called discrete-time analytic signal

$$y_a(t) = y(t) + i\hat{y}(t),$$ (4.1)

where ŷ(t) is the Hilbert transform of the original signal and i is the imaginary unit. The analytic signal is defined as the original signal with zero negative frequency content. The negative frequencies can be discarded due to the fact that audio signals are real signals, and a property of real signals is that their frequency spectra are Hermitian, meaning that the negative frequencies do not provide any information that cannot be found in the positive frequency content [4]. One can perform frequency shifting by multiplying the analytic signal with a complex exponential

$$S_{\text{mod}}(t) = e^{i\omega_s t},$$ (4.2)

where ω_s = 2π∆f is the modulation frequency. The output from the frequency shift is then obtained by taking the real part of the resulting complex-valued signal:

$$d(t) = \operatorname{Re}\left(y_a(t) S_{\text{mod}}(t)\right) = y(t)\cos(\phi(t)) - \hat{y}(t)\sin(\phi(t)), \quad \phi(t) = 2\pi\Delta f\, t,$$ (4.3)

where d(t) is the frequency shifted output signal. The modulation can be described by the system in fig. 4.1.

Figure 4.1 – The system with a frequency shift (FS) of the microphone signal in the electro-acoustic forward path.

4.1.1 Analytic signal

The analytic signal can be obtained by computing the Fourier transform $Y(\omega)$ of a segment of the input signal, and computing the inverse Fourier transform of the single-sided spectrum, with the negative frequencies set to 0 [12]. The result of the inverse Fourier transform is an approximation of the analytic signal. Since the spectrum of the approximated analytic signal is single-sided, the signal is complex-valued and can be expressed according to eq. (4.1). The nature of the Fourier transform requires that the input samples are framed with a frame size of $M$ samples, which introduces a delay of $M$ samples in the processing. In this work, an alternative method was used, which uses a modulated low-pass filter to obtain an approximation of the analytic signal [13]. To remove the negative frequency components, an FIR low-pass filter of order 256 with a normalized cut-off frequency of $f_s/4$ was used. This filter was modulated with the frequency $f_s/4$, resulting in a complex-valued band-pass filter with a pass band covering the entire positive frequency range and a stop band covering the entire negative frequency range. The filter is visualized in fig. 4.2.

Figure 4.2 – The magnitude response of the modulated low-pass filter, with a pass band covering the entire positive frequency range and a stop band covering the negative frequency range.

The input samples were buffered into a delay vector of the same length as the complex modulated low-pass filter (256), and for each sample, the dot product between the delay vector and the filter was computed in order to obtain the current analytic signal sample, which is an approximation of eq. (4.1) at time $t$. Equation (4.3) was then applied to the approximated analytic signal sample in order to obtain the frequency-shifted output signal sample $d(t)$.
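The sample-by-sample procedure described above can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code used in this work. The Hamming-windowed sinc design, the factor 2 applied to the taps, the use of 257 taps (filter order 256) and the names `analytic_fir` and `frequency_shift` are assumptions made for the example, since the exact filter design is not specified here.

```python
import cmath
import math
from collections import deque


def analytic_fir(num_taps=257):
    """Complex band-pass FIR approximating the analytic-signal filter.

    A Hamming-windowed sinc low-pass with cut-off fs/4 is modulated by fs/4,
    so that the pass band covers the positive frequencies only.
    """
    mid = (num_taps - 1) // 2
    taps = []
    for k in range(num_taps):
        m = k - mid
        # Ideal low-pass with cut-off fs/4 (impulse response centred at m = 0).
        ideal = 0.5 if m == 0 else math.sin(math.pi * m / 2) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        # e^{j*pi*m/2} moves the pass band up by fs/4. The factor 2 gives unit
        # gain for the real part, i.e. y + i*hilbert(y) (assumption).
        taps.append(2 * ideal * window * cmath.exp(1j * math.pi * m / 2))
    return taps


def frequency_shift(y, fs, delta_f, taps):
    """Shift all frequency components of y by delta_f Hz using eq. (4.3)."""
    delay = deque([0.0] * len(taps), maxlen=len(taps))
    out = []
    for n, sample in enumerate(y):
        delay.appendleft(sample)  # delay[k] now holds y[n - k]
        ya = sum(h * s for h, s in zip(taps, delay))  # analytic signal sample
        phi = 2 * math.pi * delta_f * n / fs
        out.append(ya.real * math.cos(phi) - ya.imag * math.sin(phi))
    return out
```

With this centred design the real part of the filter output is the input delayed by 128 samples, and the imaginary part approximates its Hilbert transform, so the shifted output lags the input by the same 128 samples.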

4.2 Two-stage notch filtering

The two-stage notch filtering method makes use of information about the frequency

spectrum of the incoming signal in the detection stage, and applies notch filters to the

signal in the suppression stage, based on the findings in the detection stage. This section

describes the two-stage algorithm used in this work.


4.2.1 Detection stage

The incoming signal was divided into frames of M = 4096 samples, using an overlap of M/2 samples between frames to reduce the detection time. When a frame had been filled with M samples, it was multiplied with a Blackman window to reduce spectral leakage. The frequency spectrum $Y(\omega)$ of the windowed signal was then computed with the Fourier transform. Since the input signal is real-valued, it is sufficient to consider only the single-sided frequency spectrum, in which the 10 largest peaks were located with a peak-picking algorithm. Evaluating 10 peaks gives a satisfactory level of confidence that a howling frequency is detected, since howling frequencies almost never occur at the same level of applied gain, but rather occur "one at a time" as the applied gain is increased. In MATLAB, the function findpeaks was used for this. The frequencies corresponding to these peaks were considered "possible howling frequencies", or $\{\hat{\omega}_i\}$, $1 \le i \le 10$.

The set of possible howling frequencies was then evaluated in three different steps to determine whether each candidate frequency was a howling frequency or a tonal component of the input signal. This was done with the two spectral evaluations Peak-to-Harmonic Power Ratio (PHPR) and Peak-to-Neighbouring Power Ratio (PNPR), along with the temporal evaluation Interframe Magnitude Slope Deviation (IMSD) [8].
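As an illustration of the windowing and peak-picking steps, a minimal Python sketch is given below. It is not the MATLAB findpeaks function used in this work; the helper names `blackman` and `largest_peaks` and the simple local-maximum rule are assumptions made for the example.

```python
import math


def blackman(length):
    """Blackman window of the given length (length >= 2)."""
    return [
        0.42
        - 0.5 * math.cos(2 * math.pi * n / (length - 1))
        + 0.08 * math.cos(4 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def largest_peaks(magnitude, count=10):
    """Return the bin indices of the `count` largest local maxima,
    ordered from the largest to the smallest peak."""
    peaks = [
        k
        for k in range(1, len(magnitude) - 1)
        if magnitude[k - 1] < magnitude[k] >= magnitude[k + 1]
    ]
    peaks.sort(key=lambda k: magnitude[k], reverse=True)
    return peaks[:count]
```

Here `magnitude` is assumed to hold the single-sided magnitude spectrum of one windowed frame, so the returned indices play the role of the candidate frequencies.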

4.2.1.1 Peak-To-Harmonic Power Ratio (PHPR)

Tonal components in speech often include harmonics, which are integer multiples of the frequency component. This is not the case for a howling frequency, which consists of a very narrow frequency component without significant harmonics. The PHPR is computed for each candidate howling frequency $\hat{\omega}_i$ by dividing the power at the candidate frequency by the power at its $m$'th harmonic:

$$\mathrm{PHPR}(\hat{\omega}_i, m) = 10\log_{10}\frac{|Y(\hat{\omega}_i)|^2}{|Y(m\hat{\omega}_i)|^2}. \tag{4.4}$$

4.2.1.2 Peak-To-Neighbouring Power Ratio (PNPR)

In speech, frequency components have a broader bandwidth than a single sinusoidal frequency component. In the frequency domain, this bandwidth is identified by the power of the tonal component being shared over several neighbouring frequency bins, centered around a peak. A howling component, on the other hand, does not share power with the neighbouring frequency bins. By computing the power of a possible howling frequency and dividing it by the power of neighbouring frequency bins, one can assess whether the component is a tonal component or a howling component. The PNPR for the possible howling frequency $\hat{\omega}_i$ with the $m$'th neighbouring frequency bin is computed as

$$\mathrm{PNPR}(\hat{\omega}_i, m) = 10\log_{10}\frac{|Y(\hat{\omega}_i)|^2}{|Y(\hat{\omega}_i + 2\pi m/M)|^2}. \tag{4.5}$$

The values computed in eqs. (4.4) and (4.5) are then compared to the predetermined thresholds $T_{PHPR}$ and $T_{PNPR}$, and if the computed values are higher than the threshold values for frequency $\hat{\omega}_i$, it is considered to be a howling frequency.
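The two spectral criteria of eqs. (4.4) and (4.5) can be sketched on a magnitude spectrum indexed by frequency bin. The following Python sketch is illustrative only: the function names, the small floor `eps` that guards against division by zero, and the choice to skip harmonics or neighbours that fall outside the single-sided spectrum are assumptions. The default thresholds, harmonics and neighbours follow the values reported in this chapter.

```python
import math


def power_ratio_db(magnitude, k_peak, k_other, eps=1e-12):
    """10*log10(|Y[k_peak]|^2 / |Y[k_other]|^2), as in eqs. (4.4) and (4.5)."""
    num = max(magnitude[k_peak], eps)
    den = max(magnitude[k_other], eps)
    return 20 * math.log10(num / den)


def passes_spectral_tests(magnitude, k, t_phpr=10.0, t_pnpr=30.0,
                          harmonics=(2, 3),
                          neighbours=(-3, -2, -1, 1, 2, 3)):
    """True if bin k exceeds both thresholds for every harmonic and neighbour.

    Bins outside the single-sided spectrum are skipped (an assumption).
    """
    last = len(magnitude) - 1
    phpr_ok = all(
        power_ratio_db(magnitude, k, m * k) > t_phpr
        for m in harmonics if m * k <= last
    )
    pnpr_ok = all(
        power_ratio_db(magnitude, k, k + m) > t_pnpr
        for m in neighbours if 0 <= k + m <= last
    )
    return phpr_ok and pnpr_ok
```

A narrow isolated peak passes both tests, while a peak with a strong second harmonic or with energy spread into the adjacent bins is rejected as a tonal component.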

4.2.1.3 Interframe Magnitude Slope Deviation (IMSD)

This feature uses the fact that howling has been observed to increase exponentially in energy over time, which means linearly on a dB scale. This increase is not observed in tonal components. The IMSD for the possible howling component $\hat{\omega}_i$ measures the deviation from a linear increase by differencing the energy at $\hat{\omega}_i$ between older and more recent frames. A large deviation from linearity, that is to say a large IMSD, suggests that the candidate is not a howling frequency, whereas for small deviations, the candidate is considered a howling frequency. The IMSD is computed by

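$$\mathrm{IMSD}(\hat{\omega}_i, t) = \frac{1}{M_F} \sum_{m=1}^{M_F-1} \Bigg[ \frac{1}{M_F} \sum_{j=0}^{M_F-1} \frac{1}{M_F - j} \Big( 20\log_{10}|Y(\hat{\omega}_i, t - jP)| - 20\log_{10}|Y(\hat{\omega}_i, t - M_F P)| \Big) - \frac{1}{m} \sum_{j=0}^{m-1} \frac{1}{m - j} \Big( 20\log_{10}|Y(\hat{\omega}_i, t - jP)| - 20\log_{10}|Y(\hat{\omega}_i, t - mP)| \Big) \Bigg]. \tag{4.6}$$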

The IMSD for each candidate howling component is compared to the threshold value $T_{IMSD}$, and if $\mathrm{IMSD}(\hat{\omega}_i, t) < T_{IMSD}$, the frequency $\hat{\omega}_i$ is considered to be a howling frequency.
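The IMSD of eq. (4.6) can be evaluated from a short history of dB magnitudes at the candidate frequency. The sketch below is a hedged Python illustration: the function name `imsd` and the convention that the history list is ordered newest frame first (so that it holds $M_F + 1$ values) are assumptions made for the example.

```python
def imsd(level_db):
    """IMSD of eq. (4.6) for one candidate frequency.

    level_db[j] is 20*log10|Y(w_i, t - j*P)|, newest frame first, so the
    list must hold M_F + 1 values.
    """
    m_f = len(level_db) - 1

    def mean_slope(m):
        # Average slope between frame t - m*P and each of the m newer frames.
        return sum(
            (level_db[j] - level_db[m]) / (m - j) for j in range(m)
        ) / m

    full = mean_slope(m_f)
    return sum(full - mean_slope(m) for m in range(1, m_f)) / m_f
```

For a magnitude that grows perfectly linearly in dB, every partial slope equals the full slope and the function returns zero, which is the behaviour expected from howling. An erratic history gives a value far from zero.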


Table 4.1

Threshold     Value [dB]
T_PHPR        10
T_PNPR        30
T_IMSD        1

4.2.1.4 Final assessment

The thresholds used in the three evaluations are presented in table 4.1.

For the PHPR, the 2nd and 3rd harmonics were included in the evaluation, and howling was said to be detected if the threshold was exceeded for all harmonics. In the PNPR, the six closest neighbours, three above and three below, were included, and howling was said to be detected if the ratio exceeded the threshold for all neighbours. The IMSD stored the frequency contents of the last 16 frames, and thus evaluated the slope for all possible howling frequency components over 16 frames. These numbers were inspired by [8], where the authors evaluated a number of spectral and temporal criteria for howling detection, and found the combination above to be robust and to have a small false-alarm percentage¹. The final threshold values were tweaked and tested until a reasonable howling detection was obtained.

The total assessment of the possible howling frequencies for each frame consisted of a combination of PHPR, PNPR and IMSD. Only if all three conditions for howling were fulfilled for the frequency $\hat{\omega}_i$ was it considered to be a howling frequency, and actions were then taken to suppress it.

¹ The false-alarm percentage is the ratio of the number of erroneously detected frequencies to the total number of detected frequencies.

4.2.2 Suppression stage

Upon detecting howling at frequency $\hat{\omega}_i$, the suppression stage applied a notch filter in the electro-acoustic forward path, centered at frequency $\hat{\omega}_i$. A maximum of 20 notch filters was set, in order to prevent the source signal from being overly distorted. To make the filters as narrow as possible, biquadratic IIR filters were used in series, where the output $y_i[n]$ from filter $i$ with input $x_i[n]$ can be computed from the difference equation

$$y_i[n] = \frac{1}{a_0}\Big( b_0 x_i[n] + b_1 x_i[n-1] + b_2 x_i[n-2] - a_1 y_i[n-1] - a_2 y_i[n-2] \Big), \tag{4.7}$$

where $n$ is the sample number and $a_0, \dots, a_2$, $b_0, \dots, b_2$ are the filter coefficients. For each of the 20 filters, the two latest output samples $y[n-1]$ and $y[n-2]$ and the three latest input samples $x[n]$, $x[n-1]$ and $x[n-2]$ are required. These samples were stored in a 3x21 matrix $Y_{\mathrm{del}}$, where the input samples to the $i$'th filter were stored in the $i$'th column, and the output samples were stored in the $(i+1)$'th column:

$$Y_{\mathrm{del}} = \begin{bmatrix} x_1[n-2] & y_1[n-2] = x_2[n-2] & \dots & y_C[n-2] \\ \vdots & & \ddots & \vdots \\ x_1[n] & y_1[n] = x_2[n] & \dots & y_C[n] \end{bmatrix} \tag{4.8}$$

where $C$ is the number of active notch filters. Since the filters were applied in series, the output samples from the $i$'th filter are the same as the input samples to the $(i+1)$'th filter. The filter design itself is not considered in depth in this work. The filters do not need to be designed in real time upon detection, since the frequency resolution of the Fourier transform is known a priori. The size of the Fourier transform frames used in the detection phase was 4096 samples, which results in 2048 bins in the one-sided frequency spectrum. Since the highest possible frequency was 8 kHz, the frequency resolution was 8000/2048 = 3.9063 Hz per frequency bin. Knowing the frequency resolution, notch filters can be designed offline for all available frequencies, and then stored to save computational effort in the real-time implementation. Upon detecting howling at a specific frequency, a look-up table can be used to activate the correct filter. In this work, however, the filters were designed upon detection with the MATLAB function iirnotch, which returned the filter coefficients that were stored in a 6x20 matrix. All notch filters were designed to have a Q-factor of 35. With $C$ active notch filters, the output sample $d(t)$ is the last element of the $(C+1)$'th column of the matrix $Y_{\mathrm{del}}$. Recall that the total number of notch filters allowed was 20, which makes the last element of the 21st column the final output sample if all notch filters are active.
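The series connection of biquads can be sketched as below. This Python sketch is not the implementation used in this work: the coefficients come from the widely used audio-EQ-cookbook notch formula, which is an assumption and is not claimed to reproduce the output of iirnotch, and the per-filter history lists replace the shared delay matrix of eq. (4.8). The names `notch_coefficients` and `NotchCascade` are hypothetical.

```python
import math


def notch_coefficients(f0, fs, q=35.0):
    """Second-order notch at f0 Hz (audio-EQ-cookbook form); returns (b, a)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = (1.0, -2 * math.cos(w0), 1.0)
    a = (1 + alpha, -2 * math.cos(w0), 1 - alpha)
    return b, a


class NotchCascade:
    """Series of biquads, each evaluated with the difference equation (4.7)."""

    def __init__(self):
        self.filters = []  # one [b, a, x_history, y_history] entry per notch

    def add_notch(self, f0, fs, q=35.0, max_filters=20):
        # Silently ignore requests beyond the allowed number of filters.
        if len(self.filters) < max_filters:
            b, a = notch_coefficients(f0, fs, q)
            self.filters.append([b, a, [0.0, 0.0], [0.0, 0.0]])

    def process(self, sample):
        for b, a, xh, yh in self.filters:
            out = (b[0] * sample + b[1] * xh[0] + b[2] * xh[1]
                   - a[1] * yh[0] - a[2] * yh[1]) / a[0]
            xh[1], xh[0] = xh[0], sample  # shift input history
            yh[1], yh[0] = yh[0], out     # shift output history
            sample = out  # output of filter i is the input of filter i + 1
        return sample
```

Calling `add_notch` when a howling frequency is detected and `process` once per sample mirrors the structure described above, with the output of the last active filter being the final output sample.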

4.3 Acoustic feedback cancellation

The method of using adaptive filters to cancel out unwanted components from the microphone signal is widely used in teleconference applications. Acoustic feedback cancellation is similar to the teleconference case, but instead of a far-end speaker signal being output from the loudspeaker, it is the near-end speaker's voice. The AFC system is described in fig. 4.3.

Figure 4.3 – The AFC situation, where the impulse response $F(t)$ is approximated with an adaptive filter $\hat{F}(t)$.

$\hat{F}$ is an adaptive filter which is designed and adapted to resemble the real RIR $F$. The loudspeaker signal $x(t)$ is then filtered with $\hat{F}$ in order to estimate the feedback component of the microphone signal. There are several algorithms for adapting the filter, and the one utilized in this work is the Normalized Least Mean Square (NLMS) algorithm [14]. This is a common algorithm in echo cancellation, and is generally a good trade-off between computational complexity and convergence speed [15]. The NLMS algorithm is described as follows.

4.3.1 NLMS

In each iteration, the output from the adaptive filter is computed as

$$d[n] = y[n] - \hat{F}^T[n]\,x[n], \tag{4.9}$$

where $\hat{F}$ is the adaptive filter of size $N$, and $x$ is a delay vector containing the $N$ latest loudspeaker output samples. The term $\hat{F}^T[n]x[n]$ is thus the approximated feedback component in the microphone signal.

The adaptive filter $\hat{F}$ is then updated according to

$$\hat{F}[n+1] = \hat{F}[n] + \mu\,\frac{d^{*}[n]\,x[n]}{x^{H}[n]\,x[n]}, \tag{4.10}$$

where $\mu$ is the step size and the term $x^H[n]x[n]$ is the energy of the loudspeaker output delay vector. The division by the energy term, which is the difference between NLMS and LMS, makes the algorithm insensitive to scaling of the loudspeaker vector $x$. If the filter converges perfectly so that $\hat{F} = F$, all feedback components are removed from the microphone signal, so that $d[n] = u[n]$, leaving only the source speech. The choice of the step size is of great importance to the convergence of the adaptive filter. If the step size is too small, the adaptive filter converges slowly and responds slowly to changes in the RIR, resulting in an erroneous filter in non-stationary conditions. If, on the other hand, the step size is too large, the convergence speed increases, but problems with stability might occur.

For speech applications, a step size between 0.01 and 0.04 has been recommended in the literature [3]. In this work, a fixed step size of 0.01 was used, which was found to be a reasonable trade-off between convergence speed and stability. To prevent the filter from updating when the loudspeaker signal was not strong enough, a threshold $T_{energy}$ was introduced, and the condition $x^H[n]x[n] > T_{energy}$ was set as a requirement for allowing the filter to update. As previously mentioned, the NLMS algorithm performs poorly when there is a high correlation between the loudspeaker and microphone signals. For this reason, the loudspeaker signal was decorrelated from the microphone signal by frequency shifting the output signal $d[n]$ by 5 Hz before amplification, using the algorithm described in section 4.1. Since the actual RIR is known in the simulations, the performance of the adaptive filter can be evaluated by computing the filter misadjustment in each iteration:

$$\mathrm{FMA} = \frac{\sum_{i=0}^{N-1} (\hat{F}_i - F_i)^2}{\sum_{i=0}^{N-1} F_i^2}. \tag{4.11}$$
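The update of eqs. (4.9) and (4.10), the energy gate and the misadjustment of eq. (4.11) can be illustrated with a small system-identification experiment. The Python sketch below is a toy setup under stated assumptions, not the closed-loop simulation of this work: the loudspeaker signal is white noise, no source speech is present, the path is a short hypothetical 8-tap filter, and a step size of 0.5 is chosen only so that the example converges within a few thousand samples. The name `nlms_identify` is hypothetical.

```python
import random


def nlms_identify(true_path, n_samples=2000, mu=0.5, t_energy=1e-6, seed=0):
    """Adapt an FIR estimate of true_path with NLMS, eqs. (4.9) and (4.10).

    Open-loop toy setup (assumption): white-noise loudspeaker signal and no
    source speech. Returns the estimate and the misadjustment history.
    """
    rng = random.Random(seed)
    n_taps = len(true_path)
    estimate = [0.0] * n_taps
    x = [0.0] * n_taps  # delay vector, newest loudspeaker sample first
    path_energy = sum(f * f for f in true_path)
    fma = []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]
        y = sum(f * v for f, v in zip(true_path, x))     # microphone signal
        d = y - sum(f * v for f, v in zip(estimate, x))  # eq. (4.9)
        energy = sum(v * v for v in x)
        if energy > t_energy:                            # energy gate
            step = mu * d / energy
            estimate = [f + step * v for f, v in zip(estimate, x)]  # eq. (4.10)
        err = sum((e - f) ** 2 for e, f in zip(estimate, true_path))
        fma.append(err / path_energy)                    # eq. (4.11)
    return estimate, fma
```

Since all signals are real-valued here, the conjugate and Hermitian transpose in eq. (4.10) reduce to the ordinary product and transpose.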


5 Method for testing

In order to properly evaluate the tested methods, a theoretical measure of the maximum stable gain was needed. This was done in MATLAB, where a PA system was simulated so that the methods could be evaluated both in terms of maximum achievable stable gain and subjective listening experience, that is, how good the methods sound.

5.1 MATLAB simulation and evaluation

Methods from the DSP toolbox were used in order to read audio data from the source file. The source file that was used in the simulations was a 35 second section from a radio essay by Johan Norberg called ”Johan Norberg om den exploderande lyckotrenden”[16], resampled to 16 kHz. The file was read in blocks of 1024 samples at a time, and a loop through the samples of the blocks simulated single input, single output processing. A simple user interface was created, to be used in the ”live” mode, in order to subjectively evaluate the methods. The user could choose between the three evaluated methods, and also set the applied gain in real-time. The user also had the option to disable all feedback suppression to evaluate the system without any processing.

Once a sample had been processed, it was put in an output buffer of the same length as the RIR used in the simulations, namely 2001 samples. The dot product of the full 2001 samples of the output buffer and the RIR was computed to obtain the feedback component of the microphone signal. A new feedback component was computed for every new input sample, and added to the input sample to obtain the microphone signal, consisting of both the source signal and the feedback component. Every 1024th iteration, the 1024 newest samples were output to the loudspeaker. This process successfully simulated the loudspeaker signal's interaction with the room, and the feedback into the microphone. Howling could clearly be heard in the simulations upon adjusting the gain to a level above the initial maximum stable gain.
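The per-sample feedback loop described above can be sketched as follows. This Python sketch is a simplified illustration and not the MATLAB simulation itself: block-wise audio output is left out, the optional `process` argument stands in for any of the feedback suppression methods, and the name `simulate_pa` is hypothetical.

```python
from collections import deque


def simulate_pa(source, rir, gain, process=lambda s: s):
    """Closed-loop PA simulation; returns the loudspeaker signal.

    rir[k] weights the loudspeaker sample emitted k + 1 samples earlier,
    and `gain` is the linear gain of the forward path.
    """
    out_buffer = deque([0.0] * len(rir), maxlen=len(rir))
    loudspeaker = []
    for u in source:
        feedback = sum(h * x for h, x in zip(rir, out_buffer))
        mic = u + feedback        # source signal plus feedback component
        x = gain * process(mic)   # forward path: suppression, then gain
        out_buffer.appendleft(x)  # newest output sample first
        loudspeaker.append(x)
    return loudspeaker
```

With a toy feedback path consisting of a single delayed tap of 0.5, an impulse dies out for a forward gain of 1.5 (loop gain 0.75) and grows without bound for a forward gain of 2.5 (loop gain 1.25), which is the divergence heard as howling.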

The simulation of the PA system could also be run in a "test" mode, where measurements of the maximum stable gain were taken and stored. Since the three methods differ in the way they affect the signal, different expressions had to be used in order to calculate the maximum stable gain. This could have been done in several ways. For instance, the gain could be automatically raised in small steps in order to induce howling feedback, upon which the gain level at which howling occurs could be noted. This approach is sub-optimal, since an instability does not immediately induce howling, which means that the howling can be missed if the measurements are too short, resulting in an overestimation of the maximum stable gain.

In the simulations, the maximum stable gain was measured from the known RIR used to simulate the acoustic feedback path. By considering the RIR without feedback suppression, one can determine the initial maximum stable gain simply by observing the frequency response and finding the MSG using eq. (2.20).

The maximum stable gain for the different methods was calculated by applying the filters corresponding to the methods to the RIR, obtaining a modified RIR for each method. For the frequency shifting method, a time-varying filter corresponding to the 5 Hz frequency shift was applied to the RIR, which resulted in a maximum stable gain that oscillated over time. For the notch filter method, notch filters were applied to the RIR when a howling frequency was detected in the simulations, and a new maximum stable gain was computed from the modified RIR, with the detected howling frequencies suppressed. For the frequency shifting method and the NFS method, the MSG was computed as

$$\mathrm{MSG}_{NFS,FS} = -20\log_{10} \max |H(t,\omega)F(\omega)| \quad \forall\omega : \angle F(\omega) = m2\pi,\; m \in \mathbb{Z}, \tag{5.1}$$

where the filter $H(t,\omega)$ is a time-dependent 5 Hz frequency shift or the cascade of active notch filters, depending on which method is being tested. For the AFC method, the maximum stable gain was calculated by finding the highest peak in the difference $|F(\omega) - \hat{F}(\omega)|$ that fulfils the phase condition according to

$$\mathrm{MSG}_{AFC} = -20\log_{10} \max |F(\omega) - \hat{F}(\omega)| \quad \forall\omega : \angle F(\omega) = m2\pi,\; m \in \mathbb{Z}. \tag{5.2}$$
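The MSG evaluation can be sketched numerically for a known impulse response. The Python sketch below is a hedged illustration: the frequency grid, the detection of the phase condition through a sign change of the imaginary part with a positive real part, and the names `freq_response` and `maximum_stable_gain_db` are assumptions, and the magnitude and the phase condition are both evaluated on the same response for simplicity.

```python
import cmath
import math


def freq_response(h, omega):
    """Frequency response of the FIR h at omega (rad/sample)."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h))


def maximum_stable_gain_db(h, n_grid=2048):
    """-20*log10 of the largest magnitude among the frequencies where the
    phase is a multiple of 2*pi, searched on a grid over [0, pi]."""
    omegas = [math.pi * k / n_grid for k in range(n_grid + 1)]
    resp = [freq_response(h, w) for w in omegas]
    worst = 0.0
    for a, b in zip(resp, resp[1:]):
        # The phase crosses a multiple of 2*pi where the imaginary part
        # changes sign while the real part is positive.
        if a.imag * b.imag <= 0 and (a.real > 0 or b.real > 0):
            worst = max(worst, abs(a), abs(b))
    if worst == 0.0:
        return math.inf
    return -20 * math.log10(worst)
```

For a feedback path that is a pure delay with gain 0.5, the function returns approximately 6.02 dB, in agreement with a loop that becomes unstable once the forward gain exceeds a factor of 2.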

The 35s speech segment was divided into four sections of approximately 9 seconds each.


In the first section, the applied gain in the electro-acoustic forward path was set to 0

dB, which was approximately 3 dB below the initial maximum stable gain. The gain

was increased dB-linearly in the second section, until reaching its final level of 8 dB

at the beginning of section 3. At the beginning of section 4, the RIR was changed,

corresponding to a 1 meter displacement of the microphone. The applied gain and the

altered RIR were kept constant during the fourth section. This test method, found in [3],

is a theoretical evaluation of the maximum stable gain, and how it is affected by the gain

level and changes in the RIR. Both RIRs can be found in [17]. Since a real-time scenario

will result in a more rapidly changing RIR, there is no guarantee that one will be able

to reproduce these results in a real-time setup. The test serves as an initial assessment

of the methods.
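The test schedule can be written down as a small helper. The Python sketch below is illustrative only: it assumes four sections of exactly equal length, whereas the sections in this work were approximately 9 seconds each, and the names `applied_gain_db` and `rir_index` are hypothetical.

```python
def applied_gain_db(t, duration=35.0, start_db=0.0, final_db=8.0):
    """Applied forward-path gain in dB at time t (seconds).

    Four equal sections are assumed: hold, dB-linear ramp, hold, hold.
    """
    section = duration / 4
    if t < section:
        return start_db
    if t < 2 * section:
        return start_db + (final_db - start_db) * (t - section) / section
    return final_db


def rir_index(t, duration=35.0):
    """0 for the first RIR, 1 after the switch at the start of section 4."""
    return 1 if t >= 3 * duration / 4 else 0
```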


6 Results

In this section, the results of the simulations are presented. The three methods were evaluated in terms of maximum stable gain and subjective listening experience.

6.1 Feedback suppression

Figure 6.1 – The spectrogram of the loudspeaker signal, 0 dB applied gain.

Figure 6.1 shows the spectrogram of the loudspeaker signal when the applied gain was 0 dB. To illustrate the feedback phenomenon, fig. 6.2 shows a spectrogram of the same signal, but with the applied gain manually raised to induce howling. On three occasions, a frequency around 500 Hz shows a divergence in power, which suggests that feedback has occurred at this frequency. The applied gain when the feedback occurred was 4 dB, which is slightly above the initial MSG. When howling feedback was clearly heard, the gain was manually decreased to 0 dB.

Figure 6.2 – Spectrogram of the loudspeaker signal with howling feedback present, 4 dB applied gain.

To illustrate the performance of the feedback suppressor algorithms, the gain was set to 6 dB upon which the feedback suppression algorithms were activated. The spectrograms for the three methods are shown in figs. 6.3 to 6.5.

In fig. 6.3, one can observe the oscillating nature of the frequency shifting method. There are indeed frequencies that have an increased power compared to the case with no howling feedback, but they are shifted up, keeping the system stable. Figure 6.4 shows that the notch filter method was successful at suppressing feedback at the specified gain setting. At 27 seconds, an increased power can be observed briefly in the low-frequency range, indicating that a howling frequency was audible before being detected and suppressed. The spectrogram for the AFC method, shown in fig. 6.5, shows no such increase in power for any frequency, meaning that this method had successfully suppressed all howling feedback.

Figure 6.3 – Spectrogram for the frequency shifting method, 6 dB applied gain.

Figure 6.4 – Spectrogram for the NFS method, 6 dB applied gain.

Figure 6.5 – Spectrogram for the AFC method, 6 dB applied gain.

6.2 Maximum stable gain

The results from the maximum stable gain calculations are shown in fig. 6.6, where the different sections are marked with vertical dashed lines. The gain applied in the electro-acoustic forward path is shown as a bold dashed line, and the maximum stable gain curves for all methods are included.

6.2.1 Frequency shifting

It can be seen that the MSG of the frequency shifting method oscillates around 6 dB, meaning that this method theoretically raises the MSG by approximately 3 dB compared to the case with no feedback suppressor. Upon changing the RIR, the MSG decreased to a slightly lower level. From around 15 s into the simulation, the MSG of the frequency shifting method is below the actual applied gain level, meaning that we can expect howling or ringing sounds from 15 s onward.

6.2.2 Notch filters

For the notch filter method, the points where a notch filter was applied can be clearly seen as vertical jumps in the curve. During the parts of the simulation where the MSG of the notch filter method was above the actual applied gain, the algorithm should not detect any howling frequencies. In fig. 6.6, this is true for the first ∼13 seconds, where no howling was detected and no notch filter was activated. When the applied gain increased to the level of the MSG for the notch filter method, a howling frequency was detected, and a notch filter was activated, removing the problematic frequency and thus increasing the maximum stable gain. Around 17 seconds into the simulation, the gain level was raised above the MSG level of the notch filter method, which means that the algorithm failed to detect a howling frequency. During the time interval 17-27 seconds, we should, according to this theoretical measurement, experience some howling or ringing tones. When the RIR changed, the algorithm successfully suppressed the problematic frequency or frequencies, raising the MSG to a stable level. When all 20 notch filters were active, which occurred at around 28 seconds into the simulation, the expected MSG was just below 10 dB, an increase of around 7 dB compared to the case where no feedback suppressor was used. The number of active notch filters over time is illustrated in fig. 6.7, where it can be seen that no notch filters were active until the gain starts to increase, upon which a rapid increase in the number of notch filters is observed. Changing the RIR almost instantly resulted in 5 new notch filters, indicating that a change in the RIR does indeed affect the frequencies for which the Nyquist stability criterion is fulfilled.

Figure 6.6 – Maximum stable gain over time for all methods. The MSG curve for the frequency shifting method has been smoothed for better visualization.

6.2.3 Acoustic feedback suppression

The curve for the AFC method fluctuates heavily throughout the simulations, visualizing the updates of the adaptive filter $\hat{F}$. With the algorithm used, a basic NLMS method with the only requirement for the filter to update being the energy threshold, there is no guarantee that the updated filter $\hat{F}[n+1]$ will perform better than the previous filter $\hat{F}[n]$, and this is the reason that the MSG level sometimes drops.

Figure 6.7 – The number of notch filters over time for the NFS method.

The MSG level is mostly above the actual applied gain, meaning that throughout the simulation, we should experience no or very little howling. A very temporary drop is observed at around 28 seconds, which could result in a brief howling or ringing sound at that time. The AFC method is observed to perform better at higher applied gain, and the final MSG is expected to be around 11 dB, which is an increase of around 8 dB. The maximum MSG value for this method occurred before the change in RIR and was around 13 dB, which means that the potential MSG increase is 10 dB. In fig. 6.8, the filter misadjustment is shown. Initially, when the applied gain was 0 dB, the misadjustment was high, meaning that the filter was a poor approximation of the RIR. This was expected at low gain, since there was not enough information in the loudspeaker signal to correctly adapt the filter. Upon increasing the gain, the misadjustment decreased, and the filter converged. The change in the RIR resulted in an increase of the misadjustment by approximately 3 dB, upon which the misadjustment again decreased, indicating a converging filter.

6.3 Subjective listening experience

It is difficult to objectively evaluate the quality of processed speech. For this reason, the listening experience is described in words, and the sound quality of the methods is compared between them. For the frequency shifting method, it should first of all be noted that a 5 Hz frequency shift on ordinary speech did not affect the sound quality in a way that was notable to me. Since the method does not prevent howling from arising, ringing sounds were heard at gain levels above the initial MSG level. The howling that occurred was then frequency shifted in each loop, resulting in a brief sweeping sound for each howling frequency being up-shifted. As the gain level increased, more howling frequencies were heard as brief up-shifted sweeps, making the total sound quality unacceptable for live applications. The system did not, however, show divergent behaviour, even at the highest applied gain levels.

Figure 6.8 – The filter misadjustment for the NLMS filter.

The notch filter method was, as expected, reactive, meaning that howling was heard before the frequencies were suppressed. The howling did not reach disturbing levels before it was suppressed, however, making the listening experience decent throughout the simulations. When a small number (0-5) of notch filters were active, there were no audible artefacts, but the more notch filters that were activated, the more the total sound quality was affected. By the time that close to all, or all, notch filters (15-20) were active, the sound was notably distorted, but the listening experience was still deemed acceptable, especially compared to the frequency shifting method.

The AFC method resulted in the best listening experience, with no or very few disturbing

audible artefacts. The dip observed at 28 seconds in fig. 6.6 was not heard. There were

at times small echoes and noises in the background, which are assumed to be related to

an erroneously adapted filter. These small artefacts were not deemed disturbing, and

might blend in with the echo and reverberation that is present in all live PA scenarios.