
Acoustic Signatures

Niclas Sidekrans

Luleå University of Technology MSc Programmes in Engineering

Media Technology

Department of Computer Science and Electrical Engineering Division of Signal Processing

2007:247 CIV - ISSN: 1402-1617 - ISRN: LTU-EX--07/247--SE

April 27, 2007

(4)

Over the past decades, the manpower in Vattenfall AB’s hydro power facilities has decreased, and with it the ability of human hearing to detect irregular sounds in the power plants. There is a need to investigate whether acoustic signals carry information that can be extracted by a sound monitoring system, and how such a system could be constructed.

The methods used in this Master Thesis are based on time-frequency analysis and pattern recognition theory. Several algorithms for extracting spectral features were implemented in MATLAB. One method for supervised pattern recognition, the k-nearest neighbor classifier (k-NN), and two methods for unsupervised pattern recognition, the k-means clustering algorithm and the density-based scan algorithm (DBSCAN), were implemented in MATLAB for classification and pattern recognition of the calculated spectral features.

Acoustic signals from operating hydro power machinery were recorded at slightly different operating conditions, and the implemented methods were applied to the recorded signals. The results showed that time-frequency analysis and pattern recognition theory can be used to extract information about the state of operating hydro power machinery. The most discriminating spectral feature was the Rényi entropy. The k-NN classified the different recordings with a success rate of 0.9929.

(5)


Staffing at Vattenfall AB’s hydro power plants has decreased drastically over the past decades, with the direct consequence that the earlier human condition checks for unfamiliar sounds from the plant machinery have largely disappeared. There is a need to investigate whether acoustic signals contain information about the state of the sound source, whether this information can be extracted by a technical system, and what such a system might look like.

The methods used in this thesis are based on time-frequency analysis and pattern recognition theory. Several algorithms for extracting spectral features have been implemented in MATLAB. Three methods for classification and grouping of data sets were also implemented: k-nearest neighbor (k-NN), k-means clustering and density-based scan clustering.

Acoustic signals from operating hydro power machinery were recorded under slightly varying conditions. The methods for feature extraction and pattern recognition were used to group the different sounds. The results show that time-frequency analysis and pattern recognition theory can be used to extract information about the operating state of the hydro power machinery. The most discriminating spectral feature was the Rényi entropy. Classification with k-NN separated the different sounds with a success rate of 0.9929.

(6)

Master of Science Programme at Luleå University of Technology.

My thanks go to all the helpful people at Vattenfall Research & Development, and especially to the other thesis writers there, Johannes Lärkner and Björn Thorsélius from Uppsala University, for all the good times during lunches and coffee breaks. Finally, I would like to thank Louise for moving across the country for my sake and supporting me during this time.

(7)

1 BACKGROUND
1.1 Vattenfall Research & Development
1.2 Monitoring systems of today
1.2.1 Conwide
1.2.2 Sound monitoring
1.3 Purpose
1.4 Objectives
1.5 Limitations

2 INTRODUCTION
2.1 Pattern Recognition
2.1.1 Acoustic Signatures
2.1.2 Classifying sound signals from hydro power machinery
2.2 Signal processing tools
2.2.1 MATLAB

3 THEORY
3.1 Mathematical Transforms
3.1.1 The Fourier transform
3.1.2 The short-time Fourier transform
3.1.3 The spectrogram
3.1.4 Window functions
3.2 A/D conversion
3.2.1 Sampling theorem
3.3 Feature Extraction
3.3.1 Spectral features
3.4 Classifier models
3.4.1 k-nearest neighbor models
3.5 Clustering algorithms and its application in acoustic classification
3.5.1 k-means clustering
3.5.2 Density based scan clustering

4 METHOD
4.1 Problem description
4.2 Data acquisition equipment
4.2.1 Acoustic recording setup
4.2.2 Acoustic recording conditions
4.3 Feature extraction

6 CONCLUSIONS
6.1 Recording
6.2 Feature Extraction
6.3 Classification
6.3.1 Supervised classification
6.3.2 Unsupervised classification

7 DISCUSSION AND FUTURE WORK
7.1 Recording
7.2 Feature Extraction
7.2.1 Post processing
7.2.2 Other features for sound classification
7.3 Classification
7.3.1 Supervised classification
7.3.2 Unsupervised classification
7.4 Further analysis of spectral features
7.5 Acoustic signals vs. vibration signals
7.6 Pattern recognition applied to other measurements

8 BIBLIOGRAPHY

APPENDIX 1: SUMMARY OF MATLAB FUNCTIONS
APPENDIX 2: MATLAB CODE
APPENDIX 3: MICROPHONE DATA SHEET
APPENDIX 4: PRE-AMPLIFIER DATA SHEET
APPENDIX 5: USB MODULE DATA SHEET
APPENDIX 6: POWER MODULE


1 Background

1.1 Vattenfall Research & Development

Vattenfall Research & Development AB is Vattenfall’s resource for developing the energy solutions of the future, as well as for providing services and solutions for both Vattenfall and other companies.

Vattenfall Research & Development is situated in Älvkarleby and Råcksta and has approximately 150 highly educated employees. The research and development is focused on increasing the productivity of Vattenfall’s energy plants in Germany, Poland and the Nordic countries, on developing new products and services, and on providing better service to Vattenfall’s clients [7].

Within Vattenfall Research & Development there are several students doing their thesis work in a wide variety of directions. The thesis projects often develop into pilot projects that can be expanded to new products, services and solutions for the future.

1.2 Monitoring systems of today

A hydro power plant operates under severe conditions due to the massive electric power it must produce. Monitoring and diagnosing the hydro power facilities is very important work, and human errors during such work could cause severe problems for both personnel safety and the reliability of hydro power production.

State control and maintenance planning are largely based on automatic monitoring systems, supervised from control rooms far from the facility in question.

Patrols and inspections of the hydro power plant, so-called rounding, are also an important part of the condition monitoring and maintenance work. In a rounding procedure, measurements such as temperature, pressure and time of operation are noted. The purpose of the data acquisition is to monitor the condition of the facility and to gather information for future analysis.

The rounding procedure and intervals have changed greatly over the decades. In the beginning of the century, the control readings and rounds in the hydro power stations were a main part of the daily work. Practically every available meter was read every hour and rounds were made several times per day. The follow-up and post-analysis of these controls was also very extensive.

In the 1950s, 60s and 70s most of the hydro power stations were still controlled from local control rooms that were manned around the clock. In those times, readings of the control meters were taken every hour and rounds in the power station took place daily.

When automation started in the 1970s, most of the former local control rooms were replaced with larger, remotely operated control centers. As a natural consequence, the personnel in the hydro power stations decreased. The rounds were reduced to around three times a week at most of the hydro power stations.

When the computer-aided era of the rounds started at the end of the 1980s, a re-evaluation of the rounds also occurred. It became common to change the rounding routines and to let rounds take place less frequently than weekly.

Extensive reorganization took place in the 1990’s, which led to a further decrease in manpower. Nowadays the rounding protocols are often limited to a couple of hundred points per hydro power unit and a normal rounding interval is once a week [3].

1.2.1 Conwide

Conwide system III is a computer system developed by Conwide AB for Vattenfall Hydropower AB. The system is developed to facilitate the data acquisition and data analysis from rounding procedures and inspections of equipment in the hydro power facilities. A personal digital assistant (PDA) with rounding protocol is used during rounds to collect current data. The collected data is transferred to a central database via a stationary personal computer (PC) [9].

1.2.2 Sound monitoring

Monitoring from a control room often relies on measurements from different sensors, such as temperature, pressure and vibration sensors. From a control room it is difficult to detect irregular sounds from hydro power plant equipment; the maintenance operators have to patrol the facility to detect them. To compensate for the decreased rounding intervals, sound monitoring is an area worth investigating. A disadvantage of sound monitoring is that sound signals can rarely substitute for a person present in the facility; the absence of visual images accompanying the audio signals is a further disadvantage compared to site presence. For a more extensive investigation of a solution for remote state control, see [24].

Sound can be a very efficient signal for fault detection. Human hearing can often detect irregular sounds that imply a deviation from the normal condition in, for example, a hydro power facility. The detection of faults in machine equipment by human hearing, however, often requires long experience of being around the sound source in question as well as substantial knowledge about the mechanical behavior of the machine part. A problem is that learning what to listen for is subjective and impossible to learn except through long experience.


Sound signals can carry a lot of information that is hidden from human perception. The human hearing range is normally 20 Hz to 20 kHz, and often narrower. Psychoacoustic properties such as masking, where some sounds vanish in the presence of louder sounds, also contribute to the limitations of human hearing.

Signal processing techniques, such as different kinds of spectral analysis, can extract information from a sound signal that is impossible for a human to detect.

A problem with sound signals is that they must be processed and analyzed further than other measurements used for state monitoring, such as temperature and water levels. Whereas a temperature sensor gives immediate information about the state of the machinery directly from the gathered samples on a display, a sound signal viewed in the same way in the time domain gives no instant information. To use sound signals for condition monitoring by a computer, the data must be post-processed.

There is a need to establish an objective and scientifically grounded approach to sound monitoring for fault detection. There is information in the acoustic profiles of machine sounds that is so far not exploited in the monitoring of operating hydro power machinery.

1.3 Purpose

The aim of this Master Thesis project is to investigate whether the sound signals from operating hydro power machinery carry information that can be extracted and used as a complement to the existing monitoring systems used for fault detection and, in part, for maintenance planning in the hydro power facilities.

1.4 Objectives

The objectives of this master thesis are to

• Gain an understanding of the hydro power unit monitoring systems and the machine monitoring techniques used today.

• Review current techniques in sound monitoring systems through a literature study.

• Construct a MATLAB algorithm that can determine the hydro power unit status from an observed sound signal of an isolated hydro power unit.

• Evaluate if and where a sound monitoring system is effective, and what the most cost-effective implementation of such a system would be.


1.5 Limitations

The aim of this thesis is not to provide a finished solution for a sound monitoring system but merely to give an introduction to the field of time-frequency analysis and pattern recognition theory applied to the monitoring of operating hydro power machinery.


2 Introduction

The analysis of sound signals for detecting irregular events in the monitoring of operating machines has not yet been widely used in industry, so it is not quite clear how to approach the problem [6]. Time-frequency analysis applied to the detection of faulty bearing conditions has, however, been investigated previously [14][20].

The extraction of acoustic properties from acoustic signals is also an area that has been widely investigated [1] [5] [13] [17] [19] [22]. There are many areas of application with the most famous being the field of speech recognition. The theories and methods used in speech recognition are based on pattern recognition theory, which will be given a short introduction below.

2.1 Pattern Recognition

The goal of pattern recognition is to classify objects into a number of categories. The word pattern refers to the type of measurements that need to be categorized, or classified. The measurements can be just about anything, but typical examples are images and acoustic signals [1].

When measurements are to be classified into categories, the first step is to identify measurable quantities, or features, that can distinguish the measurements from each other [1]. For example, suppose there is a need to distinguish between data consisting of a large set of two characters, say l and a, of somewhat different sizes. Then two features could be the height and the width of the characters. These two features, plotted in a two-dimensional plane, would separate the two classes, making it easy to distinguish between them, see Figure 2.1.


Figure 2.1: Two-dimensional feature space (width and height in mm) separating the two characters l and a.

As said, a feature represents one characteristic. In the example above, two features were used. In most cases M features x_i, i = 1, 2, …, M, are used and form a feature vector [1]

x = [x_1, x_2, …, x_M]

Each feature vector uniquely describes a single object. When N objects are analyzed, the corresponding feature vector for each observation is placed in a feature matrix.

The feature matrix consists of feature vectors column wise and observations row wise.

The line separating the two classes in Figure 2.1 is called a decision line and constitutes the classifier in the above example. An observation that falls in the character a region is also classified to that class and vice versa. To draw a straight line between classes, and thus constructing a classifier, there must be knowledge of the class labels for each point in Figure 2.1. The patterns, or feature vectors, for which the

(15)

Page 7 (56)

classes are known and are used to construct the classifier, are called the training data (TR), or training feature vectors. Unlabeled data that will be categorized with the classifier consists of feature vectors representing the same unique characteristics as the training feature vectors and are called testing data (TE), or testing feature vectors [1].

Pattern recognition can be divided into two categories, supervised and unsupervised pattern recognition. In supervised pattern recognition, training data with class labels are available. For classification, a set of testing data is compared to a set of training data with known class labels. The goal is to find similarities between the feature vectors: the class of the training feature vectors most similar to a testing feature vector is assigned to that testing data, and thus a classification has occurred.
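The supervised scheme described above, assigning to unlabeled data the class of the most similar training vectors, is the idea behind the k-nearest neighbor classifier used later in this thesis. As an illustration only (the thesis’s own implementation is in MATLAB, Appendix 2), a minimal Python/NumPy sketch with toy clusters standing in for acoustic feature vectors:

```python
import numpy as np

def knn_classify(train_X, train_y, test_X, k=5):
    """Assign each test vector the majority class among its k nearest
    training vectors (Euclidean distance)."""
    predictions = []
    for x in test_X:
        dist = np.linalg.norm(train_X - x, axis=1)     # distance to every training vector
        nearest = train_y[np.argsort(dist)[:k]]        # labels of the k closest ones
        values, counts = np.unique(nearest, return_counts=True)
        predictions.append(values[np.argmax(counts)])  # majority vote
    return np.array(predictions)

# Toy feature vectors: two well-separated clusters, class 0 ("normal") and 1 ("non-normal")
rng = np.random.default_rng(0)
train_X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
train_y = np.array([0] * 20 + [1] * 20)

pred = knn_classify(train_X, train_y, np.array([[0.1, -0.2], [2.9, 3.1]]))
```

Choosing an odd k avoids ties in the majority vote for a two-class problem.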

In the other case, unsupervised pattern recognition, a collection of feature vectors is available but without any class labels. In this case the goal is to find underlying similarities and group similar feature vectors together [1].
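Grouping unlabeled feature vectors by underlying similarity is what the k-means algorithm used in this thesis does (Section 3.5.1). A hedged Python/NumPy sketch of plain Lloyd-style k-means; the farthest-point initialization is an assumption made for this toy example, not taken from the thesis:

```python
import numpy as np

def kmeans(X, k=2, iters=50):
    """Lloyd-style k-means: alternate nearest-centroid assignment and
    centroid update. Farthest-point initialization keeps the toy example
    deterministic."""
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None] - np.array(centroids)[None], axis=2), axis=1)
        centroids.append(X[np.argmax(d)])   # farthest point from the chosen centroids
    centroids = np.array(centroids)
    for _ in range(iters):
        # assign every point to its nearest centroid
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
        # move each centroid to the mean of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return labels

# Two unlabeled clusters of toy feature vectors
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
labels = kmeans(X, k=2)
```

With well-separated data the two clusters end up with two distinct labels, without any class information being supplied.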

2.1.1 Acoustic Signatures

In this thesis, an Acoustic Signature is meant to be a unique description of an acoustic signal, much like a fingerprint is the unique description of a finger. Here it is not the friction ridges that are the distinctive features, as in the case of a fingerprint; instead, acoustic features based on different properties of the signal are used. These features form acoustic feature vectors, and observations of these vectors form the acoustic feature matrix, in this thesis called the Acoustic Signature.

There are several ways to derive uniqueness from a specific acoustic signal, and there is no exact science for deciding which acoustic features are suitable for a classification task. To decide whether an acoustic feature is a good one, some kind of testing is usually needed to see if the feature in question suits the task. A good feature should separate different classes from each other and be as consistent as possible within each class [1]. The goal of this extraction stage is to create features that characterize the signals, making it easy to distinguish between them.
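One concrete spectral feature of this kind is the Rényi entropy, which the abstract reports as the most discriminating feature. As a sketch of the general idea only (the exact definition used in Section 3.3.1 may differ), the order-α Rényi entropy of the normalized power spectrum measures how concentrated or spread out the spectrum is:

```python
import numpy as np

def renyi_entropy(signal, alpha=3):
    """Order-alpha Renyi entropy of the normalized power spectrum,
    treating the spectrum as a probability distribution."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    p = power / power.sum()     # normalize so the bins sum to 1
    p = p[p > 0]                # drop empty bins before exponentiation
    return np.log2(np.sum(p ** alpha)) / (1 - alpha)

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)                    # concentrated spectrum
noise = np.random.default_rng(0).standard_normal(fs)  # flat spectrum
```

A pure tone concentrates its energy in one bin and gives low entropy; broadband noise spreads its energy and gives high entropy, which is what makes the feature discriminative.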

2.1.2 Classifying sound signals from hydro power machinery

The classification problem in the monitoring of hydro power equipment will mainly be the task of classifying the sound signal into two categories, a normal condition and a non-normal condition. The sound signal from a normally operating machine can easily be recorded, and thus an acoustic signature for the normal condition can be constructed.

To record a sound signal of a non-normal condition is not effortless, since hydro power equipment rarely malfunctions. Some kind of non-normal event must occur in order to record a sound signal from which the non-normal acoustic signature can be constructed.


If the approach were to construct an acoustic signature for the normal condition only and attempt to classify all irregular events as non-normal, the search for a non-normal class would not be necessary. The problem with this approach is that the decision boundary between the normal condition class and all other possible irregular condition classes would be impossible to determine with any scientific method. The most natural approach in such a case would be to use a confidence interval in the frequency distribution of the normal condition. It is not likely that such a system would be efficient, and the evaluation of such a system would also be impossible due to the lack of actual non-normal condition classes. Therefore, the effort of obtaining recordings of non-normal conditions must be made to get as efficient a sound monitoring algorithm as possible.

2.2 Signal processing tools

2.2.1 MATLAB

MATLAB, or Matrix Laboratory, is a high-level programming language and interactive environment for technical computing, and the environment chosen for the problem solving in this thesis. In MATLAB, computing problems can often be solved faster than with traditional programming languages such as C/C++. MATLAB is applied in many problem areas, such as economics, signal and image processing, mathematics, optimization and statistics [10].


3 Theory

3.1 Mathematical Transforms

Mathematical transforms are used in engineering systems to obtain information that is not directly visible in the original signal. There are many types of mathematical transforms; the most famous is the Fourier transform, described below.

For better illustrative examples a test signal s(t) is constructed as

s(0 ≤ t ≤ t_1) = sin(ω_1 t)
s(t_1 ≤ t ≤ t_2) = 0
s(t_2 ≤ t ≤ t_3) = chirp(t, ω_3, 1, ω_4)
s(t_3 ≤ t ≤ t_4) = 0
s(t_4 ≤ t ≤ t_5) = 2 sin(ω_2 t)

3.1.1 The Fourier transform

The Fourier transform is used to retrieve the frequency content of a signal in the time domain and is defined by

G(f) = ∫ g(t) e^(−j2πft) dt        (3.1)

g(t) = ∫ G(f) e^(j2πft) df        (3.2)

Equation (3.1) is called the forward Fourier transform and equation (3.2) the inverse Fourier transform [2]. The idea is to multiply a signal with an exponential signal at some frequency, f, and then integrate the result over all time. The exponential part in equations (3.1) and (3.2) can be written with a real and an imaginary part consisting of cosine and sine functions, respectively, with frequency f. If the integration of the product yields a relatively large value, then the signal g(t) contains a dominant spectral component at frequency f. If the integration yields a relatively small value, the signal g(t) has no major component at frequency f. If the result of the integration is zero, the signal g(t) contains no component at frequency f at all. The Fourier transform of the test signal s(t) is shown in Figure 3.1.
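The behavior just described, a large product integral exactly when g(t) contains a component at f, can be checked numerically with the discrete transform introduced below; a small illustrative Python/NumPy sketch (the 50 Hz tone and 1 kHz rate are example values only):

```python
import numpy as np

fs = 1000                                  # sampling frequency (Hz), example value
t = np.arange(fs) / fs                     # one second of time samples
g = np.sin(2 * np.pi * 50 * t)             # a 50 Hz sine

G = np.fft.rfft(g)                         # DFT of the sampled signal
freqs = np.fft.rfftfreq(len(g), d=1 / fs)  # frequency axis in Hz
peak = freqs[np.argmax(np.abs(G))]         # frequency where |G(f)| is largest
```

The magnitude spectrum peaks at exactly the frequency of the sine, 50 Hz, and is near zero everywhere else.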


Figure 3.1: Time domain and frequency domain representation of the test signal s(t).

A similar form of the Fourier transform is the discrete Fourier transform (DFT) that applies to sampled time functions, which is a sequence of values at discrete, equally spaced, points in time [2]. The DFT enables the computation of the Fourier transform on a computer. The DFT is defined by

G(f) = Σ_{n=−∞}^{∞} g(t_n) e^(−j2πf t_n)        (3.3)

g(t_n) = (1/f_s) ∫_{−f_s/2}^{f_s/2} G(f) e^(j2πf t_n) df        (3.4)

In equations (3.3) and (3.4), t_n is the time corresponding to the n-th time sample.

In equations (3.1) and (3.2) the integration is performed over all time, from minus infinity to infinity. This means that no matter where in time a frequency component of g(t) is located, it will affect the result of the transformation equally. In other words, the Fourier transform yields no information about the time localization of spectral components; it only informs whether a frequency component exists in the signal or not. For this reason the Fourier transform is not suited to investigate signals with frequency contents that vary over time, i.e. non-stationary signals. To clarify, in this thesis the term stationary should not be mistaken for any statistical property; the term is only used to describe the spectrum of signals.

3.1.2 The short-time Fourier transform

The short-time Fourier transform (STFT) is defined by

STFT(τ, f) = ∫ g(t) w(t − τ) e^(−j2πft) dt        (3.5)

The STFT is a solution to the Fourier transform’s inability to describe local changes in the frequency content of a signal. The similarity with the Fourier transform is striking: the only difference between the two is that the signal g(t) in the STFT is multiplied with a sliding window function, w(t). Even though a signal may not be completely stationary, i.e. have constant frequency components over all time, it is often likely to be stationary over small portions of time. The window function w(t) of equation (3.5) is multiplied with the signal to get chunks of the signal where it is stationary. Compare it with dividing the signal into small pieces, where each piece has constant frequency components, and then taking the Fourier transform of each piece. The result is a Fourier transform with time resolution as well as frequency resolution, thus correcting the main drawback, the lack of time resolution, of the standard Fourier transform.
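A direct discretization of equation (3.5), sliding a window along the signal and taking the DFT of each windowed segment, can be sketched as follows (illustrative Python/NumPy; the thesis work itself was done in MATLAB, and the signal and parameters here are assumptions for the example):

```python
import numpy as np

def stft(g, window, hop):
    """Discretized equation (3.5): slide the window along the signal and
    take the DFT of each windowed segment."""
    n = len(window)
    starts = range(0, len(g) - n + 1, hop)
    return np.array([np.fft.rfft(g[s:s + n] * window) for s in starts])

# Non-stationary test signal: 100 Hz for one second, then 300 Hz for one second
fs = 1000
t = np.arange(fs) / fs
g = np.concatenate([np.sin(2 * np.pi * 100 * t), np.sin(2 * np.pi * 300 * t)])

S = stft(g, np.hamming(128), hop=64)
freqs = np.fft.rfftfreq(128, d=1 / fs)
first = freqs[np.argmax(np.abs(S[0]))]   # dominant frequency in the first frame
last = freqs[np.argmax(np.abs(S[-1]))]   # dominant frequency in the last frame
```

Unlike a single Fourier transform of the whole signal, the frame-by-frame spectra localize the 100 Hz component to the beginning and the 300 Hz component to the end.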

3.1.3 The spectrogram

The spectrogram, PSP(t,f), is defined as

|

2

) , (

| ) ,

(

t f STFT t f PSP

=

The spectrogram is a joint time-frequency, two-dimensional plot. It is used to analyze non-stationary signals when information about the time localization of frequency components is needed. The spectrogram uses the STFT to take the Fourier transform of small enough time segments of the signal; the length of the time segments is chosen so that the signal is, hopefully, stationary within each frame. The spectrum of each time segment is then magnitude-squared to obtain the energy distribution of the signal, called the spectrogram. The result is viewed as a two-dimensional plot, as Figure 3.2 illustrates.

Figure 3.2: Spectrogram for test signal s(t) (N = 256, 90% overlap, Hamming window of length 128).

A signal is often zero padded before the calculation of the DFT, since this results in a finer interpolation of the spectrum than if the DFT were computed on the original signal. In all examples the signal is zero padded to twice the window length to give a finer display of the spectrum.
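The effect of zero padding can be illustrated numerically: padding each windowed segment to twice the window length before the DFT roughly doubles the number of displayed frequency bins without adding any information. A sketch with assumed example parameters:

```python
import numpy as np

def spectrogram(g, window, hop, nfft):
    """|STFT|^2 as in equation (3.6); nfft > len(window) zero pads each
    segment, interpolating the displayed spectrum more finely."""
    n = len(window)
    frames = [g[s:s + n] * window for s in range(0, len(g) - n + 1, hop)]
    return np.abs(np.fft.rfft(frames, n=nfft)) ** 2

fs = 1000
t = np.arange(fs) / fs
g = np.sin(2 * np.pi * 50 * t)

win = np.hamming(128)
P_plain = spectrogram(g, win, hop=64, nfft=128)  # no zero padding
P_pad = spectrogram(g, win, hop=64, nfft=256)    # padded to twice the window length
```

The padded version has the same number of time frames but a frequency axis sampled twice as densely; the underlying resolution is still set by the window length.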

3.1.4 Window functions

Window functions are zero outside a certain time interval. They are multiplied with the signal to obtain time segments; when a signal is multiplied with a window function, the product is also zero outside that same interval. The window shape modifies the spectrum of the signal. Three window properties modify the spectrum: the shape, the length of the window function, and the overlap between consecutive time segments.

One problem using the STFT, connected with the window length, is the trade-off between simultaneous time and frequency resolution. The STFT corrects some of the drawbacks of the Fourier transform’s lack of time localization of frequency components, but using the STFT will always be a compromise between good time and good frequency resolution. When each segment of the divided signal undergoes the Fourier transform, the signal is assumed to be stationary in that time interval. To be sure that the signal is stationary in each time segment, one idea might be to shorten the window length w(t) substantially. Unfortunately, this idea does not work. The introduced window modifies the original signal spectrum: if the window length is shortened considerably, the spectrum is smeared and the frequency resolution gets worse. On the other hand, if good frequency resolution is desired and the window length is large, the frequency resolution will indeed be good, but the time resolution gets worse and the signal is less likely to be stationary within each time segment, see Figure 3.3.

Figure 3.3: Time resolution vs. frequency resolution for test signal s(t): a short Hamming window of length 50 (N = 100) versus a long Hamming window of length 400 (N = 800), both without overlap.

In Figure 3.4 two different window functions are shown, the Hamming window and the Rectangular window. The shapes of these two functions are very different and they alter the spectrum in different ways: the values at the beginning and the end of the Hamming window are attenuated, while they remain constant with the Rectangular window.


Figure 3.4: The Hamming and Rectangular window functions in the time domain and in the frequency domain.

The difference between these two windows is best illustrated in the frequency domain.

It is an unavoidable fact that using only a small portion of the signal, i.e. applying window multiplication before transformation, broadens the spectral peaks and introduces side lobes. The Rectangular window has a very narrow main lobe but higher side lobes than the Hamming window. As a general rule, low side lobes force a wide main lobe [4].

The Rectangular window function, with its narrow main lobe, will give great frequency resolution, but due to its high side lobes it will introduce frequency leakage. The Hamming window, on the other hand, will smear the spectrum more but reduces the frequency leakage, which is most often preferable [4]; see Figure 3.5 for an example.
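The side-lobe trade-off can be quantified numerically. The sketch below estimates the peak side-lobe level of each window from a finely interpolated spectrum; the roughly −13 dB (Rectangular) and −43 dB (Hamming) levels are standard textbook values, not figures taken from the thesis:

```python
import numpy as np

def peak_sidelobe_db(window, nfft=4096):
    """Peak side-lobe level relative to the main lobe, in dB, read from a
    finely interpolated (zero padded) spectrum of the window."""
    W = np.abs(np.fft.rfft(window, n=nfft))
    W = W / W.max()
    i = 1
    while W[i] <= W[i - 1]:            # walk down the main lobe to its first minimum
        i += 1
    return 20 * np.log10(W[i:].max())  # highest remaining lobe = peak side lobe

rect_db = peak_sidelobe_db(np.ones(128))    # Rectangular window, roughly -13 dB
ham_db = peak_sidelobe_db(np.hamming(128))  # Hamming window, roughly -43 dB
```

The Hamming window buys its much lower side lobes (less leakage) with a main lobe about twice as wide as the Rectangular window’s, matching the rule that low side lobes force a wide main lobe.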

Figure 3.5: Spectrogram of the test signal s(t) with a Hamming and a Rectangular window (N = 256, no overlap, window length 128).

Often an overlap between the time segments is used to increase the time resolution at the cost of computation time. Another important reason for using window overlap is increased robustness in the case where the analyzed signal is not perfectly aligned, so-called shifting [17], see Figure 3.6.

Figure 3.6: Spectrogram of the test signal s(t) with 90% overlap and without overlap (N = 256, Hamming window of length 128).

In summary, the choice of window function usually falls on functions like the Hamming window due to its overall good performance. The window length must be adjusted depending on how stationary the analyzed signal is; a typical window length corresponds to 10-500 ms. Zero padding gives a finer display of the spectrum and is usually twice the window length.

3.2 A/D conversion

All signals we encounter outside a computer are so-called continuous-time signals, and the resulting sampled versions are called discrete-time signals. The block scheme of the stages from a continuous-time signal x_a(t) to a discrete-time signal x[n] is illustrated in Figure 3.7 and consists of an anti-aliasing filter, a sample-and-hold (S/H) circuit and an analog-to-digital (A/D) converter [11].

Figure 3.7: Block scheme for the sampling process.

The anti-aliasing filter is used to prevent a damaging effect called aliasing. The S/H circuit is needed due to the time delay the A/D converter introduces: it samples the input continuous-time signal at periodic intervals and holds the sampled analog value constant at its output long enough for the A/D converter to do its work [11].



There are a few ground rules that must be followed in the sampling process. These rules are defined by the sampling theorem.

3.2.1 Sampling theorem

The sampling process can be described as taking uniformly spaced points of the continuous signal at time instants t = nT, where T denotes the sampling period and is related to the sampling frequency F_T by T = 1/F_T, see Figure 3.8 for an example.

Figure 3.8: Sampling process example.

The continuous-time signal and the discrete-time signal are related by

g[n] = g_a(nT)        (3.7)

To illustrate the phenomenon of aliasing it is helpful to look at the effects of sampling in the frequency domain. Let g_a(t) be a band-limited signal with corresponding Fourier transform G_a(jΩ), see Figure 3.9a, whose highest frequency component is denoted Ω_m. Ω is the angular frequency, related to the frequency F by Ω = 2πF. The impulse train p(t) is defined as

p(t) = Σ_{n=−∞}^{∞} δ(t − nT)        (3.8)

(26)

Page 18 (56)

where δ(t) is recognized as the ideal pulse function, the Dirac delta function. p(t) has a Fourier transform P(jΩ), see Figure 3.9b, that is periodic with Ω_T = 2π/T. Multiplying the signal g_a(t) with the impulse train p(t) produces the sampled continuous time signal g_p(t)

g_p(t) = g_a(t) p(t) = Σ_{n=−∞}^{∞} g_a(nT) δ(t − nT)        (3.9)

that has a Fourier transform G_p(jΩ). Now take a look at the Nyquist criterion, which is defined as

Ω_T > 2Ω_m        (3.10)

If the sampling frequency F_T obeys the Nyquist criterion, i.e. if the sampling rate is fast enough, the corresponding Fourier transform G_p(jΩ) of the sampled signal will be as Figure 3.9c illustrates. If however the Fourier transform of the impulse train p(t) is periodic with Ω_T < 2Ω_m, see Figure 3.9d, the resulting sampled signal will have a Fourier transform illustrated by Figure 3.9e. The overlapping frequency components seen are the phenomenon called aliasing [11].


Figure 3.9: Frequency domain effects of time domain sampling. (a) The spectrum of g_a(t), (b) spectrum of impulse train p(t), (c) spectrum of the sampled signal g_p(t) obeying the Nyquist criterion, (d) spectrum of impulse train p(t) with a smaller sampling period than shown in (b), (e) spectrum of sampled signal g_p(t) with Ω_T < 2Ω_m.


The original spectrum, Figure 3.9a, can be recovered from the sampled signal in Figure 3.9c by the simple use of a low pass filter with cut off frequency Ω_c ∈ (Ω_m, Ω_T − Ω_m). Trying to reconstruct the sampled signal in Figure 3.9e by the same method will yield an alias distorted version of the original signal [11].

The anti-aliasing filter is needed since most continuous time signals have a greater bandwidth than the bandwidth of the discrete time processors, i.e. the processors cannot sample infinitely fast. The anti aliasing filter accomplishes its task by making sure that the incoming signal is band limited so that the Nyquist criterion is obeyed. It does this by a simple low pass filtering, i.e. removing all frequency components that are higher than half the sampling frequency [11].
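A small Python sketch can make the folding concrete (illustrative values only, not thesis code): a 900 Hz tone sampled at 1 kHz violates the Nyquist criterion and, once sampled, becomes indistinguishable from a 100 Hz tone.

```python
import math

def sample_sine(freq_hz, fs_hz, n_samples):
    """Sample a unit-amplitude sine of frequency freq_hz at rate fs_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]

fs = 1000.0                            # F_T = 1 kHz, so the Nyquist limit is 500 Hz
x_high = sample_sine(900.0, fs, 32)    # violates the Nyquist criterion (900 > 500)
x_alias = sample_sine(100.0, fs, 32)   # the frequency 900 Hz folds down to

# sin(2*pi*0.9*n) = -sin(2*pi*0.1*n) for integer n: the sample sequences
# coincide up to sign, so no processing after sampling can tell them apart.
print(max(abs(a + b) for a, b in zip(x_high, x_alias)) < 1e-9)  # -> True
```

This is exactly why the anti-aliasing filter must remove the 900 Hz component before sampling rather than after.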

3.3 Feature Extraction

3.3.1 Spectral features

Spectral features are different types of frequency characteristics of the analyzed signal that characterize the spectral shape. The spectral features are instantaneous, i.e. computed on time segments of the signal, which means they can vary throughout the signal.

Let x_i(n), n = (0, 1, ..., N − 1), be the samples of the ith frame and F_i(m), m = (0, 1, ..., N − 1), be the corresponding DFT coefficients. To make the algorithms more computationally efficient in MATLAB, the DFT coefficients are derived with MATLAB's built in specgram.m function; the spectrogram.m function could also have been used. To better illustrate the spectral features below, the test signal s(t) used in section 3.1 will be analyzed.


3.3.1.1 Spectral Centroid

The Spectral Centroid (SC) is the “centre of gravity” of the magnitude spectrum of the STFT [5]. The SC measures the “brightness” of the spectrum. High values of the Centroid correspond to “brighter” acoustic structures with more energy in the high frequencies. See Figure 3.11 for example. SC is defined as

SC(i) = ( Σ_{m=0}^{N−1} m |F_i(m)|² ) / ( Σ_{m=0}^{N−1} |F_i(m)|² )        (3.11)

Figure 3.11: Spectral Centroid computed on test signal s(t). The Spectral Centroid follows the dominating spectral component in the test signal s(t), much like the spectrogram, since the spectral components in the test signal consist of single frequency components.

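A minimal Python sketch of the Spectral Centroid (the thesis implementation used MATLAB's specgram.m; here the DFT is computed directly, and as an added assumption the sums are restricted to the non-negative frequency bins m = 0..N/2 so that the centroid of a real-valued tone lands on its bin, whereas the definition above sums over all N bins):

```python
import math

def power_spectrum(frame):
    """|F(m)|^2 for m = 0..N//2, by direct DFT (O(N^2), fine for a sketch)."""
    N = len(frame)
    out = []
    for m in range(N // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * m * n / N) for n, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * m * n / N) for n, x in enumerate(frame))
        out.append(re * re + im * im)
    return out

def spectral_centroid(power):
    """SC = sum(m * |F(m)|^2) / sum(|F(m)|^2): the spectrum's centre of gravity."""
    total = sum(power)
    return sum(m * p for m, p in enumerate(power)) / total

# A pure tone in DFT bin 4 of a 32-sample frame has its centroid at bin 4
tone = [math.sin(2 * math.pi * 4 * n / 32) for n in range(32)]
print(round(spectral_centroid(power_spectrum(tone)), 3))  # -> 4.0
```

Shifting the tone to a higher bin moves the centroid up accordingly, which is the "brightness" behavior described above.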

3.3.1.2 Spectral Bandwidth

The Spectral Bandwidth (SB) is computed as the magnitude weighted average of the distance between the SC and the spectral components [5]. See Figure 3.12 for example. The SB is defined as

SB(i) = ( Σ_{m=0}^{N−1} (m − SC(i))² |F_i(m)|² ) / ( Σ_{m=0}^{N−1} |F_i(m)|² )        (3.12)

Figure 3.12: Spectral Bandwidth computed on test signal s(t). The Spectral Bandwidth follows the Spectral Centroid quite closely since the spectral components in the test signal consist of single frequency components.


3.3.1.3 Spectral Band Energy

Let x(n), n = (0, 1, ..., M − 1), be the samples of the complete signal to be analyzed and F_total(m), m = (0, 1, ..., M − 1), the corresponding DFT coefficients. The Spectral Band Energy (SBE) is the band energy normalized by the energy in the total spectrum [5]. See Figure 3.13 for example. SBE is defined as

SBE(i) = ( Σ_{m=0}^{N−1} |F_i(m)|² ) / ( Σ_{m=0}^{M−1} |F_total(m)|² )        (3.13)

Figure 3.13: Spectral Band Energy computed on test signal s(t). The energy is higher for the chirp signal and lower in the sinusoids. The SBE calculation discriminates signals with different energy content.

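The SBE reduces to a single energy ratio; a minimal Python sketch (illustrative only, not the thesis's MATLAB code), taking precomputed squared DFT magnitudes for one frame and for the complete signal:

```python
def spectral_band_energy(frame_power, total_power):
    """SBE: energy of one frame's spectrum normalized by the energy of the
    complete signal's spectrum, following the definition in the text."""
    return sum(frame_power) / sum(total_power)

# A frame holding a quarter of the signal's total spectral energy
print(spectral_band_energy([1.0, 1.0], [2.0, 2.0, 2.0, 2.0]))  # -> 0.25
```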

3.3.1.4 Spectral Flatness Measure

The spectral flatness measure is used to distinguish between tone-like and noise-like signals. It is calculated as the ratio of the geometric mean to the arithmetic mean of the energy spectrum [5]. See Figure 3.14 for example. The definition of the SFM is

SFM(i) = ( Π_{m=0}^{N−1} |F_i(m)|² )^{1/N} / ( (1/N) Σ_{m=0}^{N−1} |F_i(m)|² )        (3.14)

Figure 3.14: Spectral Flatness Measure computed on test signal s(t). A signal with single tone components results in an SFM close to zero.


3.3.1.5 Spectral Crest Factor

The spectral crest factor (SCF) is also a measure related to the flatness of the spectrum. The SCF is computed by the ratio of the maximum value within the band to the arithmetic mean of the energy spectrum value [5]. See Figure 3.15 for example.

The SCF is defined as

SCF(i) = max_m |F_i(m)|² / ( (1/N) Σ_{m=0}^{N−1} |F_i(m)|² )        (3.15)

Figure 3.15: Spectral Crest Factor computed on test signal s(t). A feature also good at discriminating signals with a flat spectrum from signals with more varying spectra.

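Both flatness-related measures can be sketched in Python on a precomputed power spectrum (a sketch, not the thesis's MATLAB code, under the assumption that all spectral values are strictly positive, since the geometric mean is evaluated through logarithms):

```python
import math

def spectral_flatness(power):
    """SFM: geometric mean / arithmetic mean of the power spectrum.
    Close to 1 for noise-like (flat) spectra, close to 0 for tone-like.
    Assumes every value in `power` is > 0 (required for the log)."""
    N = len(power)
    geometric = math.exp(sum(math.log(p) for p in power) / N)
    arithmetic = sum(power) / N
    return geometric / arithmetic

def spectral_crest(power):
    """SCF: spectrum maximum / arithmetic mean of the power spectrum.
    Equals 1 for a perfectly flat spectrum, large for a peaky one."""
    return max(power) / (sum(power) / len(power))

flat = [1.0] * 16              # flat spectrum: SFM = 1, SCF = 1
peaky = [1e-6] * 15 + [1.0]    # one dominant line: SFM near 0, SCF near 16
print(spectral_crest(flat), round(spectral_crest(peaky)))  # -> 1.0 16
```

Summing the logarithms rather than multiplying the N spectral values directly avoids floating point underflow of the product for long frames.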

3.3.1.6 Shannon Entropy

The Shannon Entropy (SE) is a measurement of the spectral distribution [5]. See Figure 3.16 for example. SE is defined as

SE(i) = − Σ_{m=0}^{N−1} |F_i(m)| log2 |F_i(m)|        (3.16)


Figure 3.16: Shannon Entropy computed on test signal s(t). The spectral distributions for sinusoid and chirp signals are quite different, as seen in the figure.


3.3.1.7 Renyi Entropy

The Renyi Entropy (RE) is, like the SE, a measurement of the spectral distribution [5].

See Figure 3.17 for example. RE is defined as

RE(i) = ( 1 / (1 − r) ) log( Σ_{m=0}^{N−1} |F_i(m)|^r )        (3.17)

In equation (3.17), r constitutes the RE order.


Figure 3.17: Renyi Entropy computed on test signal s(t). RE is also a feature for discriminating between peaky and flat signal spectra.

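A Python sketch of the two entropy features (illustrative only; one added assumption here is that the magnitude spectrum is first normalized to a probability distribution, which the equations above do not do explicitly):

```python
import math

def shannon_entropy(mag):
    """SE = -sum(p * log2 p) over the normalized magnitude spectrum.
    Normalizing |F_i(m)| to a distribution is an added assumption."""
    total = sum(mag)
    probs = [m / total for m in mag]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def renyi_entropy(mag, r=2):
    """RE = 1/(1 - r) * log2(sum p^r), where r is the Renyi order."""
    total = sum(mag)
    probs = [m / total for m in mag]
    return math.log2(sum(p ** r for p in probs)) / (1 - r)

flat = [1.0] * 8             # flat spectrum: maximal entropy, log2(8) = 3 bits
peaky = [8.0] + [0.0] * 7    # single spectral line: zero entropy
print(shannon_entropy(flat), renyi_entropy(flat))  # -> 3.0 3.0
```

For a flat distribution both entropies agree; as the spectrum becomes peakier, both fall toward zero, which is the peakiness-flatness discrimination mentioned above.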

3.4 Classifier models

3.4.1 k-nearest neighbor models

One of the simplest algorithms for classification is the k-nearest neighbor algorithm (k-NN), yet it’s very powerful. The k-NN is a supervised learning algorithm. The main idea of the k-NN is to classify data depending on its similarity with pre-defined data.

The classifier measures the similarity between an unclassified data object and the pre-defined data using different distance measures. The algorithm computes the distance between the unclassified data object and the k closest objects in the pre-defined training data set. The majority class among the k nearest neighbors becomes the decided class for the unclassified data object. Figure 3.18 shows an example of a k-NN classification run with 2-dimensional training data and 3 neighbors.


Figure 3.18: k-nn classification with 2-dimensional training set and k = 3.

The distance measure used in k-NN algorithms is often the Euclidean distance. The Euclidean distance between two vectors p and q of dimension N is defined as

d = √( Σ_{i=1}^{N} (p_i − q_i)² )        (3.18)


As mentioned in section 2.1, the set of pre-defined data with known class labels that is used to train the k-nn classifier is called training data (TR). The unclassified objects that are to be compared to the TR are called testing data (TE). The size and the quality of the TR are crucial for classifier performance. When testing the classifier, the members of TE cannot be members of the data set TR, since that would produce misleadingly optimistic classification results.

The optimal number of neighbors to use in a k-nn classifier depends on the TR. Often just one neighbor is sufficient for good performance, but to find a better value, cross validation is preferred.

The evaluation of the k-nn classifier is simple and the success rate is measured as

TE in objects total

of number

TE from objects classified

correctly of

number rate

success =
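A compact Python sketch of the classifier and the success-rate evaluation (illustrative only; the thesis implementation was written in MATLAB, and the toy data below is made up):

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two feature vectors, equation (3.18)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(train, labels, x, k=3):
    """Assign x the majority class among its k nearest training objects."""
    nearest = sorted(range(len(train)), key=lambda i: euclidean(train[i], x))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Toy 2-D training data (TR) with two classes
train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = [1, 1, 1, 2, 2, 2]

# Separate testing data (TE) -- never members of TR, as required above
test = [((0.05, 0.1), 1), ((5.2, 5.0), 2)]
correct = sum(knn_classify(train, labels, x) == y for x, y in test)
print(correct / len(test))  # success rate -> 1.0
```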

3.5 Clustering algorithms and their application in acoustic classification

Clustering algorithms are useful for discovering groups and distributions in large data sets. Clustering algorithms find groups of objects such that the members of a group are similar to each other and different from the objects in other groups. Clustering algorithms can be used for data reduction and noise elimination as well as for hypothesis testing.

3.5.1 k-means clustering

The k-means clustering algorithm (kmc) partitions n objects x_j, j = (1, 2, ..., n), in a database D into a set of k clusters C_i, i = (1, 2, ..., k), and finds the mean m_i of each cluster [1] [8] [12] [16]. The clusters are partitioned by minimizing some cost function based on distance, typically the Euclidean distance. The Euclidean cost function is defined as

J = Σ_{i=1}^{k} Σ_{x_q ∈ C_i} ||x_q − m_i||²        (3.19)

where x_q denotes the objects in cluster C_i, whose mean is m_i. In summary, an object x belongs to the cluster C_i if the mean m_i is the closest centre among all centers [16].

The parameter k is an input argument to the algorithm, which means there must be some prior knowledge of the database D to be partitioned. Often the algorithm is run


with different values of k, and the value of k that minimizes the squared error criterion is chosen as the correct value [18].

The kmc is an iterative algorithm that goes through the following stages

1. Make initial guesses for the mean values m_i for each of the k clusters C_i.
2. Assign each object x to the cluster whose centre m_i is closest.
3. Re-compute the mean values m_i for each of the k clusters C_i.
4. Repeat stages 2-3 until the mean values m_i for each cluster C_i no longer change.

A result from a kmc run with three classes and a two dimensional feature set is shown in Figure 3.19. A problem faced with this algorithm when dealing with arbitrary cluster shapes is also visible. The kmc only relies on distance and therefore some points that in reality belong to cluster 2 are here grouped with cluster 1. The cluster center from cluster 1 is simply closer to these points than to the cluster center in cluster 2.

Figure 3.19: k-means clustering example with k = 3, plotted over the two features Spectral Crest Factor and Renyi Entropy.
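The stages listed above can be sketched in Python (an illustrative sketch, not the thesis's MATLAB code; drawing the initial means at random from the data is one common choice and an assumption here):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means: alternate stage 2 (assignment) and stage 3 (mean
    update) until the means stop changing (stage 4)."""
    rng = random.Random(seed)
    means = rng.sample(points, k)           # stage 1: initial guesses
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                    # stage 2: closest-centre assignment
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, means[j])))
            clusters[i].append(p)
        new_means = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else means[i]
                     for i, cl in enumerate(clusters)]   # stage 3: new means
        if new_means == means:              # stage 4: converged
            break
        means = new_means
    return means, clusters

# Two well separated groups are recovered as two clusters of three points each
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, clusters = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))  # -> [3, 3]
```

Because the assignment uses only distance to the centres, elongated or irregular groups can be split incorrectly, which is exactly the limitation visible in Figure 3.19.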

3.5.2 Density based scan clustering

A problem with a clustering technique such as the kmc is that it can have a hard time finding clusters of arbitrary shapes. Another clustering technique, used to rectify this problem, is the density based scan clustering algorithm (dbscan) [1] [15]. The dbscan is a clustering algorithm that uses the distance between objects in a group and the number of group members to decide whether the group should form a cluster. Both the parameter for the number of group members, minpts, and the parameter for the neighborhood radius, eps, are given as input arguments to the algorithm.

In the dbscan algorithm there are two different kinds of points that are cluster members. The different points are called core points and border points, see Figure 3.20a. How the different points are defined will be described below.

• The eps neighborhood of a point is the area within a radius eps of that point.

• minpts is the minimum number of points required in the eps neighborhood of a point.

• A core point is a point whose eps neighborhood contains at least minpts points.

• A border point has fewer than minpts points within its eps neighborhood, but lies within the eps neighborhood of a core point.

• A noise point is a point that is neither a core point nor a border point, i.e. a point that is not a member of any cluster.

A point p is directly density reachable from a point q if

1. p is within the eps neighborhood of q, and
2. q is a core point.

See Figure 3.20b for an illustrative example.

Figure 3.20: Core points and border points (Figure taken from [15]).

A point p is density reachable from a point q if there is a chain of points such that one point connected, via the distance eps, to q is directly density reachable to p, see Figure 3.21a. A point p is density connected to a point q if there is a point o such that both the points p and q are density reachable from o, see Figure 3.21b.

(40)

Page 32 (56)

Figure 3.21: Density reachable and density connected points (Figure taken from [15]).

The dbscan algorithm works as follows:

1. Select a random point p.
2. Find all points that are directly density reachable from p.
3. If p turns out to be a core point, a cluster is formed and all points density reachable from p are retrieved.
4. If p is a border point, no points are density reachable from p and the algorithm continues to the next point in the database.
5. The algorithm continues this process until all points have been processed.

In Figure 3.22 a dbscan clustering has been completed and if compared to Figure 3.19 one can see the advantage of using a density based approach when dealing with clusters of arbitrary shapes.
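The steps above can be sketched in Python (a sketch, not the thesis's MATLAB implementation; the neighborhood search is a naive O(n²) scan, and the data below is made up):

```python
def region_query(points, i, eps):
    """Indices of all points in the eps neighborhood of point i (naive scan)."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]

def dbscan(points, eps, minpts):
    """Return a cluster id per point, or -1 for noise (steps 1-5 above)."""
    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < minpts:
            labels[i] = -1                    # noise (may become a border point)
            continue
        cid += 1                              # i is a core point: new cluster
        labels[i] = cid
        seeds = list(neighbors)
        while seeds:                          # retrieve all density reachable points
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid               # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cid
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= minpts:    # j is a core point: keep expanding
                seeds.extend(j_neighbors)
    return labels

# Two dense groups and one isolated noise point
pts = [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5),
       (5, 5), (5, 5.5), (5.5, 5), (5.5, 5.5), (10, 0)]
print(dbscan(pts, eps=1.0, minpts=3))  # -> [1, 1, 1, 1, 2, 2, 2, 2, -1]
```

Because the expansion only continues from core points, border points are absorbed into a cluster without pulling in their own neighbors, matching the definitions above.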
