• No results found

Determining recording time of digital soundrecordings using the ENF criterion

N/A
N/A
Protected

Academic year: 2021

Share "Determining recording time of digital soundrecordings using the ENF criterion"

Copied!
79
0
0

Loading.... (view fulltext now)

Full text

(1)

Institutionen för systemteknik

Department of Electrical Engineering

Examensarbete

Determining recording time of digital

soundrecordings using the ENF criterion

Examensarbete utfört i Elektroniksystem vid Tekniska högskolan i Linköping

av

Fredrik Andersson

LiTH-ISY-EX--09/4250--SE

Linköping 2009

Department of Electrical Engineering Linköpings tekniska högskola

Linköpings universitet Linköpings universitet

(2)
(3)

Determining recording time of digital

soundrecordings using the ENF criterion

Examensarbete utfört i Elektroniksystem

vid Tekniska högskolan i Linköping

av

Fredrik Andersson

LiTH-ISY-EX--09/4250--SE

Handledare: Håkan Johansson isy, Linköpings universitet Peter Bergström

SKL Erik Achrén

SKL

Examinator: Håkan Johansson isy, Linköpings universitet

(4)
(5)

Avdelning, Institution

Division, Department

Division of Electronics Systems Department of Electrical Engineering Linköpings universitet

SE-581 83 Linköping, Sweden

Datum Date 2009-05-12 Språk Language  Svenska/Swedish  Engelska/English   Rapporttyp Report category  Licentiatavhandling  Examensarbete  C-uppsats  D-uppsats  Övrig rapport  

URL för elektronisk version

http://www.es.isy.liu.se http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-18231 ISBNISRN LiTH-ISY-EX--09/4250--SE

Serietitel och serienummer

Title of series, numbering

ISSN

Titel

Title

Tidsbestämning av digitala ljudinspelningar med hjälp av ENF-kriteriet Determining recording time of digital soundrecordings using the ENF criterion

Författare

Author

Fredrik Andersson

Sammanfattning

Abstract

In forensic investigations, verification of digital recordings is an important as-pect. There are numerous methods to verify authentication of recordings, but it is difficult to determine when the media was recorded. By studying the electrical network frequency, one can find a unique signature and then match the recording to this signature. By matching a recorded signal to a database, which contains all necessary information, one can find the time when the recording was made.

Nyckelord

(6)
(7)

Abstract

In forensic investigations, verification of digital recordings is an important as-pect. There are numerous methods to verify authentication of recordings, but it is difficult to determine when the media was recorded. By studying the electrical network frequency, one can find a unique signature and then match the recording to this signature. By matching a recorded signal to a database, which contains all necessary information, one can find the time when the recording was made.

(8)
(9)

Acknowledgments

I would like to thank SKL for the opportunity to write this thesis, my supervisor Erik Achrén and Peter Bergström for guiding me and letting me in when I forgot my security pass. A big thanks to Anders Nordgaard for the great help with statistics and the report. Also Per Brolund and Tobias Höglund for the questions, discussions and help regarding the project. I would like to thank my supervisor on the university, prof. Håkan Johansson, who has made himself available each week for discussions. A big thanks goes to prof. Fredrik Gustafsson for giving me good guidance and help.

(10)
(11)

Contents

1 Introduction 3 1.1 Background . . . 3 1.2 Task . . . 4 1.3 Limitations . . . 6 1.4 Target audience . . . 6 2 Methods 7 2.1 Methods in use . . . 7 2.1.1 Database recording . . . 7 2.1.2 Filters . . . 8

2.1.3 Frequency extracting algorithms . . . 8

2.1.4 Comparison algorithms . . . 9

2.2 Methods not in use . . . 10

2.2.1 WAV database recording . . . 10

2.2.2 Frequency extracting algorithm - Wigner transform . . . . 10

2.2.3 Utilizing the alias effect when downsampling . . . 10

3 System 11 3.1 The power grid . . . 11

3.1.1 50 Hz component . . . 12

3.2 System requirements . . . 12

3.2.1 ENF Workstations . . . 13

3.2.2 ENF recorders . . . 13

3.2.3 ENF database server . . . 13

3.3 Problems . . . 14

3.3.1 Power grid failure . . . 14

3.3.2 Multiple recordings . . . 14 3.3.3 Handheld devices . . . 14 3.3.4 UPS . . . 14 3.3.5 Compression . . . 15 4 Theory 17 4.1 Sampling . . . 17 4.2 Resampling . . . 18 4.2.1 Decimation . . . 19 ix

(12)

x Contents 4.2.2 Aliasing effect . . . 21 4.3 Filters . . . 21 4.3.1 IIR Filters . . . 22 4.3.2 FIR Filters . . . 25 4.4 Spectral analysis . . . 28 4.5 Non-parametric estimation . . . 28 4.5.1 Fourier transform . . . 28

4.5.2 Short Time Fourier Transform . . . 28

4.6 Parametric estimation . . . 29

4.6.1 Yule-Walker method . . . 29

4.6.2 Least Square method . . . 31

4.6.3 Recursive Least Square method . . . 33

4.6.4 Comparison algorithms . . . 34 5 Implementation 37 5.1 Implemented algorithms . . . 37 5.1.1 locatefreqs . . . 37 5.1.2 STFT . . . 37 5.1.3 Yule Walker . . . 38

5.1.4 Recursive Least Square . . . 38

5.1.5 Database matching . . . 38 5.2 Filters implementation . . . 39 5.3 Implementation notes . . . 40 5.3.1 Work order . . . 40 6 Conclusion 41 6.1 Simulation . . . 41

6.1.1 Monte carlo simulation . . . 41

6.1.2 Simulation model . . . 42 6.2 Result . . . 43 6.3 Conclusion . . . 44 6.4 Future work . . . 46 Bibliography 47 A Bash code 49 A.1 freq2.sh . . . 49 A.2 csvmerge.sh . . . 50 B MATLABTM code 51 B.1 locatefreqs.m . . . 51 B.2 searchdb.m . . . 53 B.3 Estimation algorithms . . . 54 B.3.1 est_stft.m . . . 54 B.3.2 est_aryule.m . . . 56 B.3.3 est_arrls.m . . . 58

(13)

Contents xi

(14)
(15)

List of Figures

1.1 Basic system description. . . 3

1.2 Plot database vs. extracted data . . . 5

2.1 Layout description. . . 8

3.1 Swedish power grid structure. . . 11

3.2 EKG example . . . 12

3.3 Basic system description. . . 12

4.1 Uniform sampling. . . 17

4.2 Spectrum of signals. . . 18

4.3 Aliasing effect. . . 20

4.4 Filter types. . . 22

4.5 FIR filter impulse response . . . 25

4.6 Impulse response, FIR filters. . . 26

5.1 csvmerge.sh work scheme. . . 40

6.1 Basic system description. . . 43

6.2 Noise variance, STFT . . . 44

6.3 Noise variance, parametric models . . . 45

C.1 Signal and FFT plot . . . 61

C.2 Surfplot . . . 62

C.3 Remez filter response . . . 62

C.4 Elliptic filter response . . . 63

C.5 Butterworth filter response . . . 63

C.6 Chebyshev I filter response . . . 64

C.7 Chebyshev II filter response . . . 64

List of Tables

2.1 Wavefile disk usage . . . 10

5.1 Locatefreqs inputs and outputs . . . 37

5.2 est_stft inputs and outputs . . . 38

5.3 est_aryule inputs and outputs . . . 38

5.4 est_arrls inputs and outputs . . . 39

5.5 searchdb inputs and outputs . . . 39

5.6 filter types inputs and outputs . . . 39

5.7 Part of a .csv file . . . 40

5.8 Part of a .db file . . . 40

6.1 STFT Simulation results . . . 43

(16)

2 Contents

(17)

Chapter 1

Introduction

This chapter will present a short introduction to the problem, the task and the field of use. A short example will be presented to give a scenario of how the method can be used.

1.1

Background

Analog devices, such as tape recorders, not only capture the intended signal, but also characteristics of the recorder itself. Tape recorders leave signatures from their record and erase heads which then can be used to identify the specific recorder used. It can also be seen if another recorder has been used to change the original contents and then it is possible to come to the conclusion that the recording has been manipulated. Nowadays, the recordings are increasingly made by digital devices and these do not leave physical traces in the same way as their analog counterparts. Therefore a new approach is needed to determine the authenticity of digital recordings.

The method described in this thesis is based on research made by Catalin Grigoras [2] and it involves studying the electrical network frequency (ENF). In Sweden, Svenska Kraftnät AB is responsible for keeping the electrical network frequency in the interval 50 ± 0.1 Hz. The deviation from 50 Hz depends on the current load on the electrical network - a large load makes the generators rotate

v(t) Fn(z)

s(t) + y(t)

Figure 1.1: Basic system description. v(t) is ENF, s(t) is music or speech and y(t) is the full recording.

(18)

4 Introduction

slower thus lowering the frequency and a small load will increase the frequency. These frequency deviations over time creates a unique pattern [2] and this is what makes the ENF method an interesting choice for determining date and authenticity of digital recordings. The simplified system is modeled as Fig. 1.1 where s(t) is a signal containing speech or music, v(t) is the disturbance from the electric network, with known properties modeled by Fn(z), and the resulting recorded signal is y(t).

Note that the amplitude of v(t) is significantly smaller than s(t) and it can usually not be noticed without performing any signal processing.

To give the reader some understanding of the method and its use, a short example is presented. This is not based on a real case, although the situation may exist in a similar form.

Example 1.1: Use of ENF

There are two actors in this example: person A and person B. A is a telephone salesman and B is a customer who has been offered expensive training equipment by A. It is not uncommon that telephone salesmen record conversations to use the recording as a proof that they made a sale. The situation is now that the salesman claims to have sold the equipment and that B claims to never have agreed to buy the equipment at all.

Forensic investigators are presented with a CD containing the recording made by A and can hear what appears to be an authentic recording with a sale being made. By just listening to the recording, there is no way of directly telling whether it is manipulated or not. They decide to take a closer look at the frequency contents of the recording and especially around 50 Hz (the ENF).

The investigators find one matching sequence, but the recording contains anoma-lies when the frequency is compared to the database. In Fig. 1.2 we see how the frequency contents match overall, but at some positions the sound file frequency differs from the database. Now the investigators listen to the recording once again and find that at each of the times listed above, person B can be heard saying ”yes”. This indicates that person A has inserted a ”yes” answer from another part of the conversation and placed it where appropriate to get a confirmation of the sale.

1.2

Task

The problem formulation given was:

• Examine what has been done in the area.

• Develop and implement one or several methods to extract ENF-data from

sound recordings.

• Develop and implement one or several methods to compare extracted

(19)

1.2 Task 5

Figure 1.2: Database plotted against extracted data. Note the instances where the signals differ.

(20)

6 Introduction

Much of this has already been covered by Fredrik Lindholm during his thesis [5]. It was however decided to focus on the extraction of the ENF from a recording, as this is a very important part of the process. Without a good signal, it does not matter how well the comparison algorithm performs.

1.3

Limitations

• Electrical network limited to Östergötland, Sweden.

• Digital media only. Analog recording devices require other techniques. • Requires the digital equipment to be connected directly to the electrical

network.

1.4

Target audience

• Statens Kriminaltekniska Laboratorium (SKL) and other forensical

author-ities.

• Law enforcement.

(21)

Chapter 2

Methods

The given problem formulation was quite wide in terms of what needed to be done and required to be narrowed down. From reading the previous thesis [5] and studying material regarding the structure of the problem, a more restrictive problem formulation was made.

It was decided that this thesis should:

• Improve the techniques used for filtering.

• Implement new methods to improve the extraction of frequency information

from recordings.

• Test the accuracy of frequency estimations. • Enhance any existing implementation.

The basic idea of determining when a recording was made is illustrated in Fig. 2.1. To the electrical network, there is an ENF recorder and some digital recording equipment attached. The ENF recorder saves frequencies together with time in-formation to a database which will be used as a reference. Whenever a digital recording is made, the ENF is also recorded. Through filtering and an estimation algorighm the ENF is extracted from the recording and then compared to the database.

2.1

Methods in use

2.1.1

Database recording

On SKL there is a ratemeter that is calibrated to pick up the frequency component around 50 Hz. The picked up frequencies are stored together with its time as a csv1

-file on disk using the bash script freq2.sh (A.1). The reason to use the ratemeter instead of recording a soundfile for each day is to save storage space and to save

1Comma Separated Value

(22)

8 Methods Electrical network Recording equipment Digital recording Filtering and Frequency estimation ENF Recorder ENF Database Comparison Result ? ? ? ? ? -  ?

Figure 2.1: Layout description of method.

time. The ratemeter extracts frequency values straight to the database, instead of having to run a frequency-extracting algorithm on each recording. The ENF origins from physical generators and the fluctuation depends on the current load, which results in quite slow frequency changes. The ratemeter reads two values per second and that is enough to capture the fluctuations. A 24 hour period takes requires about 2.2 MB of disk space and over a year that translates into roughly 800 MB of disk usage.

2.1.2

Filters

Filtering is a wide concept and the different filters used depend heavily on the application and its use. Some filters are designed to remove unwanted distortion while some filters are designed to remove everything but a narrow frequency band. In order to properly estimate the ENF around 50 Hz, it is required to preprocess the signal and remove any unwanted signal information. The signal is therefore filtered through a bandpass filter and thus leaving the interesting frequency range to be further manipulated or investigated.

2.1.3

Frequency extracting algorithms

The frequencies in a signal need to be extracted in order to compare them to a database. Three methods have been implemented and compared.

Non-parametric estimation

Non-parametric estimation requires no prior information about the signal, such as statistical or spectral properties. This is considered an advantage, since one can apply the method and obtain a good result without knowing any certain properties of the signal itself. The Fourier transform is such an estimation form. The Fourier transform is a powerful and efficient way of estimating frequency contents of a signal. The method is common and widely used and therefore it is a natural choice to implement. The drawback is that the transform assumes a periodic input and

(23)

2.1 Methods in use 9

an infinite length of the input signal. This is however not the case when dealing with real data and the Discrete Fourier Transform (DFT) is the approximation that will be used. Another disadvantage is that the DFT is designed to work with large batches of data rather than recursively, which makes the method harder to adapt to signals where the frequency contents change over time.

Parametric estimation

Parametric estimation is another way of representing a signal and its contents. Parametric methods estimate parameters in a structure rather than an actual sig-nal or frequencies. Common model structures are Auto Regressive (AR), Moving

Average (MA) and Auto Regressive Moving Average (ARMA). These model

struc-tures have the ability to use prior information about a given signal or process and one would assume that this can be used to improve estimation accuracy and reso-lution. This thesis will focus on AR models. Two different parametric estimation algorithms are evaluated:

• Yule Walker - Estimate parameters using an estimate of covariances. • Recursive Least Square - Implementation of Least Square method with a

forgetting factor introduced.

2.1.4

Comparison algorithms

The comparison algorithms were developed during Lindholms thesis [5] and will be somewhat described in Chapter 4.

Simple distance measurement

This method requires that both frequency and fluctuation patterns match between the database and the extracted data. The extracted signal is moved across the database and for each small move, a distance is calculated. This is done until the whole database is covered and the lowest distance value is the most likely match. The advantage of this method is that it returns fewer hits, but it also requires that the extracted frequencies are aligned to the database. A static offset of 3 Hz would considerably lower the performance of this algorithm.

Normalized cross correlation

This method is not dependent of actual frequency since it only looks for matching patterns. The advantage is that there is no need to calibrate any offset, but the disadvantage is that the extracted pattern may appear multiple times in the database. Therefore a longer recording is needed to ensure a good match.

(24)

10 Methods

Days Seconds Capacity [MB]

1 86400 20.232

7 604800 141.624 31 2678400 627.192 365 31536000 7384.68

Table 2.1: Disk usage for comparison against 800 MB from the ratemeter.

2.2

Methods not in use

2.2.1

WAV database recording

One method to store the database would be to record the electrical network fre-quency through the line-in of a common sound card. As mentioned above, an algorithnm would have to be applied to extract the useful information and that takes time. Also, if recording the ENF database as a WAV PCM, 16 bit at 120 Hz sample rate, the storage size will be according to table 2.1 [2]. Another disadvan-tage is that sound cards behave differently depending on brand or chip used. A comparison between two computers with different sound cards showed a difference of 0.1 Hz when recording a pure generated 50 Hz signal.

2.2.2

Frequency extracting algorithm - Wigner transform

This method was investigated during Lindholms thesis [5] and was deemed unsuit-able due to its computational complexity. The algorithm works at order(n3) and would require too much in terms of hardware to be efficient. The method will not be further described in this thesis.

2.2.3

Utilizing the alias effect when downsampling

The parametric model estimations need to consider frequency content outside the scope of the ENF which might introduce odd frequency estimations that deviate substantially from the wanted 50 Hz. A way to only consider the actual frequency range around 50 Hz is to use downsampling as well as the alias effect [6, 7]. If the Nyquist-Shannon sampling theorem (theorem 4.1) is not fulfilled, frequency information above the new sampling frequency will be aliased down into the new frequency band.

This should be possible to do, but the implementation proved to have flaws and the frequency estimations were very unstable. One reason behind this behaviour could be that there were too few samples on which to perform computations. Another explanation could be that the builtin MATLABTM code for resampling

is not accurate enough to retain the important signal information. One would probably need to implement the algorithm from base-up, carefully selecting filters and other techniques to properly achieve a good signal.

(25)

Chapter 3

System

3.1

The power grid

The frequency in the nordic frequency region1 carries the same waveform due

to electromagnetic wave propagation. The entire electric network formed by the interconnected systems, including all the sources and loads, will carry the same frequency [2]. This theoretically means that one ENF recorder is sufficient to cover all of the nordic frequency region.

This is not the case when the electric network fails due to broken wires or other conditions. Then isolated nodes can arise in these areas that are not connected to the main power grid and these are instead powered by local power stations. The network frequency will then differ from the rest of the grid, see Fig. 3.1. There is one permanently isolated node (in terms of ENF) in the swedish region of the grid, which is Gotland. The current is transported to Gotland via a DC (direct current) cable and then transformed into active current (AC). This effectively means that to have complete coverage of Sweden, it is needed to place a separate ENF recorder on Gotland as well.

1Sweden, Norway, Finland and the Danish island Zealand (sv. Själland)

Producer

Base grid

Regional grid Isolated grid

Consumer

Figure 3.1: Swedish power grid structure.

(26)

12 System 0 50 100 150 200 250 300 −4 −2 0 2 4 6 8 10 12 (a) 0 50 100 150 200 250 300 −2 0 2 4 6 8 10 (b) Figure 3.2: (a) EKG signal with noise from the electric network.

(b) Filtered EKG signal.

Recorder Recorder Recorder

ENF Database

Workstation Workstation

ENF Recorders

ENF Workstations

Figure 3.3: Basic system description.

3.1.1

50 Hz component

The use of the electric network frequency is a rather new technique in forensic investigations, but as a concept it is not. If one disconnects an electric guitar from an amplifier you can often hear a distinct humming sound which is the 50 Hz component. Some equipment, such as EKG instruments on hospitals, are very sensitive to disturbance and for these instruments it is crucial to remove the ENF in order to clearly see the heartbeat rhythm. Fig. 3.2 shows a generated EKG signal before and after filtering.

3.2

System requirements

Fig. 3.3 shows a description of the system and what is needed in terms of software and hardware will be described.

(27)

3.2 System requirements 13

3.2.1

ENF Workstations

The workstations extract signal information from a digital recording and then compare it to the ENF database. These can be placed anywhere in the country, given that a connection to the ENF database can be established. Preferably, the workstations are located in SKL facilities. The workstations at least require:

• Operative system: GNU/Linux. • SSH connection to the ENF Database. • Sufficient hardware to run MATLABTM.

• MATLABTM installed.

• MATLABTM Signal processing toolbox.

• The written MATLABTM code.

• The bash script csvmerge.sh.

3.2.2

ENF recorders

The ENF recorder is the whole system around the ratemeter that records and saves ENF information to disk. Once every day, the information should be sent to a central ENF database and stored. The recorders should be placed in facilities where physical access can be controlled, such as police stations. Preferably one or more recorders should be placed in each landscape to cover any grid failure. As minimum, atleast one recorder needs to be placed on Gotland due to the local network and one recorder anywhere else in Sweden. For continuous work, the following are required for ENF recorders:

• Operative system: GNU/Linux. • SSH connection to the ENF Database. • The bash script freq2.sh.

• Cron daemon configured to send recorded data to the server each day. • Rate meter.

• Connected to UPS. (see 3.3.4)

3.2.3

ENF database server

The server can physically be placed nearly anywhere in the country, but most suitable would be in SKL facilities. The server needs to be online all the time for the ENF recorders to upload ENF data. The structure really only serves as a fileserver without any real database properties (such as SQL etc.) so each ENF recorder should have a specific user account on the database server where the ENF recordings are saved.

(28)

14 System

• Operative system: GNU/Linux. • SSH server running.

• Connected to UPS. (see 3.3.4)

• Sufficient disk space to store ENF data. This will vary depending on how

many recorders there are in the system.

3.3

Problems

For any recording device to also record the ENF signal, the device must be con-nected to the electrical network. If it is not, the ENF will be missing in the recording and then there will be no data to compare to the database. This section will describe some of the cases where the ENF method will not work.

3.3.1

Power grid failure

As mentioned before, power grid failures can create local nodes that will produce a different ENF shape. If a recording device is picking up this local ENF, a comparison cannot be made. This problem can be mitigated by placing many ENF recorders throughout the country and covering each of these possible nodes.

3.3.2

Multiple recordings

If a sound recording has been made and the ENF is picked up, it is possible to analyze it. However, if the original recording is played and then recorded to another digital recording (like copying on an old cassette recorder), a new ENF will also be recorded ontop of the old one. This will render two ENF signals and it will be hard to determine which ENF is the original one. This case is not very likely as it is easier to just copy the recorded file itself, but it can still occur.

3.3.3

Handheld devices

If a recording has been made through a cell phone or a handheld recorder, there is no connection to the electrical network and the ENF will not be picked up. However, the ENF might be transferred from nearby power cables, light sources or other electrical equipment that is not properly shielded. The situation where ENF is recorded from a nearby source does not seem very likely, but could be further investigated.

3.3.4

UPS

Whenever an Uninterruptable Power Supply (UPS) deems the normal power sup-ply unstable or unreliable, it will disable the electrical network and supsup-ply power to the connected devices. If a recording device is connected to a UPS during a power failure, there will be no ENF recorded. There are two categories of UPS devices:

(29)

3.3 Problems 15

Online UPS: Will continuously feed devices connected to it and these devices

will never record ENF signals.

Offline UPS: Stands ready to take over power supply whenever the electrical

network fails. Devices connected to an offline UPS will record ENF as long as the electrical network is stable.

3.3.5

Compression

Often when a recording has been made, one usually want to compress it to save disk space. There are two main types of compression, lossless audio compression and lossy audio compression. An uncompressed file will contain all the necessary information and so will the lossless compression as the main idea of a lossless compression algorithm is to obtain a smaller file but still keep 100% of the infor-mation and be able to fully reconstruct the original file. In lossy compression, this requirement is taken away and the goal is instead to achieve a good compression level while keeping the useful information. Different lossy compression formats are suitable for different tasks. Mp3 is very commonly used when compressing music audio files, while the musepack is optimized for transparent compression of stereo audio.

Uncompressed formats: WAV, AIFF, AU etc.

Lossless compression formats: FLAC, Apple Lossless, MPEG-4 ALS,

Mon-key’s Audio, TTA, lossless Windows Media Audio (WMA) etc.

Lossy compression formats: Mp3, Vorbis, Musepack, ATRAC, lossy Windows

Media Audio (WMA), AAC etc.

It is currently unknown what the different lossy compression algorithms do to the 50 Hz component, but most likely it will be different from the original. This would need more investigation as it is very easy and common to compress an audio file to mp3.

(30)
(31)

Chapter 4

Theory

This chapter will cover the theoretical aspects of the thesis and can be difficult to grasp for the reader who does not have a background in signal processing or similar, so it is recommended to skip the whole chapter unless you are interested in the details and theory.

4.1

Sampling

Sampling is usually done to convert analog signals into a digital representation. This is not needed in this application, but it is explained to give some insight in discrete time resampling. Illustrated in Fig. 4.1, the discrete time signal x[n] is created from the continuous time signal x(t) with the uniform sampling interval

T . The relationship between these can be expressed as: [9]

x[n] = x(t)|t=nT = xa(nT ) (4.1)

If X[Ω] is the frequency spectrum of the discrete time signal and X[ω] is the frequency spectrum of the continuous time signal, the relationship between these two can be described by the Poisson summation formula: [9]

X[Ω] = 1 T X k=−∞ Xa Ω − 2πk T  (4.2) xa(t)b - b ? t = nT b - b x[n] Figure 4.1: Uniform sampling.

(32)

18 Theory

From (4.2) we can also conclude: [9] 1* X(ω)|ω=Ω/T is calculated for all Ω.

2* The result in 1* will be scaled by a factor T1.

3* The result in 2* will is repeated an infinite number of times with period 2π. 4* The terms in 3* will be added.

As an illustration of this, Fig. 4.2 shows the spectrum of a signal in continuous and discrete time. From these conclusions we can realize the Nyquist-Shannon sampling theorem [8], which nowadays is central to signal processing:

-ω 6 T T T T     Xa(ω) A −fs/2 fs/2 (a) - 6 T T T T     X[Ω] A T T T T T     −2π T T T T     −4π T T T T     T T T T     (b)

Figure 4.2: (a) Spectrum of continuous time signal. (b) Spectrum of sampled discrete time signal.

Theorem 4.1 (Nyquist-Shannon sampling theorem) If a function f(t)

con-tains no frequencies higher than W cps, it is completely determined by giving its ordinates at a series of points spaced 1

2W seconds apart.

Which in other words means: A sampled signal can be fully reconstructed if the following condition is met:

fs> 2f0 (4.3)

4.2

Resampling

The first issue to consider when manipulating signals is the sampling frequency. The sample frequency, fs, will determine the amount of samples per second and a

(33)

4.2 Resampling 19

computer has limited resources it is desirable to reduce the amount of samples to decrease the amount of computations. This is commonly done through decimation.

4.2.1

Decimation

Decimation is done in two steps. First the discrete time signal is filtered through a discrete time lowpass filter and then it is downsampled. A downsampling of x[n] with a factor N means that a new signal, xd[n], is created by taking every N:th

sample from x[n]. Other samples are neglected. This can be written as [9]:

xd[n] = x[nN ] (4.4)

Sampling of a discrete time signal can be done by multiplying the signal with a pulse train, p[n], as [9]: xp[n] = x[n]p[n] = x[n] X k=−∞ δ[n − kN ] (4.5) The Fourier tranform of the downsampled signal can be expressed as [9]:

Xp[Ω] = 1 N N −1 X k=0 XhΩ − k2π N i (4.6) (4.6) can be said to be the sampling in discrete time equivalent of the Poisson summation formula (4.2). It implies a scaling with factor 1/N and the spectra for

X[Ω] is repeated N times with the distance 2π/N , where N is the sample period,

that is the distance between the samples in x[n] that creates xp[n].

To avoid distorsion at this new sampling frequency, 2π/N needs to be larger than 2Ωm, where Ωm is the bandwidth for X[Ω]. If this condition is not met,

the different repetitions of X[Ω] will overlap and cause distorsion. If 2π/N is written as Ωs, ”the sample frequency”, we then need the following for distiorsion

free sampling [9]:

s> 2Ωm (4.7)

This can be said to represent the discrete time version of the Nyquist-Shannon sampling theorem. The filter that precedes the downsampling needs to fulfil the requirement (4.7).

xd[n], that is acquired through decimation of x[n], can also be constructed by

decimating the sampled signal, xp[n]: [9]:

xd[n] = x[nN ] = xp[nN ] (4.8)

The Fourier transform of the decimated signal can then be written as [9]:

Xd[Ω] = Xp hΩ N i = 1 N N −1 X k=0 hΩ N − k N i (4.9) Fig. 4.3 shows an example of X[Ω], Xp[Ω] and Xd[Ω] for N = 3.

(34)

20 Theory - 6 X[Ω] A L L L L L −Ωmm −π π L L L L L −2π L L L L L (a) - 6 Xp[Ω] A N L L L L L −Ωmm −π π L L L L L −22π N L L L L L −2π N L L L L L N L L L L L 2 N L L L L L −2π L L L L L (b) - 6 Xd[Ω] A N l l l l l , , , , , −N Ωm N Ωm −π π l l l l l , , −2π l l , , , , , (c) - 6 Xp[Ω] A N −π π L L LL LL LL LL LL LL LL LL LL LL LL LL LL L L LL LL LL LL LL LL LL LL LL LL LL LL LL L L LL LL LL LL LL LL L L L L L L L L L L L L −2π −22π N −2πN −Ωmm 2πN 22πN (d)

Figure 4.3: (a) X[Ω] = F {x[n]}. X[Ω] has the bandwidth Ωm.

(b) Xp[Ω] = F {xp[n]}. xp[n] sampled version of x[n],

sample frequency: 2π/N > 2Ωm. N = 3.

(c) Xd[Ω] = F {xd[n]}. X[Ω] has bandwidth Ωm.

(35)

4.3 Filters 21

4.2.2

Aliasing effect

As seen in Fig. 4.3(d), the frequencies above the nyquist frequency are folded down into a lower frequency band. It is possible to use this to achieve even lower sampling frequency.

Example 4.1: Downsampling with aliasing effect

For simplicity, assume a signal x with sampling frequency 240 Hz and say that the interesting frequency band is between 30 and 60 Hz.

1. xds1= LP (0.5, x), lowpass filter with normalized cut-off frequency 0.5.

2. Downsample xds1 by a factor 2.

3. xds2= HP (0.5, xds1), highpass filter with normalized cut-off frequency 0.5.

4. Downsample xds2 by a factor 2.

5. xds= (−1)kxds2(k) to reverse spectra.

In 1) the frequency content above 60 Hz is removed and in 2) the sample rate is decimated to fs= 120Hz. In 3) all the information below 30 Hz is removed and

then in 4) during the decimation, the interval 30-60 Hz is aliased backwards to 0-30 Hz. This means that the frequencies are folded back according to:

faliased= fs− fold (4.10)

meaning that the 60 Hz component is folded down to 0, 50 Hz component folded down to 10 and the 30 Hz component stays at 30 Hz. To remedy the reversed spectra, the periodic behaviour of the DFT can be used by frequency shifting the DFT with half the sample frequency.

Xds(f ) = Xds2(f −

fs

2 (4.11)

where Xds2(f ) is the DFT of xds2 and fs is the new sample frequency. In 5) the

frequency shift is implemented in the time domain as a sign conversion for every other sample:

xds(n) = (−1)nxds2(n) (4.12)

the frequency then translates as ftrue= fds+ 30.

4.3

Filters

Digital filters can be categorized into FIR (Finite-length Impulse Response) and IIR (Infinite-length Impulse Response) filters. FIR filters have the nice property that they can guarantee stability and to have a linear-phase response [11]. FIR

(36)

22 Theory Amax Amin ωcωs ω A(ω) (a) Amax Amin ωsωc ω A(ω) (b) Amax Amin ωs1ωc1 ωc2 ωs2 ω A(ω) (c) Amax Amin ωc1ωs1 ωs2 ωc2 ω A(ω) (d)

Figure 4.4: The four different filter types: (a) Lowpass filter, (b) Highpass filter, (c) Bandpass filter, (d) Bandstop filter

filters however usually require much higher orders than the respective IIR filter and they often introduce large group delay that makes them unsuitable for many applications. This does not matter much for this particular application as it is a purely offline system (no new data is introduced or required during runtime). The IIR filters have been implemented for speed and testing purposes but are not recommended for use because of the nonlinear phase property which can be seen in appendix C.4 through C.7.

The four types of filters considered are (see also Fig. 4.4):

• Lowpass filter (LP) • Highpass filter (HP) • Bandpass filter (BP) • Bandstop filter (BS)

4.3.1

IIR Filters

A common and easy way to design digital filter is to design an analog reference filter and then perform a transformation to turn the analog tranfer function into the z-domain. It is not possible to preserve the exact properties from the analog to the digital domain and there are many available mappings, such as bilinear,

(37)

4.3 Filters 23

impulse-invariant, step-invariant etc, but in practice only the bilinear transforma-tion is appropriate to use for frequency selective filters [11] and it will be the only transformation presented. The design procedure is to:

1* Map the magnitude response requirements of the digital filter on to equiva-lent requirements of an analog filter.

2* Solve the approximation problem for the analog filter, which give the analog lowpass transfer function H(s)

3* The poles and zeros of H(s) are mapped onto the the digital transfer function

H(z)

4* The digital lowpass filter is transformed into the required filter if needed (HP, BS, BP)

Bilinear transformation

The bilinear transformation is defined:

s = 2 T

z − 1

z + 1 (4.13)

The analog magnitude response requirements to digital equivalents can from (4.13) be derived as [11]: ωac= 2 T tan ωcT 2 (4.14)

The approximation problem is solved for the lowpass filter to obtain the analog transfer function H(s) (the interested reader can find more in [11] on how to do this). The phase response is distorted from using the bilinear transformation. Finally, from (4.13), the poles and zeros from the analog filter are mapped to the digital domain by [11]:

z = 1 +

sT

2

1 −sT2 (4.15)

Lowpass to highpass transformation

A digital lowpass filter is transformed into a highpass filter using [11]:

Z−1→ − z −1+ α 1 − αz−1, α = − coscT +ωcT 2 cosωcT −ΩcT 2 (4.16)

(38)

24 Theory

Lowpass to bandpass transformation

A digital lowpass filter is transformed into a bandpass filter using [11]:

Z−1→ z −2− βz−1+ γ γz−2− βz−1+ 1 (4.17) β = 2αk k + 1, γ = k − 1 k + 1, α = cosωc1T +ωc2T 2 cosωc2T −ωc1T 2 , k = cotωc2T − ωc1T 2 tan ΩcT 2 A special case is when the bandpass filter is symmetric around π/2, that is: ωc1T +

ωc2T = π. In this case the transformation is simplified into:

Z−1→ −z−2 (4.18)

Lowpass to bandstop transformation

A digital lowpass filter is transformed into a bandstop filter using [11]:

Z−1→ z −2− βz−1+ γ γz−2− βz−1+ 1 (4.19) β = 2αk k + 1, γ = k − 1 k + 1, α = cosωc1T +ωc2T 2 cosωc2T −ωc1T 2 , k = tanωc2T + ωc1T 2 tan ΩcT 2 A special case is when the bandstop filter is symmetric around π/2. In this case the transformation is simplified into:

Z−1 → z−2 (4.20)

Standard approximations

The following standard approximations are all available in MATLABTM signal processing toolbox and they utilize the bilinear transformation (4.13). A short list of their properties are listed.

Butterworth filters:

• Monotonically decreasing magnitude response. • All zeros at s = ∞.

• Highest filter order.

Chebyshev-I filters:

• Equiripple passband.

• Monotonically decreasing in the stopband • All zeros at s = ∞.

• Filter order between Butterworth and Cauer.

(39)

4.3 Filters 25 - n 6 h(n) r r r r r r r r r r r N r r r r

Figure 4.5: Impulse response of a causal FIR filter of order N (length N + 1).

• Equiripple stopband.

• Monotonically decreasing in the passband. • Finite zeros.

• Same order as Chebyshev-I.

Cauer (elliptic) filters:

• Equiripple passband and stopband. • Finite zeros.

• Lowest filter order.

4.3.2

FIR Filters

For a causal FIR filter of order N (length N + 1), the impulse response is nonzero for 0 ≤ n ≤ N and zero for all other values of n [11]. This is shown in Fig. 4.5. The transfer function (4.21) and frequency response (4.22) can be written as [11]:

H(z) = N X n=0 h(n)z−n (4.21) H(ejωT) = N X n=0 h(n)e−jωT n (4.22) As mentioned previously, FIR filters can guarantee linear-phase response. These filters are called linear-phase FIR filters and their impulse response are either symmetric (4.23) or anti-symmetric (4.24). A linear-phase FIR filter is obtained by [11]:

h(n) = h(N − n), n = 0, 1, . . . , N. (4.23)

h(n) = −h(N − n), n = 0, 1, . . . , N. (4.24) Also the order N can be either even or odd, so there are four different types of linear-phase FIR filters denoted as Type I, II, III and IV.

Type I: h(n) = h(N − n), N even

Type II: h(n) = h(N − n), N odd

Type III: h(n) = -h(N − n), N even

Type IV: h(n) = -h(N − n), N odd

(40)

26 Theory - n 6 h(n) r r r r r r r N

(a) Type I, N even

- n 6

h(n)

r r r r r r

N

(b) Type II, N odd

- n 6 h(n) r r r r r r r N

(c) Type III, N even

- n 6 h(n) r r r r r r N

(d) Type IV, N odd

Figure 4.6: Impulse responses for the four types of linear-phase FIR filters.

Fig. 4.6 shows examples of impulse responses for the different filter types. Note that the centre value h(N/2) is always zero for Type III filters.

Frequency response

To design a linear-phase FIR filter it is easier to express its frequency response using a real function HR(ωT ) (also called the zero-phase frequency respose of H(z)

as [11]:

H(ejωT) = ejΘ(ωT )HR(ωT ) (4.26)

The magnitude response of H(ejωT) is equal to the magnitude of H

R(ωT ) as:

|H(ejωT)| = |H

R(ωT )| (4.27)

The real function HR(ωT ) can of course take negative values, whereas the function

|H(ejωT)| always is non-negative. The function Θ(ωT ) in (4.26) is then given by:

Θ(ωT ) = −N ωT

2 + c (4.28)

where c = 0 for filters with a symmetric impulse response (Type I, II) and c = π/2 for filters with antisymmetric impulse response (Type III, IV). Since HR(ωT ) is

a real function, its phase response is either 0 or ±π [11]. The phase response of

H(z) is then linear as we can see in:

Φ(ωT ) =  −N ωT 2 + c, HR(ωT ) ≥ 0 −N ωT 2 + c ± π, HR(ωT ) < 0 (4.29) From (4.29) it can also be seen that the group delay of linear-phase FIR filters is

(41)

4.3 Filters 27

FIR filter design using optimization

The specification of a lowpass filter is commonly given as [11]: 1 − δc ≤ |H(ejωT)| ≤ 1 + δc, ωT ∈ [0, ωcT ]

|H(ejωT)| ≤ δ

s, ωT ∈ [ωsT, π] (4.30)

where δc, δs, ωcT and ωsT denote the passband ripple, stopband ripple, pasband

edge and stopband edge respectively. For linear-phase FIR filters, the specification of (4.30) can be restated with the aid of HR(ωT ) according to [11]:

1 − δc≤ |HR(ωT )| ≤ 1 + δc, ωT ∈ [0, ωcT ]

−δs≤ |HR(ωT )| ≤ δs, ωT ∈ [ωsT, π] (4.31)

The acceptable deviations in the passbands and stopbands can alternatively be expressed in terms of maximal allowable deviation in the attenuation in the pass-band and the minimum attenuation in the stoppass-band respectively. The variation of the attenuation in the passband is [11]:

Amax= 20 log

1 + δc 1 − δc



≈ 17.31δc [dB] (4.32)

and the minimum attenuation in the stopband is [11]:

Amin= 20 log

1 + δc

δs



[dB] (4.33)

An accurate order estimate, especially for short filters, is used in MATLABTM in the function firpmord. The required order for a linear-phase lowpass FIR filter is:

N ≈ D∞(δc, δs) − F (δc, δs)(∆ωT /2π)

2

∆ωT /2π (4.34)

where

D∞(δc, δs) = [a1(log(δc))2+ a2log(δc) + a3] log(δs) + a4(log(δc))2+ a6log(δc) + a6

F (δc, δs) = b1+ b2(log(δc) − log(δs))

∆ωT = ωsT − ωcT

The constants are [11]: a1 = 0.0005309, a2 = 0.07114, a3 = −0.4761, a4 =

−0.00266, a5= −0.5941, a6= −0.4278, b1= 11.01217, and b2= 0.51255.

Approximation problem

The McClellan-Parks-Rabiner’s design is a commonly used optimization technique to obtain a linear-phase FIR filter. The algorithm finds the unique set of filter coefficients that minimizes a weighted error function and the interested reader can find more information about the algorithm in [11].

(42)

28 Theory

4.4

Spectral analysis

In order to find any useful information in a digital sound recording, it is needed to analyze its frequency content. A useful tool is to estimate the power spectrum and there are several methods to do so. This thesis will cover two different methods of spectral estimation, non-parametric estimation 4.5 and parametric estimation 4.6.

4.5

Non-parametric estimation

4.5.1

Fourier transform

The most common method is to use the Fast Fourier Transform (FFT) which is an implementation of the Discrete Fourier Transform (DFT). It is not uncommon that the DFT is referred to as FFT which suggests the widely spread use of the algorithm [3]. The DFT, X(n), and its inverse, x(k), for a signal with a finite length of N is defined as:

X(n) = N −1 X k=0 x(k)e−i2πnN k (4.35) x(k) = 1 N N −1 X n=0 X(n)ei2πkN n (4.36) k, n = 0, . . . , N − 1

The power of the signal is calculated as the square of the transform. With a sample frequency fs, X(n) will be defined on the frequency points f (n) = fsn/N . The

frequency we want to estimate is assumed to have the highest power, so we can estimate our frequency ˆf using:

ˆ n = argmax n=[0,N −1] |X(n)|2 (4.37) ˆ f = f (ˆn) = fs ˆ n N (4.38)

4.5.2

Short Time Fourier Transform

What we have done so far is merely a DFT and selecting one frequency with the greatest power. This will only find one single frequency over the whole length of the signal. In order to acquire several frequencies from different times in the recording, we need to introduce a window that we move along the signal and do the steps (4.35) to (4.38) on each windowed segment. The STFT is defined as [3]:

X(ω, k) =

X

m=−∞

(43)

4.6 Parametric estimation 29

where m is the discrete time variable. Usually n is defined as n = LN where n is the window length and the amount of overlap between adjacent windows depends on L. A wide window will provide us with many samples on which to perform the FFT but that will reduce the resolution in time. A window that is short will have a good resolution in time, but the number of FFT points will decrease and will reduce frequency resolution instead. The window length implemented has been found through testing different lengths.

4.6

Parametric estimation

The focus is to accurately estimate frequencies over a number of time instants rather than reconstructing the original signal. For an AR model, the location of the frequency only depends on the estimated parameters. In equation (4.42), the parameter vector θ is what is to be used to acquire the estimated frequency. In sections 4.6.1 to 4.6.3 the following will be used to transform the estimated parameters into an estimated frequency:

ˆ f =arccos − ˆ a1 2√aˆ2 2πTs (4.40) An n-order AR-model is described by [3]:

y(t) + a1y(t − 1) + · · · + any(t − n) = e(t) ⇒

y(t) = −a1y(t − 1) − · · · − any(t − n) + e(t)

(4.41) where e(t) is white gaussian noise with variance λe.

A common representation of the model can be written as [3]:    ϕ(t) = (−y(t − 1) . . . − y(t − n))T θ = (a1. . . an)T y(t) = ϕT(t)θ + e(t) (4.42)

4.6.1

Yule-Walker method

We have the AR(n)-model as described by (4.41).

Lag 0: Multiply both sides with y(t) and take expectance [3]:

Ehy2(t)i= Ehy(t)(−a1y(t − 1) − . . . − any(t − n) + e(t))

i

Ryy(0) = −a1Ryy(1) − . . . − anRyy(n) + σe2 (4.43)

Lag 1: Multiply both sides with y(t-1) and take expectance [3]:

Ehy(t)y(t − 1)i= Ehy(t − 1)(−a1y(t − 1) − . . . − any(t − n) + e(t))

i

Ryy(1) = −a1Ryy(0) − . . . − anRyy(n − 1) (4.44)

.. .

(44)

30 Theory

Lag n: Multiply both sides with y(t-n) and take expectance [3]:

Ehy(t)y(t − n)i= Ehy(t − n)(−a1y(t − 1) − . . . − any(t − n) + e(t))

i

Ryy(n) = −a1Ryy(n − 1) − . . . − anRyy(0) (4.45)

The equations (4.43) to (4.45) produce the following system of equations [3]:

     −1 Ryy(1) . . . Ryy(n) 0 Ryy(0) . . . Ryy(n − 1) .. . ... . .. ... 0 Ryy(n − 1) . . . Ryy(0)      | {z } RN      σ2 e a1 .. . an      | {z } θ =      −Ryy(0) −Ryy(1) .. . −Ryy(n)      | {z } fN (4.46)

We can estimate the covariances ˆRyy(k) using [3]:

ˆ Ryy(k) = 1 N N −|k|−1 X t=0 y(t)y(t + |k|) (4.47) This will give us the parameters we need to form our AR-model [3]:

y(t) = ϕT(t)θ + e(t), t = 1, 2, . . . , N (4.48) using (4.42) we can get the form [3]:

y(t) = 1

1 + a1y(t − 1) + . . . + any(t − n)

e(t) (4.49)

Numerical properties

There is however a problem solving equation (4.46) because the matrix RN is

often badly conditioned for large values of N. This will lead to large errors and the solution can be made as follows. We define [3]:

         y(1) = ϕT(1)θ + e(1) y(2) = ϕT(2)θ + e(2) .. . y(N ) = ϕT(N )θ + e(N ) (4.50)

Introduce the N × 1-vectors YN, EN and the N × n-matrix ΦN as [3]:

YN =      y(1) y(2) .. . y(N )      , EN =      e(1) e(2) .. . e(N )      , ΦN =      ϕT(1) ϕT(2) .. . ϕT(N )      (4.51) which renders [3]:

(45)

4.6 Parametric estimation 31

YN = Φ (4.52)

with normal equations [3]:

ΦTNΦNθ = ΦTNYN (4.53)

4.6.2

Least Square method

We have the AR(n)-model as described by (4.41):

y(t) + a1y(t − 1) + . . . + any(t − n) = e(t)

and introduce the predictor [3]:

ˆ

y(t|t − 1) = −a1y(t − 1) − . . . − any(t − n) (4.54)

From (4.42) we have [3]:

y(t) = ϕT(t)θ + e(t)

Here y(t) and ϕT(t) are measured (known data), while θ is a parameter vector that

we want to estimate. e(t) is still a (white) noise source. The expression is a linear regression model and the Least Square method can be applied for estimating ˆθ.

From the predictor (4.54) we recognize [3]:

ˆ

y(t|t − 1, θ) = ϕT(t)θ (4.55) The θ can be omitted, but it is written in order to show that the signal depends on the parameter vector. We define the residual as [3]:

(t, θ) = y(t) − ˆy(t|t − 1, θ) = y(t) − ϕT(t)θ (4.56) The Least Square method minimizes the loss funcion [3]:

VN(θ) = 1 N N X t=1  y(t) − ϕT(t)θ 2 = 1 N N X t=1 (t, θ)2 (4.57) The estimation of our parameter vector becomes [3]:

ˆ

θN = argmin θ

VN(θ) (4.58)

where argmin is the argument that minimizes the following expression. Since

VN(θ) is square with respect to θ in (4.57), we can explicitly form an expression

(46)

32 Theory VN(θ) = 1 N N X t=1 y2(t) − 2θT 1 N N X t=1 ϕ(t)y(t) + θTh1 N N X t=1 ϕ(t)ϕT(t)iθ = 1 N N X t=1 y2(t) +hθ − R−1N fN iT RN h θ − R−1N fN i − fT NR −1 N fN (4.59) where RN = 1 N N X t=1 ϕ(t)ϕT(t), fN = 1 N N X t=1 ϕ(t)y(t) (4.60) Since RN is a positive definite matrix we can from (4.59) see that VN is minimized

by [3]: RNθ = fN (4.61) ˆ θN = R−1N fN = hXN t=1 ϕ(t)ϕT(t)i −1XN t=1 ϕ(t)y(t) (4.62) Equation (4.61) is usually known as the normal equation for the Least Square problem and is a central point in the method. We have also seen it before in (4.46).

Element (i,j) in RN is:

1 N N X t=1 y(t − i)y(t − j) (4.63)

Row k of vector fN is:

1 N N X t=1 y(t − k)y(t) (4.64) Clarification of the steps (4.59) to (4.61). To minimize VN(θ) we set the derivative d(ϕTθ

= 0 and from that obtain ˆθ.

Definition 4.1 Derivative convention

df (x) dx =    df dx1 .. . df dxN    d(ϕTθ) =     d(ϕTθ) 1 .. . d(ϕTθ) dθN     =    ϕ1 .. . ϕ1   = ϕ dVN(θ) = 1 N N X t=1 ϕ(t)y(t) − ϕT(t)θ= 1 N N X t=1 ϕ(t)ϕT(t) | {z } RN θ + N X t=1 ϕ(t)y(t) | {z } fN (4.65)

(47)

4.6 Parametric estimation 33

dVN(θ)

= 0 ⇒ RN

ˆ

θN = fN ⇒ ˆθN = R−1N fN , as seen in (4.62)

4.6.3

Recursive Least Square method

From (4.57) we have the loss function as [3]:

VN(θ) = 1 N N X t=1  y(t) − ϕT(t)θ 2

In the case that θ varies with t, we can emphasize the more current data by introducing a forgetting factor λ [3]:

Vt(θ) = t X k=1  y(k)ϕT(k)θ 2 λt−k (4.66) where 0 < λ < 1.

In the same way as in (4.62), (4.66) is minimized by [3]: ˆ θ(t) = R−1λ (t)fλ(t) (4.67) where Rλ(t) = t X k=1 λt−kϕ(k)ϕT(k), fλ(t) = t X k=1 λt−kϕ(k)y(k)

These can be calculated recursively as [3]:

Rλ(t) = λRλ(t − 1) + ϕ(t)ϕT(t)

fλ(t) = λfλ(t − 1) + ϕ(t)y(t)

(4.68) We can now calculate [3]:

ˆ θ(t) =Rλ−1(t)fλ(t) = R−1λ (t) h λfλ(t − 1) + ϕ(t)y(t) i =Rλ−1(t)hλRλ(t − 1)ˆθ(t − 1) + ϕ(t)y(t) i =Rλ−1(t)  λh1 λ(Rλ(t) − ϕ(t)ϕ T(t))i ˆθ(t − 1) + ϕ(t)y(t)  =ˆθ(t − 1) + R−1λ ϕ(t)hy(t) − ϕT(t)ˆθ(t − 1)i

This is recursive, but matrix inversion is needed, so to speed things up in numerical applications, we can use the matrix inversion lemma [3].

Lemma 4.1 Matrix inversion

(48)

34 Theory

Introduce the following: P (t) = R−1λ (t)  A = λRλ(t − 1) B = ϕ(t)  C = I D = ϕT(t)

Where I is the identity matrix. This gives the equation: P (t) = Rλ−1(t) = 1 λ n P (t−1)−P (t−1)ϕ(t)hϕT(t)P (t−1)ϕ(t)+λi −1 ϕT(t)P (t−1)o (4.70)

Using (4.67) and (4.68) we can get the equation as:

ˆ θ(t) = P (t)fλ(t) = P (t)  λfλ(t − 1) + ϕ(t)y(t)  = P (t)λRλ(t − 1)ˆθ(t − 1) + ϕ(t)y(t)  = P (t)hRλ(t) − ϕ(t)ϕT(t)i ˆθ(t − 1) + ϕ(t)y(t)  = ˆθ(t − 1) + P (t)ϕ(t) | {z } K(t)  y(t) − ϕT(t)ˆθ(t − 1) | {z } λ(t)

So the recursive least square algorithm finally appears as [3]:          ˆ θ = θ(t − 1) + K(t)ˆ λ(t) K(t) = P (t)ϕ(t) λ(t) = y(t) − ϕT(t)ˆθ(t − 1) P (t) = 1λnP (t − 1) − P (t − 1)ϕ(t)hϕT(t)P (t − 1)ϕ(t) + λi−1ϕ(t)P (t − 1)o (4.71)

4.6.4

Comparison algorithms

Simple distance measurement

The distance between each point in the extracted frequency to the database is calculated as: dist(k) = N X i | ˆf (k + i) − f (i)| (4.72)

Normalized cross correlation

Cross correlation can be expressed as:

CC(x, y) = E[(x − ¯x)(y − ¯y)], x = E[x], ¯¯ y = E[y] (4.73) Normalized cross correlation is given by:

N CC(x, y) =CC(x, y)

σ(x)σ(y), σ(x) = E(x

2

(49)

4.6 Parametric estimation 35

(4.74) however contains discontinuities and it is ineffective for large data, so it needs to be rewritten. The discrete implementation of the normalized cross corre-lation is optimized for speed [1, 10].

The squared euclidian distance between the database signal and the extracted signal can be expressed as [4]:

d2f,t(u, v) = X x,y [f (x, y) − t(x − u, y − v)]2= X x,y [f2(x, y) − 2f (x, y)t(x − u, y − v) + t2(x − u, y − v)] (4.75) The term P x,yt 2(x − u, y − v) is constant and if P x,yf (x, y)t(x − u, y − v) is

approximately constant, the remaining term will be the cross correlation [4]:

c(u, v) =X

x,y

f (x, y)t(x − y, y − v) (4.76) The drawbacks of using (4.76) as a search method are:

• If the signal energy P f2(x, y) varies over time, the match can fail. For

example, a correlation between an actual match can become much smaller than it should be.

• The equation is sensitive, since it is not invariant to changes in amplitude.

If the extracted frequency matches the database, but has an offset of 5 Hz, there will be no match.

As a workaround to avoid these problems, the normalized cross correlation is used as [1]: γ(u, v) = P x,y[f (x, y) − ¯fu,v][t(x − u, y − v) − ¯t] q P

x,y[f (x, y) − ¯fu,v]2Px,y[t(x − u, y − v) − ¯t]2

(4.77)

Here ¯t is the mean value of the extracted signal and ¯fu,v is the mean value of

f (x, y) within the range of the database. (4.77) is computationally complex but it

is possible to rewrite it to speed it up. It is possible to calculate the numerator in the frequency plane, using the FFT [4]. It assumes that the mean value is taken out: f0(x, y) = f (x, y) − fu,v0 and t0(x, y) = t(x, y) − ¯t0. Then it is possible to use

the Fourier transform as:

F−1nF (f0)F∗(t0)o (4.78) To calculate the the denominator, it is possible to use cumulative sums. The part that is problematic to calculate is P

x,y[f (x, y) − ¯fu,v]2. The mean value and

local energy must be calculated for each u and v which becomes computationally complex and the following is proposed [4]:

(50)

36 Theory

s2(u, v) = f2(u, v) + s2(u − 1, v) + s2(u, v − 1) − s2(u − 1, v − 1) (4.80) where s(u, v) = s2(u, v) = 0, when u < 0 or v < 0. The energy at position u, v then becomes:

ef(u, v) =s2(u + N − 1, v + N − 1)

− s2(u − 1, v + N − 1)

− s2(u + N − 1, v − 1)

+ s2(u − 1, v − 1) (4.81) While the theory is based on two-dimensional computations, the implementation of the algorithm is done in one dimension.

(51)

Chapter 5

Implementation

This chapter describes the implementation of the different algorithms and filters presented in Chapter 4. A short list is presented on how to use the different algorithms. The code is presented in appendix B.

5.1

Implemented algorithms

5.1.1

locatefreqs

usage: locatefreqs(soundfile, filtertype, f1, f2) (see table 5.1).

Locatefreqs is used to check a sound file as a first step to locate any existing ENF. There is no output to the MATLABTM workspace, but the function produces multiple plots that show the signal, its Fourier transform and a spectrogram before and after filtering (Fig. C.1 to C.2). Also a filter response such as Fig. C.3 is shown.

5.1.2

STFT

usage: [fout] = est_stft(soundfile, filtertype, f1, f2) (see table 5.2).

Short Time Fourier Transform estimation method (Chapter 4.5.1) is used in the

Variable Type Restriction Default value soundfile input .wav file -filtertype input filtertype1 ’remez’

f1 input < f2 48

f2 input > f1 52

Table 5.1: Locatefreqs inputs and outputs.

1filtertype: one of ’remez’, ’ellip’, ’butter’, ’cheby1’ or ’cheby2’

(52)

38 Implementation

Variable Type Restriction Default value soundfile input .wav file -filtertype input filtertype1 ’remez’

f1 input < f2 48

f2 input > f1 52

fout output -

-Table 5.2: est_stft inputs and outputs.

1filtertype: one of ’remez’, ’ellip’, ’butter’, ’cheby1’ or ’cheby2’

Variable Type Restriction Default value soundfile input .wav file -filtertype input filtertype1 ’remez’

f1 input < f2 48

f2 input > f1 52

aliasing input 0/1 0

fout output -

-Table 5.3: est_aryule inputs and outputs.

1filtertype: one of ’remez’, ’ellip’, ’butter’, ’cheby1’ or ’cheby2’

est_stft function. It estimates frequency data from a sound file and saves frequency

data to MATLABTM workspace or to a file on disk.

5.1.3

Yule Walker

usage: [fout] = est_aryule(soundfile, filtertype, f1, f2, aliasing) (see table 5.3). Yule Walker estimation method (Chapter 4.6.1) is used in the est_aryule func-tion. It estimates frequency data from a sound file and saves frequency data to MATLABTM workspace or to a file on disk.

5.1.4

Recursive Least Square

usage: [fout] = est_arrls(soundfile, filtertype, f1, f2, aliasing) (see table 5.4). Recursive Least Square estimation method (Chapter 4.6.3) is used in the est_arrls function. It estimates frequency data from a sound file and saves frequency data to MATLABTM workspace or to a file on disk.

5.1.5

Database matching

usage: searchdb(database, soundcsv, method) (see table 5.5).

Database matching is done in the searchdb function. It finds matches between extracted sound data and a database, using different search methods.

(53)

5.2 Filters implementation 39

Variable Type Restriction Default value soundfile input .wav file -filtertype input filtertype1 ’remez’

f1 input < f2 48

f2 input > f1 52

aliasing input 0/1 0

fout output -

-Table 5.4: est_arrls inputs and outputs.

1filtertype: one of ’remez’, ’ellip’, ’butter’, ’cheby1’ or ’cheby2’

Variable Type Restriction Default value database input .db file -soundcsv input .csv file

-method input method1 ’xcorr’

Table 5.5: searchdb inputs and outputs.

1method: ’dist’ or ’xcorr’

5.2

Filters implementation

usage: [b a y] = f_filtfunc(x, fs, f1, f2, ftype) (see table 5.6).

The different filters implemented (Chapter 4.3) share the same features. The user is not required to call the filter functions as they are called and used within other functions. The filter function takes a signal, a sample frequency, the frequencies to filter and filter type. It returns the numerator and denominator to a system H(z) as well as the filtered signal. Fig. C.3 to Fig. C.7 show the filter magnitude and phase responses for respective filters default values when called from locatefreqs.m.

Variable Type Restriction Default value x input data vector

-fs input > f2

-f1 input < f2 48

f2 input > f1 52

ftype input ftype1 ’bandpass’

b output -

-a output -

-y output -

-Table 5.6: filter types inputs and outputs.

References

Related documents

As an example, during Prohibition, those members of the elite society that shared the same structures, and cultural practices, and attended the Cocktail Parties also valued the

I think we found a good mixture - microphones, studio, reverberation and music energy that all gives a feeling of a live concert in the listener’s living room.. Through this

46 Konkreta exempel skulle kunna vara främjandeinsatser för affärsänglar/affärsängelnätverk, skapa arenor där aktörer från utbuds- och efterfrågesidan kan mötas eller

Key words: affect, becoming, Maurice Blanchot, displacement, dispossession, essay film, experimental video, geological time, liminal, Clarice Lispector, materialism, primordial

For a better reconstruction of the conversation regarding who says what to whom at what time, researchers make a sketch of the room where the recording takes place and number

This feature of a frequency- dependent time window is also central when the wavelet transform is used to estimate a time-varying spectrum.. 3 Non-parametric

 Jag  önskade  dock   att  organisera  och  utforma  de  musikaliska  idéerna  så  att  de  istället  för  att  ta  ut  varandra  bidrog   till  att

Suc- cessful DIY recording requires the right equipment, productive (not neces- sarily good) relationships, and a skilled technician to achieve good results. The