Institutionen för systemteknik
Department of Electrical Engineering

Examensarbete (Master's thesis)

Determining recording time of digital
sound recordings using the ENF criterion

Thesis carried out in Electronics Systems at Linköping Institute of Technology
by
Fredrik Andersson

LiTH-ISY-EX--09/4250--SE
Linköping 2009
Supervisors: Håkan Johansson, ISY, Linköpings universitet
             Peter Bergström, SKL
             Erik Achrén, SKL
Examiner: Håkan Johansson, ISY, Linköpings universitet
Division of Electronics Systems, Department of Electrical Engineering
Linköpings universitet, SE-581 83 Linköping, Sweden

Date: 2009-05-12
Report category: Examensarbete (Master's thesis)
URL: http://www.es.isy.liu.se
     http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-18231
ISRN: LiTH-ISY-EX--09/4250--SE
Title: Tidsbestämning av digitala ljudinspelningar med hjälp av ENF-kriteriet
       (Determining recording time of digital sound recordings using the ENF criterion)
Author: Fredrik Andersson
Abstract

In forensic investigations, verification of digital recordings is an important aspect. There are numerous methods to verify the authenticity of recordings, but it is difficult to determine when a recording was made. By studying the electrical network frequency, one can find a unique signature and match the recording against it. By matching a recorded signal to a database containing the necessary reference information, one can determine the time when the recording was made.
Acknowledgments

I would like to thank SKL for the opportunity to write this thesis, and my supervisors Erik Achrén and Peter Bergström for guiding me and for letting me in when I forgot my security pass. A big thanks to Anders Nordgaard for the great help with the statistics and the report, and to Per Brolund and Tobias Höglund for the questions, discussions and help regarding the project. I would like to thank my supervisor at the university, Prof. Håkan Johansson, who made himself available each week for discussions. A big thanks also goes to Prof. Fredrik Gustafsson for good guidance and help.
Contents

1 Introduction
  1.1 Background
  1.2 Task
  1.3 Limitations
  1.4 Target audience
2 Methods
  2.1 Methods in use
    2.1.1 Database recording
    2.1.2 Filters
    2.1.3 Frequency extracting algorithms
    2.1.4 Comparison algorithms
  2.2 Methods not in use
    2.2.1 WAV database recording
    2.2.2 Frequency extracting algorithm - Wigner transform
    2.2.3 Utilizing the alias effect when downsampling
3 System
  3.1 The power grid
    3.1.1 50 Hz component
  3.2 System requirements
    3.2.1 ENF Workstations
    3.2.2 ENF recorders
    3.2.3 ENF database server
  3.3 Problems
    3.3.1 Power grid failure
    3.3.2 Multiple recordings
    3.3.3 Handheld devices
    3.3.4 UPS
    3.3.5 Compression
4 Theory
  4.1 Sampling
  4.2 Resampling
    4.2.1 Decimation
    4.2.2 Aliasing effect
  4.3 Filters
    4.3.1 IIR Filters
    4.3.2 FIR Filters
  4.4 Spectral analysis
  4.5 Non-parametric estimation
    4.5.1 Fourier transform
    4.5.2 Short Time Fourier Transform
  4.6 Parametric estimation
    4.6.1 Yule-Walker method
    4.6.2 Least Square method
    4.6.3 Recursive Least Square method
    4.6.4 Comparison algorithms
5 Implementation
  5.1 Implemented algorithms
    5.1.1 locatefreqs
    5.1.2 STFT
    5.1.3 Yule Walker
    5.1.4 Recursive Least Square
    5.1.5 Database matching
  5.2 Filters implementation
  5.3 Implementation notes
    5.3.1 Work order
6 Conclusion
  6.1 Simulation
    6.1.1 Monte Carlo simulation
    6.1.2 Simulation model
  6.2 Result
  6.3 Conclusion
  6.4 Future work
Bibliography
A Bash code
  A.1 freq2.sh
  A.2 csvmerge.sh
B MATLAB code
  B.1 locatefreqs.m
  B.2 searchdb.m
  B.3 Estimation algorithms
    B.3.1 est_stft.m
    B.3.2 est_aryule.m
    B.3.3 est_arrls.m
List of Figures

1.1 Basic system description
1.2 Plot database vs. extracted data
2.1 Layout description
3.1 Swedish power grid structure
3.2 EKG example
3.3 Basic system description
4.1 Uniform sampling
4.2 Spectrum of signals
4.3 Aliasing effect
4.4 Filter types
4.5 FIR filter impulse response
4.6 Impulse response, FIR filters
5.1 csvmerge.sh work scheme
6.1 Basic system description
6.2 Noise variance, STFT
6.3 Noise variance, parametric models
C.1 Signal and FFT plot
C.2 Surfplot
C.3 Remez filter response
C.4 Elliptic filter response
C.5 Butterworth filter response
C.6 Chebyshev I filter response
C.7 Chebyshev II filter response
List of Tables

2.1 Wavefile disk usage
5.1 Locatefreqs inputs and outputs
5.2 est_stft inputs and outputs
5.3 est_aryule inputs and outputs
5.4 est_arrls inputs and outputs
5.5 searchdb inputs and outputs
5.6 filter types inputs and outputs
5.7 Part of a .csv file
5.8 Part of a .db file
6.1 STFT Simulation results
Chapter 1
Introduction
This chapter will present a short introduction to the problem, the task and the field of use. A short example will be presented to give a scenario of how the method can be used.
1.1 Background
Analog devices, such as tape recorders, not only capture the intended signal, but also characteristics of the recorder itself. Tape recorders leave signatures from their record and erase heads, which can be used to identify the specific recorder used. It can also be detected if another recorder has been used to change the original contents, in which case one can conclude that the recording has been manipulated. Nowadays, recordings are increasingly made by digital devices, and these do not leave physical traces in the same way as their analog counterparts. Therefore a new approach is needed to determine the authenticity of digital recordings.
The method described in this thesis is based on research by Catalin Grigoras [2] and it involves studying the electrical network frequency (ENF). In Sweden, Svenska Kraftnät AB is responsible for keeping the electrical network frequency in the interval 50 ± 0.1 Hz. The deviation from 50 Hz depends on the current load on the electrical network: a large load makes the generators rotate slower, thus lowering the frequency, while a small load increases the frequency. These frequency deviations over time create a unique pattern [2], and this is what makes the ENF method an interesting choice for determining the date and authenticity of digital recordings. The simplified system is modeled as in Fig. 1.1, where s(t) is a signal containing speech or music, v(t) is the disturbance from the electric network, with known properties modeled by Fn(z), and the resulting recorded signal is y(t).

Figure 1.1: Basic system description. v(t) is the ENF, s(t) is music or speech and y(t) is the full recording.

Note that the amplitude of v(t) is significantly smaller than that of s(t), and it can usually not be noticed without performing signal processing.
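The additive model y(t) = s(t) + v(t) can be illustrated with a short simulation. The following is a hypothetical Python sketch (the thesis itself works in MATLAB): white noise stands in for the speech/music signal s(t), and the sample rate and hum amplitude are assumed values chosen so that the hum is invisible in the waveform yet recoverable as a spectral peak.

```python
import numpy as np

# Assumed parameters for illustration only: fs = 1000 Hz, 2 s of audio,
# a weak 50 Hz network hum buried in unit-variance noise.
fs = 1000
t = np.arange(0, 2.0, 1 / fs)

rng = np.random.default_rng(0)
s = rng.standard_normal(t.size)          # stand-in for speech or music
v = 0.2 * np.sin(2 * np.pi * 50 * t)     # ENF component, much weaker than s
y = s + v                                # the recorded signal

# The hum is not visible in y itself, but shows up as a peak near 50 Hz.
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(y.size, 1 / fs)
band = np.where((freqs >= 40) & (freqs <= 60))[0]
peak = freqs[band[np.argmax(spectrum[band])]]
```

Even though the hum's power is only a few percent of the noise power, the narrowband structure concentrates its energy in one DFT bin, which is what the later extraction algorithms exploit.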
To give the reader some understanding of the method and its use, a short example is presented. This is not based on a real case, although the situation may exist in a similar form.
Example 1.1: Use of ENF
There are two actors in this example: person A and person B. A is a telephone salesman and B is a customer who has been offered expensive training equipment by A. It is not uncommon that telephone salesmen record conversations to use the recording as a proof that they made a sale. The situation is now that the salesman claims to have sold the equipment and that B claims to never have agreed to buy the equipment at all.
Forensic investigators are presented with a CD containing the recording made by A and can hear what appears to be an authentic recording with a sale being made. By just listening to the recording, there is no way of directly telling whether it is manipulated or not. They decide to take a closer look at the frequency contents of the recording and especially around 50 Hz (the ENF).
The investigators find one matching sequence, but the recording contains anomalies when its frequency is compared to the database. In Fig. 1.2 we see how the frequency contents match overall, but at some positions the sound file frequency differs from the database. The investigators now listen to the recording once again and find that at each of the positions where the signals differ, person B can be heard saying "yes". This indicates that person A has inserted a "yes" answer from another part of the conversation and placed it where appropriate to get a confirmation of the sale.
1.2 Task
The problem formulation given was:

• Examine what has been done in the area.
• Develop and implement one or several methods to extract ENF data from sound recordings.
• Develop and implement one or several methods to compare extracted ENF data to the database.

Figure 1.2: Database plotted against extracted data. Note the instances where the signals differ.

Much of this has already been covered by Fredrik Lindholm in his thesis [5]. It was, however, decided to focus on the extraction of the ENF from a recording, as this is a very important part of the process. Without a good signal, it does not matter how well the comparison algorithm performs.
1.3 Limitations

• Electrical network limited to Östergötland, Sweden.
• Digital media only. Analog recording devices require other techniques.
• Requires the digital equipment to be connected directly to the electrical network.
1.4 Target audience

• Statens Kriminaltekniska Laboratorium (SKL) and other forensic authorities.
• Law enforcement.
Chapter 2
Methods
The given problem formulation was quite wide in terms of what needed to be done and had to be narrowed down. From reading the previous thesis [5] and studying material regarding the structure of the problem, a more restrictive problem formulation was made.
It was decided that this thesis should:

• Improve the techniques used for filtering.
• Implement new methods to improve the extraction of frequency information from recordings.
• Test the accuracy of frequency estimations.
• Enhance any existing implementation.
The basic idea of determining when a recording was made is illustrated in Fig. 2.1. An ENF recorder and some digital recording equipment are attached to the electrical network. The ENF recorder saves frequencies together with time information to a database, which is used as a reference. Whenever a digital recording is made, the ENF is also recorded. Through filtering and an estimation algorithm, the ENF is extracted from the recording and then compared to the database.
2.1 Methods in use

2.1.1 Database recording
At SKL there is a ratemeter that is calibrated to pick up the frequency component around 50 Hz. The picked-up frequencies are stored together with their time as a CSV (comma-separated values) file on disk using the bash script freq2.sh (A.1). The reason for using the ratemeter instead of recording a sound file for each day is to save storage space and time. The ratemeter extracts frequency values straight to the database, instead of having to run a frequency-extracting algorithm on each recording. The ENF originates from physical generators and the fluctuation depends on the current load, which results in quite slow frequency changes. The ratemeter reads two values per second, and that is enough to capture the fluctuations. A 24 hour period requires about 2.2 MB of disk space, and over a year that translates into roughly 800 MB of disk usage.

Figure 2.1: Layout description of the method.
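The storage figures above can be sanity-checked with a back-of-envelope calculation. The bytes-per-row figure below is an assumption inferred from the quoted 2.2 MB/day; the actual CSV row format of freq2.sh is not shown in this chapter.

```python
# Hypothetical check of the ratemeter's storage numbers.
SAMPLES_PER_SECOND = 2        # from the text: two readings per second
BYTES_PER_ROW = 13            # assumed average CSV row size, e.g. "49.98,..." + newline

rows_per_day = SAMPLES_PER_SECOND * 24 * 3600     # 172 800 rows per day
mb_per_day = rows_per_day * BYTES_PER_ROW / 1e6   # roughly 2.2 MB
mb_per_year = 365 * mb_per_day                    # roughly 800 MB
```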
2.1.2 Filters
Filtering is a wide concept and the filters used depend heavily on the application. Some filters are designed to remove unwanted distortion, while others are designed to remove everything but a narrow frequency band. In order to properly estimate the ENF around 50 Hz, the signal must be preprocessed to remove any unwanted signal information. The signal is therefore passed through a bandpass filter, leaving only the interesting frequency range to be further manipulated or investigated.
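The bandpass preprocessing step can be sketched as follows. This is a hypothetical Python/SciPy version (the thesis uses MATLAB filters, described in Chapter 5); the band edges, filter order and sample rate are illustrative assumptions, not the thesis's actual design values.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_50hz(x, fs, low=49.0, high=51.0, order=4):
    """Butterworth bandpass around the ENF; band edges are assumed values."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Zero-phase filtering is acceptable here since the system is offline.
    return sosfiltfilt(sos, x)

fs = 1000
t = np.arange(0, 2.0, 1 / fs)
# A 50 Hz "ENF" plus an out-of-band 200 Hz disturbance.
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
y = bandpass_50hz(x, fs)
```

After filtering, only the narrow band around 50 Hz remains for the estimation algorithms to work on.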
2.1.3 Frequency extracting algorithms
The frequencies in a signal need to be extracted in order to compare them to a database. Three methods have been implemented and compared.
Non-parametric estimation
Non-parametric estimation requires no prior information about the signal, such as statistical or spectral properties. This is considered an advantage, since one can apply the method and obtain a good result without knowing any particular properties of the signal itself. The Fourier transform is such an estimator. It is a powerful and efficient way of estimating the frequency contents of a signal. The method is common and widely used, and therefore a natural choice to implement. The drawback is that the transform assumes a periodic input of infinite length. This is not the case when dealing with real data, and the Discrete Fourier Transform (DFT) is the approximation that will be used. Another disadvantage is that the DFT is designed to work with large batches of data rather than recursively, which makes the method harder to adapt to signals whose frequency contents change over time.
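The batch-oriented DFT approach amounts to cutting the recording into frames and picking the spectral peak near 50 Hz in each frame. The sketch below is a hypothetical Python rendering of that idea; the thesis's actual est_stft.m (Appendix B) may use different frame lengths, windows and interpolation.

```python
import numpy as np

def enf_track_stft(x, fs, frame_len=1.0):
    """Frame-wise DFT peak picking around 50 Hz (illustrative sketch)."""
    n = int(frame_len * fs)
    estimates = []
    for start in range(0, len(x) - n + 1, n):
        frame = x[start:start + n] * np.hanning(n)
        # Zero-pad the DFT for a finer frequency grid than 1/frame_len.
        spec = np.abs(np.fft.rfft(frame, 8 * n))
        freqs = np.fft.rfftfreq(8 * n, 1 / fs)
        band = (freqs >= 49.0) & (freqs <= 51.0)
        estimates.append(freqs[band][np.argmax(spec[band])])
    return np.array(estimates)

fs = 400
t = np.arange(0, 5.0, 1 / fs)
x = np.sin(2 * np.pi * 50.05 * t)     # slightly off-nominal ENF
f_hat = enf_track_stft(x, fs)         # one estimate per one-second frame
```

The frame length trades time resolution against frequency resolution, which is exactly the weakness of the DFT for time-varying content noted above.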
Parametric estimation
Parametric estimation is another way of representing a signal and its contents. Parametric methods estimate parameters in a model structure rather than an actual signal or frequencies. Common model structures are Auto Regressive (AR), Moving Average (MA) and Auto Regressive Moving Average (ARMA). These model structures have the ability to use prior information about a given signal or process, and one would assume that this can be used to improve estimation accuracy and resolution. This thesis will focus on AR models. Two different parametric estimation algorithms are evaluated:

• Yule-Walker - Estimates parameters using an estimate of the covariances.
• Recursive Least Square - Implementation of the Least Square method with a forgetting factor introduced.
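To make the AR idea concrete, here is a minimal Python sketch of Yule-Walker estimation for an AR(2) model: for a noisy sinusoid, the angle of the estimated pole pair gives the frequency. This is an illustration of the principle only; the thesis's est_aryule.m presumably relies on MATLAB's own routines and a higher model order.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def ar2_freq_yule_walker(x, fs):
    """Estimate a single tone's frequency from AR(2) Yule-Walker parameters."""
    x = x - np.mean(x)
    # Biased autocorrelation estimate r[0], r[1], r[2].
    r = np.correlate(x, x, mode="full")[len(x) - 1:] / len(x)
    # Yule-Walker equations: [[r0, r1], [r1, r0]] a = [r1, r2].
    a = solve_toeplitz(r[:2], r[1:3])
    # Poles of 1 - a1 z^-1 - a2 z^-2; the pole angle is the tone frequency.
    poles = np.roots(np.concatenate(([1.0], -a)))
    omega = np.max(np.abs(np.angle(poles)))
    return omega * fs / (2 * np.pi)

fs = 400
t = np.arange(0, 4.0, 1 / fs)
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 50.0 * t) + 0.01 * rng.standard_normal(t.size)
f_hat = ar2_freq_yule_walker(x, fs)
```

Unlike the DFT, the estimate is not tied to a fixed frequency grid, which is the resolution advantage mentioned above.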
2.1.4 Comparison algorithms

The comparison algorithms were developed in Lindholm's thesis [5] and are described briefly in Chapter 4.
Simple distance measurement
This method requires that both frequency and fluctuation patterns match between the database and the extracted data. The extracted signal is moved across the database and, for each small move, a distance is calculated. This is done until the whole database is covered; the lowest distance value is the most likely match. The advantage of this method is that it returns fewer hits, but it also requires that the extracted frequencies are aligned with the database. A static offset of 3 Hz would considerably lower the performance of this algorithm.
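The sliding-distance search can be sketched as below. This is a hypothetical Python illustration of the idea, not Lindholm's implementation; the mean squared distance and the synthetic reference curve are assumptions for the example.

```python
import numpy as np

def best_match_offset(db, extracted):
    """Slide the extracted ENF sequence over the reference database and
    return the offset with the smallest mean squared distance."""
    n = len(extracted)
    dists = [np.mean((db[i:i + n] - extracted) ** 2)
             for i in range(len(db) - n + 1)]
    return int(np.argmin(dists))

rng = np.random.default_rng(2)
db = 50.0 + 0.02 * rng.standard_normal(1000)   # fake reference ENF curve
true_offset = 400
extracted = db[true_offset:true_offset + 120] + 0.001 * rng.standard_normal(120)
offset = best_match_offset(db, extracted)
```

Note that an absolute distance measure like this is exactly what makes the method sensitive to a static frequency offset: adding 3 Hz to `extracted` would inflate every distance equally and drown the true minimum.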
Normalized cross correlation
This method is not dependent on the actual frequency, since it only looks for matching patterns. The advantage is that there is no need to calibrate any offset; the disadvantage is that the extracted pattern may appear multiple times in the database. Therefore a longer recording is needed to ensure a good match.
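The offset insensitivity comes from normalizing each window to zero mean and unit variance before correlating. A hypothetical Python sketch (again, not the thesis code; the random-walk reference curve is an assumption):

```python
import numpy as np

def ncc_match(db, extracted):
    """Normalized cross-correlation search: pattern match that is
    insensitive to a constant frequency offset."""
    n = len(extracted)
    e = (extracted - extracted.mean()) / extracted.std()
    best_score, best_idx = -np.inf, 0
    for i in range(len(db) - n + 1):
        w = db[i:i + n]
        w = (w - w.mean()) / w.std()       # normalization removes any offset
        score = np.dot(e, w) / n           # correlation coefficient in [-1, 1]
        if score > best_score:
            best_score, best_idx = score, i
    return best_idx, best_score

rng = np.random.default_rng(3)
db = 50.0 + np.cumsum(0.001 * rng.standard_normal(2000))  # random-walk "ENF"
extracted = db[700:900] + 3.0      # same pattern, but with a 3 Hz static offset
idx, score = ncc_match(db, extracted)
```

The 3 Hz offset is removed by the per-window mean subtraction, so the correct position is still found; the trade-off is that short, featureless patterns can score highly at many positions.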
Days   Seconds     Capacity [MB]
1      86400       20.232
7      604800      141.624
31     2678400     627.192
365    31536000    7384.68

Table 2.1: Disk usage for WAV recording, for comparison against roughly 800 MB per year from the ratemeter.
2.2 Methods not in use

2.2.1 WAV database recording
One method to store the database would be to record the electrical network frequency through the line-in of a common sound card. As mentioned above, an algorithm would then have to be applied to extract the useful information, which takes time. Also, if the ENF database is recorded as 16-bit WAV PCM at a 120 Hz sample rate, the storage requirement grows according to Table 2.1 [2]. Another disadvantage is that sound cards behave differently depending on the brand or chip used. A comparison between two computers with different sound cards showed a difference of 0.1 Hz when recording a pure generated 50 Hz signal.
2.2.2 Frequency extracting algorithm - Wigner transform

This method was investigated in Lindholm's thesis [5] and was deemed unsuitable due to its computational complexity. The algorithm works at O(n³) and would require too much in terms of hardware to be efficient. The method will not be further described in this thesis.

2.2.3 Utilizing the alias effect when downsampling
The parametric model estimations need to consider frequency content outside the scope of the ENF, which might introduce odd frequency estimates that deviate substantially from the wanted 50 Hz. A way to only consider the actual frequency range around 50 Hz is to use downsampling together with the alias effect [6, 7]. If the Nyquist-Shannon sampling theorem (Theorem 4.1) is not fulfilled, frequency information above the new sampling frequency will be aliased down into the new frequency band.

This should be possible to do, but the implementation proved to have flaws and the frequency estimates were very unstable. One reason behind this behaviour could be that there were too few samples on which to perform computations. Another explanation could be that the built-in MATLAB code for resampling is not accurate enough to retain the important signal information. One would probably need to implement the algorithm from the ground up, carefully selecting filters and other techniques to properly obtain a good signal.
Chapter 3
System
3.1 The power grid

The frequency in the Nordic frequency region (Sweden, Norway, Finland and the Danish island Zealand, sv. Själland) carries the same waveform due to electromagnetic wave propagation. The entire electric network formed by the interconnected systems, including all the sources and loads, will carry the same frequency [2]. This theoretically means that one ENF recorder is sufficient to cover all of the Nordic frequency region.

This is not the case when the electric network fails due to broken wires or other conditions. Isolated nodes can then arise in areas that are not connected to the main power grid; these are instead powered by local power stations. The network frequency will then differ from the rest of the grid, see Fig. 3.1. There is one permanently isolated node (in terms of ENF) in the Swedish region of the grid, which is Gotland. The power is transported to Gotland via a DC (direct current) cable and then converted back to alternating current (AC). This effectively means that to have complete coverage of Sweden, a separate ENF recorder needs to be placed on Gotland as well.
Figure 3.1: Swedish power grid structure (producer, base grid, regional grid, isolated grid, consumer).
Figure 3.2: (a) EKG signal with noise from the electric network. (b) Filtered EKG signal.
Figure 3.3: Basic system description (ENF recorders, ENF database, ENF workstations).
3.1.1 50 Hz component

The use of the electric network frequency is a rather new technique in forensic investigations, but the concept itself is not new. If one disconnects an electric guitar from an amplifier, one can often hear a distinct humming sound, which is the 50 Hz component. Some equipment, such as EKG instruments in hospitals, is very sensitive to disturbances, and for such instruments it is crucial to remove the ENF in order to clearly see the heartbeat rhythm. Fig. 3.2 shows a generated EKG signal before and after filtering.
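Removing the 50 Hz hum, as done for the EKG signal in Fig. 3.2, is classically handled with a narrow notch filter. The sketch below is a hypothetical Python example with a synthetic "heartbeat" stand-in (a slow sinusoid), not real EKG data; sample rate and notch quality factor are assumed values.

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

fs = 500
t = np.arange(0, 4.0, 1 / fs)
heartbeat = np.sin(2 * np.pi * 1.2 * t)    # slow rhythm stand-in, not real EKG
hum = 0.8 * np.sin(2 * np.pi * 50 * t)     # disturbance from the network
noisy = heartbeat + hum

# Narrow notch at 50 Hz; Q = 30 gives a bandwidth of about 1.7 Hz.
b, a = iirnotch(50.0, Q=30.0, fs=fs)
cleaned = filtfilt(b, a, noisy)            # zero-phase to keep the waveform shape
```

Note that this is the opposite use of the ENF compared to the rest of the thesis: here the 50 Hz component is the disturbance to be removed, whereas the forensic method treats it as the signal of interest.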
3.2 System requirements

Fig. 3.3 shows an overview of the system; the software and hardware needed are described below.
3.2.1 ENF Workstations

The workstations extract signal information from a digital recording and then compare it to the ENF database. These can be placed anywhere in the country, given that a connection to the ENF database can be established. Preferably, the workstations are located in SKL facilities. The workstations require at least:

• Operating system: GNU/Linux.
• SSH connection to the ENF database.
• Sufficient hardware to run MATLAB.
• MATLAB installed.
• MATLAB Signal Processing Toolbox.
• The written MATLAB code.
• The bash script csvmerge.sh.
3.2.2 ENF recorders

The ENF recorder is the whole system around the ratemeter that records and saves ENF information to disk. Once every day, the information should be sent to a central ENF database and stored. The recorders should be placed in facilities where physical access can be controlled, such as police stations. Preferably, one or more recorders should be placed in each province to cover any grid failure. As a minimum, one recorder needs to be placed on Gotland due to the local network, and one recorder anywhere else in Sweden. For continuous operation, the following are required for ENF recorders:

• Operating system: GNU/Linux.
• SSH connection to the ENF database.
• The bash script freq2.sh.
• Cron daemon configured to send recorded data to the server each day.
• Ratemeter.
• Connection to a UPS (see 3.3.4).
3.2.3 ENF database server

The server can physically be placed nearly anywhere in the country, but most suitable would be SKL facilities. The server needs to be online at all times for the ENF recorders to be able to upload ENF data. The structure really only serves as a file server without any real database properties (such as SQL), so each ENF recorder should have a specific user account on the database server where the ENF recordings are saved. The server requires:

• Operating system: GNU/Linux.
• SSH server running.
• Connection to a UPS (see 3.3.4).
• Sufficient disk space to store ENF data. This will vary depending on how many recorders there are in the system.
3.3 Problems

For any recording device to also record the ENF signal, the device must be connected to the electrical network. If it is not, the ENF will be missing in the recording and there will be no data to compare to the database. This section describes some of the cases where the ENF method will not work.
3.3.1 Power grid failure
As mentioned before, power grid failures can create local nodes that will produce a different ENF shape. If a recording device is picking up this local ENF, a comparison cannot be made. This problem can be mitigated by placing many ENF recorders throughout the country and covering each of these possible nodes.
3.3.2 Multiple recordings

If a sound recording has been made and the ENF has been picked up, it is possible to analyze it. However, if the original recording is played back and recorded onto another digital recording (like copying on an old cassette recorder), a new ENF will be recorded on top of the old one. This results in two ENF signals, and it will be hard to determine which ENF is the original one. This case is not very likely, as it is easier to just copy the recorded file itself, but it can still occur.
3.3.3 Handheld devices
If a recording has been made through a cell phone or a handheld recorder, there is no connection to the electrical network and the ENF will not be picked up. However, the ENF might be transferred from nearby power cables, light sources or other electrical equipment that is not properly shielded. The situation where ENF is recorded from a nearby source does not seem very likely, but could be further investigated.
3.3.4 UPS

Whenever an Uninterruptible Power Supply (UPS) deems the normal power supply unstable or unreliable, it will disconnect from the electrical network and supply power to the connected devices itself. If a recording device is connected to a UPS during a power failure, there will be no ENF recorded. There are two categories of UPS devices:
Online UPS: Continuously feeds the devices connected to it; these devices will never record ENF signals.

Offline UPS: Stands ready to take over the power supply whenever the electrical network fails. Devices connected to an offline UPS will record ENF as long as the electrical network is stable.
3.3.5 Compression

Often when a recording has been made, one usually wants to compress it to save disk space. There are two main types of audio compression: lossless and lossy. An uncompressed file contains all the necessary information, and so does a losslessly compressed one, as the main idea of a lossless compression algorithm is to obtain a smaller file while keeping 100% of the information, so that the original file can be fully reconstructed. In lossy compression, this requirement is dropped and the goal is instead to achieve a good compression level while keeping the perceptually useful information. Different lossy compression formats are suitable for different tasks. MP3 is very commonly used when compressing music audio files, while Musepack is optimized for transparent compression of stereo audio.
Uncompressed formats: WAV, AIFF, AU, etc.

Lossless compression formats: FLAC, Apple Lossless, MPEG-4 ALS, Monkey's Audio, TTA, lossless Windows Media Audio (WMA), etc.

Lossy compression formats: MP3, Vorbis, Musepack, ATRAC, lossy Windows Media Audio (WMA), AAC, etc.
It is currently unknown what the different lossy compression algorithms do to the 50 Hz component, but most likely it will differ from the original. This needs more investigation, as it is very easy and common to compress an audio file to MP3.
Chapter 4
Theory
This chapter covers the theoretical aspects of the thesis. It may be difficult to grasp for readers without a background in signal processing, so it can be skipped unless you are interested in the details and theory.
4.1 Sampling

Sampling is usually done to convert analog signals into a digital representation. This is not needed in this application, but it is explained to give some insight into discrete time resampling. As illustrated in Fig. 4.1, the discrete time signal x[n] is created from the continuous time signal xa(t) with the uniform sampling interval T. The relationship between them can be expressed as [9]:

x[n] = xa(t)|_{t=nT} = xa(nT)    (4.1)

If X[Ω] is the frequency spectrum of the discrete time signal and Xa(ω) is the frequency spectrum of the continuous time signal, the relationship between the two is described by the Poisson summation formula [9]:

X[Ω] = (1/T) Σ_{k=−∞}^{∞} Xa((Ω − 2πk)/T)    (4.2)

Figure 4.1: Uniform sampling.
From (4.2) we can conclude [9]:

1. Xa(ω)|_{ω=Ω/T} is calculated for all Ω.
2. The result in 1 is scaled by a factor 1/T.
3. The result in 2 is repeated an infinite number of times with period 2π.
4. The terms in 3 are added.

As an illustration of this, Fig. 4.2 shows the spectrum of a signal in continuous and discrete time. From these conclusions we arrive at the Nyquist-Shannon sampling theorem [8], which nowadays is central to signal processing:
Figure 4.2: (a) Spectrum of the continuous time signal. (b) Spectrum of the sampled discrete time signal.
Theorem 4.1 (Nyquist-Shannon sampling theorem) If a function f(t) contains no frequencies higher than W cps, it is completely determined by giving its ordinates at a series of points spaced 1/(2W) seconds apart.

In other words: a sampled signal can be fully reconstructed if the following condition is met:

fs > 2f0    (4.3)
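Theorem 4.1 can be illustrated numerically. In this hypothetical Python sketch (frequencies and durations are assumed values), a 50 Hz tone sampled at 200 Hz satisfies condition (4.3) and is seen at its true frequency, while the same tone sampled at 80 Hz violates it and appears at 80 − 50 = 30 Hz:

```python
import numpy as np

def dominant_freq(f0, fs, duration=2.0):
    """Sample a pure f0 Hz sine at rate fs and return the DFT peak location."""
    n = int(duration * fs)
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * f0 * t)
    spec = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(n, 1 / fs)[np.argmax(spec)]

ok = dominant_freq(50.0, 200.0)       # fs > 2*f0: peak found at 50 Hz
aliased = dominant_freq(50.0, 80.0)   # fs < 2*f0: peak aliased to 30 Hz
```

This is the same folding mechanism that Section 2.2.3 tried to exploit deliberately.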
4.2 Resampling

The first issue to consider when manipulating signals is the sampling frequency. The sample frequency, fs, determines the number of samples per second. Since a computer has limited resources, it is desirable to reduce the number of samples in order to decrease the amount of computation. This is commonly done through decimation.
4.2.1 Decimation

Decimation is done in two steps. First the discrete time signal is filtered through a discrete time lowpass filter and then it is downsampled. Downsampling of x[n] by a factor N means that a new signal, xd[n], is created by taking every Nth sample from x[n]. The other samples are discarded. This can be written as [9]:

xd[n] = x[nN]    (4.4)

Sampling of a discrete time signal can be done by multiplying the signal with a pulse train, p[n], as [9]:

xp[n] = x[n]p[n] = x[n] Σ_{k=−∞}^{∞} δ[n − kN]    (4.5)

The Fourier transform of the sampled signal can be expressed as [9]:

Xp[Ω] = (1/N) Σ_{k=0}^{N−1} X[Ω − k·2π/N]    (4.6)

Equation (4.6) can be said to be the discrete time equivalent of the Poisson summation formula (4.2). It implies a scaling by a factor 1/N, and the spectrum X[Ω] is repeated N times with distance 2π/N, where N is the sample period, that is, the distance between the samples in x[n] that create xp[n].
To avoid distortion at this new sampling frequency, 2π/N needs to be larger than 2Ωm, where Ωm is the bandwidth of X[Ω]. If this condition is not met, the different repetitions of X[Ω] will overlap and cause distortion. If 2π/N is written as Ωs, "the sample frequency", we then need the following for distortion-free sampling [9]:

Ωs > 2Ωm    (4.7)

This can be said to represent the discrete time version of the Nyquist-Shannon sampling theorem. The filter that precedes the downsampling needs to fulfil the requirement (4.7).
The signal xd[n], acquired through decimation of x[n], can also be constructed by decimating the sampled signal xp[n] [9]:

xd[n] = x[nN] = xp[nN]    (4.8)

The Fourier transform of the decimated signal can then be written as [9]:

Xd[Ω] = Xp[Ω/N] = (1/N) Σ_{k=0}^{N−1} X[Ω/N − k·2π/N]    (4.9)

Fig. 4.3 shows an example of X[Ω], Xp[Ω] and Xd[Ω] for N = 3.
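The lowpass-then-downsample procedure of (4.4) can be demonstrated with SciPy's `decimate`, which bundles both steps. This is a hypothetical Python illustration with assumed frequencies; the out-of-band 400 Hz component is removed by the anti-alias filter before every 4th sample is kept.

```python
import numpy as np
from scipy.signal import decimate

fs = 1200
t = np.arange(0, 2.0, 1 / fs)
# A wanted 50 Hz tone plus an out-of-band 400 Hz component.
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)

N = 4                              # new rate 300 Hz, new Nyquist 150 Hz
xd = decimate(x, N, ftype="fir")   # lowpass filter, then keep every Nth sample

# Without the filter, 400 Hz would alias to |400 - 300| = 100 Hz; with it,
# only the 50 Hz tone survives.
freqs = np.fft.rfftfreq(xd.size, N / fs)
peak = freqs[np.argmax(np.abs(np.fft.rfft(xd)))]
```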
Figure 4.3: (a) X[Ω] = F{x[n]}, with bandwidth Ωm.
(b) Xp[Ω] = F{xp[n]}, where xp[n] is a sampled version of x[n] with sample frequency 2π/N > 2Ωm, N = 3.
(c) Xd[Ω] = F{xd[n]}.
(d) Xp[Ω] when the sample frequency condition is not met, so the repetitions overlap (aliasing).
4.2.2 Aliasing effect

As seen in Fig. 4.3(d), frequencies above the Nyquist frequency are folded down into a lower frequency band. It is possible to use this to achieve an even lower sampling frequency.
Example 4.1: Downsampling with aliasing effect
For simplicity, assume a signal x with sampling frequency 240 Hz and say that the interesting frequency band is between 30 and 60 Hz.

1. xds1 = LP(0.5, x), lowpass filter with normalized cut-off frequency 0.5.
2. Downsample xds1 by a factor 2.
3. xds2 = HP(0.5, xds1), highpass filter with normalized cut-off frequency 0.5.
4. Downsample xds2 by a factor 2.
5. xds(k) = (−1)^k · xds2(k) to reverse the spectrum.

In 1) the frequency content above 60 Hz is removed and in 2) the sample rate is decimated to fs = 120 Hz. In 3) all the information below 30 Hz is removed and then in 4), during the decimation, the interval 30-60 Hz is aliased backwards to 0-30 Hz. This means that the frequencies are folded back according to:
f_aliased = fs − f_old   (4.10)
meaning that the 60 Hz component is folded down to 0 Hz, the 50 Hz component is folded down to 10 Hz, and the 30 Hz component stays at 30 Hz. To remedy the reversed spectrum, the periodic behaviour of the DFT can be used by frequency-shifting the DFT by half the sample frequency:

Xds(f) = Xds2(f − fs/2)   (4.11)

where Xds2(f) is the DFT of xds2 and fs is the new sample frequency. In 5) the frequency shift is implemented in the time domain as a sign conversion for every other sample:

xds(n) = (−1)^n · xds2(n)   (4.12)
The true frequency then translates as f_true = f_ds + 30 Hz.
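The five steps of Example 4.1 can be sketched in Python with NumPy/SciPy (the thesis implementation is in MATLAB; the 50 Hz test tone, signal length and FIR filter lengths below are illustrative assumptions):

```python
import numpy as np
from scipy.signal import firwin, lfilter

fs = 240.0                          # original sample rate (Hz)
t = np.arange(0, 4.0, 1 / fs)
x = np.sin(2 * np.pi * 50.0 * t)    # 50 Hz test tone inside the 30-60 Hz band

# 1) Lowpass, normalized cut-off 0.5 (60 Hz), then 2) downsample by 2 -> fs = 120 Hz
xds1 = lfilter(firwin(101, 0.5), 1.0, x)[::2]

# 3) Highpass, normalized cut-off 0.5 (30 Hz at fs = 120), then 4) downsample by 2 -> fs = 60 Hz
xds2 = lfilter(firwin(101, 0.5, pass_zero=False), 1.0, xds1)[::2]

# 5) Reverse the spectrum with a sign change on every other sample, eq. (4.12)
xds = (-1.0) ** np.arange(len(xds2)) * xds2

# Peak search: the 50 Hz tone aliases to 60 - 50 = 10 Hz, then shifts by 30 Hz -> 20 Hz
spec = np.abs(np.fft.rfft(xds))
f_peak = np.fft.rfftfreq(len(xds), d=1 / 60.0)[np.argmax(spec)]
f_true = f_peak + 30.0              # back-translation per the text
```

With this setup the dominant component of xds appears at 20 Hz, so f_true recovers the original 50 Hz.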
4.3 Filters
Digital filters can be categorized into FIR (Finite-length Impulse Response) and IIR (Infinite-length Impulse Response) filters. FIR filters have the nice properties that they are guaranteed to be stable and that they can have a linear-phase response [11]. FIR
Figure 4.4: The four different filter types: (a) Lowpass filter, (b) Highpass filter, (c) Bandpass filter, (d) Bandstop filter
filters, however, usually require much higher orders than the corresponding IIR filters, and they often introduce a large group delay that makes them unsuitable for many applications. This does not matter much for this particular application, as it is a purely offline system (no new data is introduced or required during runtime). The IIR filters have been implemented for speed and testing purposes but are not recommended for use because of their nonlinear phase, which can be seen in appendix C.4 through C.7.
The four types of filters considered are (see also Fig. 4.4):
• Lowpass filter (LP)
• Highpass filter (HP)
• Bandpass filter (BP)
• Bandstop filter (BS)
4.3.1 IIR Filters
A common and easy way to design a digital filter is to design an analog reference filter and then perform a transformation to turn the analog transfer function into the z-domain. It is not possible to preserve the exact properties from the analog to the digital domain and there are many available mappings, such as the bilinear, impulse-invariant, step-invariant etc., but in practice only the bilinear transformation is appropriate to use for frequency-selective filters [11] and it will be the only transformation presented. The design procedure is to:
1. Map the magnitude response requirements of the digital filter onto equivalent requirements of an analog filter.
2. Solve the approximation problem for the analog filter, which gives the analog lowpass transfer function H(s).
3. The poles and zeros of H(s) are mapped onto the digital transfer function H(z).
4. The digital lowpass filter is transformed into the required filter if needed (HP, BS, BP).
Bilinear transformation
The bilinear transformation is defined as:

s = (2/T) · (z − 1)/(z + 1)   (4.13)

The analog equivalents of the digital magnitude response requirements can be derived from (4.13) as [11]:

ω_ac = (2/T) · tan(ω_c T / 2)   (4.14)
The approximation problem is solved for the lowpass filter to obtain the analog transfer function H(s) (the interested reader can find more in [11] on how to do this). Note that the phase response is distorted by the bilinear transformation. Finally, from (4.13), the poles and zeros of the analog filter are mapped to the digital domain by [11]:
z = (1 + sT/2) / (1 − sT/2)   (4.15)
Lowpass to highpass transformation
A digital lowpass filter is transformed into a highpass filter using [11]:
z^{-1} → −(z^{-1} + α)/(1 − αz^{-1}),   α = −cos((Ω_c T + ω_c T)/2) / cos((ω_c T − Ω_c T)/2)   (4.16)
Lowpass to bandpass transformation
A digital lowpass filter is transformed into a bandpass filter using [11]:
z^{-1} → (z^{-2} − βz^{-1} + γ)/(γz^{-2} − βz^{-1} + 1)   (4.17)

β = 2αk/(k + 1),  γ = (k − 1)/(k + 1),
α = cos((ω_c1 T + ω_c2 T)/2) / cos((ω_c2 T − ω_c1 T)/2),
k = cot((ω_c2 T − ω_c1 T)/2) · tan(Ω_c T / 2)

A special case is when the bandpass filter is symmetric around π/2, that is: ω_c1 T + ω_c2 T = π. In this case the transformation is simplified into:

z^{-1} → −z^{-2}   (4.18)
Lowpass to bandstop transformation
A digital lowpass filter is transformed into a bandstop filter using [11]:
z^{-1} → (z^{-2} − βz^{-1} + γ)/(γz^{-2} − βz^{-1} + 1)   (4.19)

β = 2αk/(k + 1),  γ = (k − 1)/(k + 1),
α = cos((ω_c1 T + ω_c2 T)/2) / cos((ω_c2 T − ω_c1 T)/2),
k = tan((ω_c2 T − ω_c1 T)/2) · tan(Ω_c T / 2)

A special case is when the bandstop filter is symmetric around π/2. In this case the transformation is simplified into:

z^{-1} → z^{-2}   (4.20)
Standard approximations
The following standard approximations are all available in the MATLAB™ Signal Processing Toolbox and they utilize the bilinear transformation (4.13). A short list of their properties is given below.
Butterworth filters:
• Monotonically decreasing magnitude response.
• All zeros at s = ∞.
• Highest filter order.
Chebyshev-I filters:
• Equiripple passband.
• Monotonically decreasing in the stopband.
• All zeros at s = ∞.
• Filter order between Butterworth and Cauer.
Figure 4.5: Impulse response of a causal FIR filter of order N (length N + 1).
Chebyshev-II filters:

• Equiripple stopband.
• Monotonically decreasing in the passband.
• Finite zeros.
• Same order as Chebyshev-I.
Cauer (elliptic) filters:
• Equiripple passband and stopband.
• Finite zeros.
• Lowest filter order.
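The ordering of the filter orders claimed above can be checked with SciPy's order-estimation functions (a sketch; the specification values are arbitrary examples, not taken from the thesis):

```python
from scipy.signal import buttord, cheb1ord, cheb2ord, ellipord

# Common lowpass specification: passband edge 0.2, stopband edge 0.3
# (normalized to Nyquist), 1 dB passband ripple, 40 dB stopband attenuation
wp, ws, rp, rs = 0.2, 0.3, 1.0, 40.0

orders = {
    "butter": buttord(wp, ws, rp, rs)[0],
    "cheby1": cheb1ord(wp, ws, rp, rs)[0],
    "cheby2": cheb2ord(wp, ws, rp, rs)[0],
    "ellip": ellipord(wp, ws, rp, rs)[0],
}
```

For this specification Butterworth requires the highest order and the Cauer (elliptic) approximation the lowest, as listed.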
4.3.2 FIR Filters
For a causal FIR filter of order N (length N + 1), the impulse response is nonzero for 0 ≤ n ≤ N and zero for all other values of n [11]. This is shown in Fig. 4.5. The transfer function (4.21) and frequency response (4.22) can be written as [11]:
H(z) = Σ_{n=0}^{N} h(n)·z^{-n}   (4.21)

H(e^{jωT}) = Σ_{n=0}^{N} h(n)·e^{-jωTn}   (4.22)

As mentioned previously, FIR filters can guarantee a linear-phase response. These filters are called linear-phase FIR filters and their impulse responses are either symmetric (4.23) or anti-symmetric (4.24). A linear-phase FIR filter is obtained by [11]:
h(n) = h(N − n), n = 0, 1, . . . , N. (4.23)
h(n) = −h(N − n), n = 0, 1, . . . , N.   (4.24)

Also, the order N can be either even or odd, so there are four different types of linear-phase FIR filters, denoted Type I, II, III and IV.

Type I: h(n) = h(N − n), N even
Type II: h(n) = h(N − n), N odd
Type III: h(n) = −h(N − n), N even
Type IV: h(n) = −h(N − n), N odd
[Figure 4.6, with panels: (a) Type I, N even; (b) Type II, N odd; (c) Type III, N even; (d) Type IV, N odd.]
Figure 4.6: Impulse responses for the four types of linear-phase FIR filters.
Fig. 4.6 shows examples of impulse responses for the different filter types. Note that the centre value h(N/2) is always zero for Type III filters.
Frequency response
To design a linear-phase FIR filter, it is easier to express its frequency response using a real function HR(ωT) (also called the zero-phase frequency response of H(z)) as [11]:
H(e^{jωT}) = e^{jΘ(ωT)}·HR(ωT)   (4.26)
The magnitude response of H(e^{jωT}) equals the magnitude of HR(ωT):

|H(e^{jωT})| = |HR(ωT)|   (4.27)
The real function HR(ωT ) can of course take negative values, whereas the function
|H(ejωT)| always is non-negative. The function Θ(ωT ) in (4.26) is then given by:
Θ(ωT) = −NωT/2 + c   (4.28)
where c = 0 for filters with a symmetric impulse response (Type I, II) and c = π/2 for filters with antisymmetric impulse response (Type III, IV). Since HR(ωT ) is
a real function, its phase response is either 0 or ±π [11]. The phase response of H(z) is then linear, as can be seen in:

Φ(ωT) = −NωT/2 + c,       HR(ωT) ≥ 0
Φ(ωT) = −NωT/2 + c ± π,   HR(ωT) < 0   (4.29)

From (4.29) it can also be seen that the group delay of linear-phase FIR filters is constant, NT/2.
FIR filter design using optimization
The specification of a lowpass filter is commonly given as [11]:

1 − δc ≤ |H(e^{jωT})| ≤ 1 + δc,  ωT ∈ [0, ωcT]
|H(e^{jωT})| ≤ δs,              ωT ∈ [ωsT, π]   (4.30)

where δc, δs, ωcT and ωsT denote the passband ripple, stopband ripple, passband edge and stopband edge, respectively. For linear-phase FIR filters, the specification (4.30) can be restated with the aid of HR(ωT) according to [11]:

1 − δc ≤ HR(ωT) ≤ 1 + δc,  ωT ∈ [0, ωcT]
−δs ≤ HR(ωT) ≤ δs,        ωT ∈ [ωsT, π]   (4.31)
The acceptable deviations in the passbands and stopbands can alternatively be expressed in terms of the maximal allowable deviation of the attenuation in the passband and the minimum attenuation in the stopband, respectively. The variation of the attenuation in the passband is [11]:
Amax = 20·log((1 + δc)/(1 − δc)) ≈ 17.37·δc [dB]   (4.32)

and the minimum attenuation in the stopband is [11]:

Amin = 20·log((1 + δc)/δs) [dB]   (4.33)
An accurate order estimate, especially for short filters, is used in MATLABTM in the function firpmord. The required order for a linear-phase lowpass FIR filter is:
N ≈ (D∞(δc, δs) − F(δc, δs)·(ΔωT/2π)²) / (ΔωT/2π)   (4.34)

where

D∞(δc, δs) = [a1(log(δc))² + a2·log(δc) + a3]·log(δs) + a4(log(δc))² + a5·log(δc) + a6
F(δc, δs) = b1 + b2(log(δc) − log(δs))
ΔωT = ωsT − ωcT

The constants are [11]: a1 = 0.0005309, a2 = 0.07114, a3 = −0.4761, a4 = −0.00266, a5 = −0.5941, a6 = −0.4278, b1 = 11.01217, and b2 = 0.51244.
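Equation (4.34) can be evaluated directly; a minimal sketch (the example specification δc = 0.01, δs = 0.001 with edges 0.4π and 0.5π is an assumption chosen for illustration):

```python
import math

def fir_order_estimate(dc, ds, wcT, wsT):
    """Order estimate for a linear-phase lowpass FIR filter, eq. (4.34)."""
    a1, a2, a3 = 0.0005309, 0.07114, -0.4761
    a4, a5, a6 = -0.00266, -0.5941, -0.4278
    b1, b2 = 11.01217, 0.51244
    l1, l2 = math.log10(dc), math.log10(ds)
    d_inf = (a1 * l1 ** 2 + a2 * l1 + a3) * l2 + a4 * l1 ** 2 + a5 * l1 + a6
    f = b1 + b2 * (l1 - l2)
    dw = (wsT - wcT) / (2 * math.pi)   # normalized transition band
    return (d_inf - f * dw ** 2) / dw

n_est = fir_order_estimate(0.01, 0.001, 0.4 * math.pi, 0.5 * math.pi)
```

For this specification the estimate lands near 51, in good agreement with the classical Kaiser rule of thumb for the same requirements.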
Approximation problem
The McClellan-Parks-Rabiner’s design is a commonly used optimization technique to obtain a linear-phase FIR filter. The algorithm finds the unique set of filter coefficients that minimizes a weighted error function and the interested reader can find more information about the algorithm in [11].
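In MATLAB this optimization is available as firpm/remez; an equivalent sketch with SciPy's remez is shown below. The 48-52 Hz passband and 240 Hz sample rate mirror the ENF band used elsewhere in the thesis, but the exact band edges and tap count here are assumptions:

```python
import numpy as np
from scipy.signal import remez, freqz

fs = 240.0
# Linear-phase bandpass around 50 Hz: stop below 44 Hz, pass 48-52 Hz, stop above 56 Hz
taps = remez(201, [0, 44, 48, 52, 56, fs / 2], [0, 1, 0], fs=fs)

w, h = freqz(taps, worN=4000, fs=fs)
gain_50 = np.abs(h[np.argmin(np.abs(w - 50.0))])   # passband gain
gain_30 = np.abs(h[np.argmin(np.abs(w - 30.0))])   # stopband gain
```

The resulting filter is equiripple with near-unity gain at 50 Hz and strong attenuation well outside the passband.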
4.4 Spectral analysis
In order to find any useful information in a digital sound recording, one needs to analyze its frequency content. A useful tool is to estimate the power spectrum, and there are several methods to do so. This thesis covers two different methods of spectral estimation: non-parametric estimation (Section 4.5) and parametric estimation (Section 4.6).
4.5 Non-parametric estimation

4.5.1 Fourier transform
The most common method is to use the Fast Fourier Transform (FFT), which is an implementation of the Discrete Fourier Transform (DFT). It is not uncommon that the DFT is referred to as FFT, which suggests the widely spread use of the algorithm [3]. The DFT, X(n), and its inverse, x(k), for a signal with a finite length N are defined as:

X(n) = Σ_{k=0}^{N−1} x(k)·e^{−i2πnk/N}   (4.35)

x(k) = (1/N)·Σ_{n=0}^{N−1} X(n)·e^{i2πkn/N}   (4.36)

k, n = 0, . . . , N − 1
The power of the signal is calculated as the squared magnitude of the transform. With a sample frequency fs, X(n) will be defined on the frequency points f(n) = fs·n/N. The frequency we want to estimate is assumed to have the highest power, so we can estimate our frequency f̂ using:

n̂ = argmax_{n ∈ [0, N−1]} |X(n)|²   (4.37)

f̂ = f(n̂) = fs·n̂/N   (4.38)
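Steps (4.35)-(4.38) amount to picking the strongest FFT bin; a minimal sketch (the 50.3 Hz test tone is an assumption):

```python
import numpy as np

fs = 1000.0
k = np.arange(2000)
x = np.sin(2 * np.pi * 50.3 * k / fs)     # test tone near 50 Hz

X = np.fft.rfft(x)
n_hat = np.argmax(np.abs(X) ** 2)         # eq. (4.37)
f_hat = fs * n_hat / len(x)               # eq. (4.38); frequency grid spacing fs/N = 0.5 Hz
```

The estimate is quantized to the DFT grid, so the error is bounded by half the bin width, here 0.25 Hz.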
4.5.2 Short Time Fourier Transform
What we have done so far is merely a DFT followed by selecting the one frequency with the greatest power. This will only find one single frequency over the whole length of the signal. In order to acquire several frequencies from different times in the recording, we need to introduce a window that we move along the signal, and apply the steps (4.35) to (4.38) on each windowed segment. The STFT is defined as [3]:

X(ω, k) = Σ_{m=−∞}^{∞} x(m)·w(m − k)·e^{−jωm}   (4.39)

where m is the discrete time variable and w is the window function. Usually the hop size is defined as n = LN, where N is the window length and the amount of overlap between adjacent windows depends on L. A wide window will provide us with many samples on which to perform the FFT, but that will reduce the resolution in time. A window that is short will have good resolution in time, but the number of FFT points will decrease, which reduces the frequency resolution instead. The window length implemented has been found through testing different lengths.
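A windowed frequency-tracking loop of this kind can be sketched as follows (a simulated drifting 50 Hz tone; the 2 s window, 50 % overlap and zero-padding factor are illustrative assumptions, not the thesis's tested values):

```python
import numpy as np

fs = 400.0
t = np.arange(0, 10, 1 / fs)
f_inst = 50.0 + 0.05 * np.sin(2 * np.pi * 0.1 * t)   # slowly drifting "ENF"
x = np.sin(2 * np.pi * np.cumsum(f_inst) / fs)

win = int(2 * fs)        # 2 s window
hop = win // 2           # 50 % overlap
pad = 8                  # zero-padding refines the frequency grid
est = []
for start in range(0, len(x) - win + 1, hop):
    seg = x[start:start + win] * np.hanning(win)
    spec = np.abs(np.fft.rfft(seg, pad * win)) ** 2
    est.append(fs * np.argmax(spec) / (pad * win))   # eqs. (4.37)-(4.38) per window
est = np.array(est)
```

Each element of est is one frequency estimate per window position, i.e. a frequency track over time.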
4.6 Parametric estimation
The focus is to accurately estimate frequencies over a number of time instants rather than reconstructing the original signal. For an AR model, the location of the frequency only depends on the estimated parameters. In equation (4.42), the parameter vector θ is what is to be used to acquire the estimated frequency. In sections 4.6.1 to 4.6.3 the following will be used to transform the estimated parameters into an estimated frequency:
f̂ = arccos(−â1/(2·√â2)) / (2πTs)   (4.40)

An n-order AR model is described by [3]:
y(t) + a1·y(t − 1) + · · · + an·y(t − n) = e(t)  ⇒
y(t) = −a1·y(t − 1) − · · · − an·y(t − n) + e(t)   (4.41)

where e(t) is white Gaussian noise with variance λe.
A common representation of the model can be written as [3]:

ϕ(t) = (−y(t − 1), . . . , −y(t − n))^T
θ = (a1, . . . , an)^T
y(t) = ϕ^T(t)·θ + e(t)   (4.42)
4.6.1 Yule-Walker method
We have the AR(n)-model as described by (4.41).
Lag 0: Multiply both sides with y(t) and take the expectation [3]:

E[y²(t)] = E[y(t)·(−a1·y(t − 1) − . . . − an·y(t − n) + e(t))]
⇔ Ryy(0) = −a1·Ryy(1) − . . . − an·Ryy(n) + σe²   (4.43)
Lag 1: Multiply both sides with y(t − 1) and take the expectation [3]:

E[y(t)·y(t − 1)] = E[y(t − 1)·(−a1·y(t − 1) − . . . − an·y(t − n) + e(t))]
⇔ Ryy(1) = −a1·Ryy(0) − . . . − an·Ryy(n − 1)   (4.44)

⋮
Lag n: Multiply both sides with y(t − n) and take the expectation [3]:

E[y(t)·y(t − n)] = E[y(t − n)·(−a1·y(t − 1) − . . . − an·y(t − n) + e(t))]
⇔ Ryy(n) = −a1·Ryy(n − 1) − . . . − an·Ryy(0)   (4.45)
The equations (4.43) to (4.45) produce the following system of equations [3]:

R_N · (σe², a1, . . . , an)^T = f_N   (4.46)

where

R_N = [ −1   Ryy(1)     . . .  Ryy(n)
         0   Ryy(0)     . . .  Ryy(n − 1)
         ⋮    ⋮                 ⋮
         0   Ryy(n − 1) . . .  Ryy(0) ]

f_N = (−Ryy(0), −Ryy(1), . . . , −Ryy(n))^T
We can estimate the covariances R̂yy(k) using [3]:

R̂yy(k) = (1/N)·Σ_{t=0}^{N−|k|−1} y(t)·y(t + |k|)   (4.47)

This will give us the parameters we need to form our AR model [3]:
y(t) = ϕ^T(t)·θ + e(t),  t = 1, 2, . . . , N   (4.48)

Using (4.42) we can get the filter form (with q^{-1} denoting the unit delay operator):

y(t) = 1/(1 + a1·q^{-1} + . . . + an·q^{-n}) · e(t)   (4.49)
Numerical properties
There is, however, a problem with solving equation (4.46), because the matrix R_N is often badly conditioned for large values of N. This can lead to large errors, and a better-conditioned formulation can be made as follows. We define [3]:

y(1) = ϕ^T(1)·θ + e(1)
y(2) = ϕ^T(2)·θ + e(2)
⋮
y(N) = ϕ^T(N)·θ + e(N)   (4.50)
Introduce the N × 1 vectors YN, EN and the N × n matrix ΦN as [3]:

YN = (y(1), y(2), . . . , y(N))^T,  EN = (e(1), e(2), . . . , e(N))^T,
ΦN = (ϕ^T(1); ϕ^T(2); . . . ; ϕ^T(N))   (4.51)

which renders [3]:
YN = ΦN·θ + EN   (4.52)
with normal equations [3]:
ΦN^T·ΦN·θ = ΦN^T·YN   (4.53)
4.6.2 Least Square method
We have the AR(n) model as described by (4.41):

y(t) + a1·y(t − 1) + . . . + an·y(t − n) = e(t)

and introduce the predictor [3]:

ŷ(t|t − 1) = −a1·y(t − 1) − . . . − an·y(t − n)   (4.54)
From (4.42) we have [3]:
y(t) = ϕ^T(t)·θ + e(t)
Here y(t) and ϕ^T(t) are measured (known data), while θ is a parameter vector that we want to estimate; e(t) is still a (white) noise source. The expression is a linear regression model, and the Least Square method can be applied for estimating θ̂.
From the predictor (4.54) we recognize [3]:
ˆ
y(t|t − 1, θ) = ϕT(t)θ (4.55) The θ can be omitted, but it is written in order to show that the signal depends on the parameter vector. We define the residual as [3]:
ε(t, θ) = y(t) − ŷ(t|t − 1, θ) = y(t) − ϕ^T(t)·θ   (4.56)

The Least Square method minimizes the loss function [3]:
VN(θ) = (1/N)·Σ_{t=1}^{N} (y(t) − ϕ^T(t)·θ)² = (1/N)·Σ_{t=1}^{N} ε(t, θ)²   (4.57)

The estimate of our parameter vector becomes [3]:
θ̂N = argmin_θ VN(θ)   (4.58)
where argmin denotes the argument that minimizes the following expression. Since VN(θ) is quadratic in θ in (4.57), we can explicitly form an expression for its minimum:

VN(θ) = (1/N)·Σ_{t=1}^{N} y²(t) − 2θ^T·fN + θ^T·RN·θ
      = (1/N)·Σ_{t=1}^{N} y²(t) + (θ − RN^{-1}·fN)^T·RN·(θ − RN^{-1}·fN) − fN^T·RN^{-1}·fN   (4.59)

where

RN = (1/N)·Σ_{t=1}^{N} ϕ(t)·ϕ^T(t),  fN = (1/N)·Σ_{t=1}^{N} ϕ(t)·y(t)   (4.60)

Since RN is a positive definite matrix, we can see from (4.59) that VN is minimized by [3]:

RN·θ = fN   (4.61)

θ̂N = RN^{-1}·fN = [Σ_{t=1}^{N} ϕ(t)·ϕ^T(t)]^{-1} · Σ_{t=1}^{N} ϕ(t)·y(t)   (4.62)

Equation (4.61) is usually known as the normal equation for the Least Square problem and is a central point in the method. We have also seen it before in (4.46).
Element (i, j) in RN is:

(1/N)·Σ_{t=1}^{N} y(t − i)·y(t − j)   (4.63)

Row k of the vector fN is:

−(1/N)·Σ_{t=1}^{N} y(t − k)·y(t)   (4.64)

Clarification of the steps (4.59) to (4.61): to minimize VN(θ), we set the derivative dVN(θ)/dθ = 0 and from that obtain θ̂.
Definition 4.1 (Derivative convention)

df(x)/dx = (df/dx1, . . . , df/dxN)^T  ⇒  d(ϕ^T·θ)/dθ = (ϕ1, . . . , ϕn)^T = ϕ

dVN(θ)/dθ = −(2/N)·Σ_{t=1}^{N} ϕ(t)·(y(t) − ϕ^T(t)·θ) = 2·(RN·θ − fN)   (4.65)
dVN(θ)/dθ = 0  ⇒  RN·θ̂N = fN  ⇒  θ̂N = RN^{-1}·fN, as seen in (4.62).
4.6.3 Recursive Least Square method
From (4.57) we have the loss function as [3]:

VN(θ) = (1/N)·Σ_{t=1}^{N} (y(t) − ϕ^T(t)·θ)²
In the case that θ varies with t, we can emphasize the more recent data by introducing a forgetting factor λ [3]:

Vt(θ) = Σ_{k=1}^{t} λ^{t−k}·(y(k) − ϕ^T(k)·θ)²   (4.66)

where 0 < λ < 1. In the same way as in (4.62), (4.66) is minimized by [3]:

θ̂(t) = Rλ^{-1}(t)·fλ(t)   (4.67)

where

Rλ(t) = Σ_{k=1}^{t} λ^{t−k}·ϕ(k)·ϕ^T(k),  fλ(t) = Σ_{k=1}^{t} λ^{t−k}·ϕ(k)·y(k)
These can be calculated recursively as [3]:

Rλ(t) = λ·Rλ(t − 1) + ϕ(t)·ϕ^T(t)
fλ(t) = λ·fλ(t − 1) + ϕ(t)·y(t)   (4.68)

We can now calculate [3]:

θ̂(t) = Rλ^{-1}(t)·fλ(t)
     = Rλ^{-1}(t)·[λ·fλ(t − 1) + ϕ(t)·y(t)]
     = Rλ^{-1}(t)·[λ·Rλ(t − 1)·θ̂(t − 1) + ϕ(t)·y(t)]
     = Rλ^{-1}(t)·[(Rλ(t) − ϕ(t)·ϕ^T(t))·θ̂(t − 1) + ϕ(t)·y(t)]
     = θ̂(t − 1) + Rλ^{-1}(t)·ϕ(t)·[y(t) − ϕ^T(t)·θ̂(t − 1)]
This is recursive, but a matrix inversion is needed in each step, so to speed things up in numerical applications we can use the matrix inversion lemma [3].
Lemma 4.1 (Matrix inversion)

(A + B·C·D)^{-1} = A^{-1} − A^{-1}·B·(C^{-1} + D·A^{-1}·B)^{-1}·D·A^{-1}

Introduce the following:

P(t) = Rλ^{-1}(t),  A = λ·Rλ(t − 1),  B = ϕ(t),  C = I,  D = ϕ^T(t)

where I is the identity matrix. This gives the equation:

P(t) = Rλ^{-1}(t) = (1/λ)·{P(t − 1) − P(t − 1)·ϕ(t)·[ϕ^T(t)·P(t − 1)·ϕ(t) + λ]^{-1}·ϕ^T(t)·P(t − 1)}   (4.70)
Using (4.67) and (4.68) we can get the equation as:

θ̂(t) = P(t)·fλ(t)
     = P(t)·[λ·fλ(t − 1) + ϕ(t)·y(t)]
     = P(t)·[λ·Rλ(t − 1)·θ̂(t − 1) + ϕ(t)·y(t)]
     = P(t)·[(Rλ(t) − ϕ(t)·ϕ^T(t))·θ̂(t − 1) + ϕ(t)·y(t)]
     = θ̂(t − 1) + P(t)·ϕ(t)·[y(t) − ϕ^T(t)·θ̂(t − 1)]

with the gain K(t) = P(t)·ϕ(t) and the residual ε(t) = y(t) − ϕ^T(t)·θ̂(t − 1). The recursive least square algorithm finally appears as [3]:

θ̂(t) = θ̂(t − 1) + K(t)·ε(t)
K(t) = P(t)·ϕ(t)
ε(t) = y(t) − ϕ^T(t)·θ̂(t − 1)
P(t) = (1/λ)·{P(t − 1) − P(t − 1)·ϕ(t)·[ϕ^T(t)·P(t − 1)·ϕ(t) + λ]^{-1}·ϕ^T(t)·P(t − 1)}   (4.71)
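The algorithm (4.71) can be sketched directly (the initialization P(0) = δ·I and the test signal are common assumptions, not specified by the text):

```python
import numpy as np

def rls_ar(y, order, lam=0.99, delta=100.0):
    """Recursive least squares for an AR(order) model, following eq. (4.71)."""
    theta = np.zeros(order)
    P = delta * np.eye(order)                   # P(0) = delta*I (assumed initialization)
    for t in range(order, len(y)):
        phi = -y[t - order:t][::-1]             # (-y(t-1), ..., -y(t-order))
        eps = y[t] - phi @ theta                # residual epsilon(t)
        K = P @ phi / (phi @ P @ phi + lam)     # gain K(t)
        theta = theta + K * eps
        P = (P - np.outer(K, phi @ P)) / lam    # covariance update, eq. (4.70)
    return theta

fs = 400.0
k = np.arange(4000)
y = np.sin(2 * np.pi * 50.0 * k / fs) \
    + 0.02 * np.random.default_rng(2).standard_normal(k.size)

a1, a2 = rls_ar(y, 2)
f_hat = np.arccos(-a1 / (2.0 * np.sqrt(a2))) * fs / (2.0 * np.pi)   # eq. (4.40)
```

With λ < 1 the estimate tracks slow changes in θ; here it converges toward the stationary AR(2) solution.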
4.6.4 Comparison algorithms
Simple distance measurement
The distance between each point in the extracted frequency track and the database is calculated as:

dist(k) = Σ_{i=1}^{N} |f̂(k + i) − f(i)|   (4.72)
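A direct implementation of this sliding distance search might look like the following (the synthetic database track and the known true offset are assumptions for illustration):

```python
import numpy as np

def dist_search(db, f_ext):
    """Slide the extracted track over the database and evaluate eq. (4.72)."""
    N = len(f_ext)
    d = np.array([np.abs(f_ext - db[k:k + N]).sum()
                  for k in range(len(db) - N + 1)])
    return int(np.argmin(d)), d

rng = np.random.default_rng(3)
db = 50.0 + 0.02 * rng.standard_normal(1000)     # synthetic ENF database track
true_k = 400
f_ext = db[true_k:true_k + 120] + 0.001 * rng.standard_normal(120)
k_hat, d = dist_search(db, f_ext)
```

The offset with the smallest accumulated distance is taken as the match; for this synthetic case it coincides with the true recording position.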
Normalized cross correlation
Cross correlation can be expressed as:

CC(x, y) = E[(x − x̄)(y − ȳ)],  x̄ = E[x],  ȳ = E[y]   (4.73)

Normalized cross correlation is given by:

NCC(x, y) = CC(x, y) / (σ(x)·σ(y)),  σ²(x) = E[(x − x̄)²]   (4.74)
(4.74), however, contains discontinuities and is ineffective for large amounts of data, so it needs to be rewritten. The discrete implementation of the normalized cross correlation is optimized for speed [1, 10].
The squared Euclidean distance between the database signal and the extracted signal can be expressed as [4]:

d²_{f,t}(u, v) = Σ_{x,y} [f(x, y) − t(x − u, y − v)]²
              = Σ_{x,y} [f²(x, y) − 2·f(x, y)·t(x − u, y − v) + t²(x − u, y − v)]   (4.75)

The term Σ_{x,y} t²(x − u, y − v) is constant, and if Σ_{x,y} f²(x, y) is
approximately constant, the remaining term will be the cross correlation [4]:
c(u, v) = Σ_{x,y} f(x, y)·t(x − u, y − v)   (4.76)

The drawbacks of using (4.76) as a search method are:
• If the signal energy Σ f²(x, y) varies over time, the match can fail. For example, the correlation at an actual match can become much smaller than it should be.
• The equation is sensitive, since it is not invariant to changes in amplitude.
If the extracted frequency matches the database, but has an offset of 5 Hz, there will be no match.
As a workaround to avoid these problems, the normalized cross correlation is used as [1]:

γ(u, v) = Σ_{x,y} [f(x, y) − f̄_{u,v}]·[t(x − u, y − v) − t̄] / √( Σ_{x,y} [f(x, y) − f̄_{u,v}]² · Σ_{x,y} [t(x − u, y − v) − t̄]² )   (4.77)
Here t̄ is the mean value of the extracted signal and f̄_{u,v} is the mean value of f(x, y) within the range of the database. (4.77) is computationally complex, but it is possible to rewrite it to speed it up. The numerator can be calculated in the frequency plane using the FFT [4], assuming that the mean values are removed: f′(x, y) = f(x, y) − f̄_{u,v} and t′(x, y) = t(x, y) − t̄. Then it is possible to use the Fourier transform as:

F^{-1}{ F(f′)·F*(t′) }   (4.78)

To calculate the denominator, it is possible to use cumulative (running) sums. The part that is problematic to calculate is Σ_{x,y} [f(x, y) − f̄_{u,v}]². The mean value and local energy must be calculated for each u and v, which becomes computationally complex, and the following is proposed [4]:
s²(u, v) = f²(u, v) + s²(u − 1, v) + s²(u, v − 1) − s²(u − 1, v − 1)   (4.80)

where s(u, v) = s²(u, v) = 0 when u < 0 or v < 0. The energy at position (u, v) then becomes:

e_f(u, v) = s²(u + N − 1, v + N − 1) − s²(u − 1, v + N − 1) − s²(u + N − 1, v − 1) + s²(u − 1, v − 1)   (4.81)

While the theory is based on two-dimensional computations, the implementation of the algorithm is done in one dimension.
Chapter 5
Implementation
This chapter describes the implementation of the different algorithms and filters presented in Chapter 4. A short list is presented on how to use the different algorithms. The code is presented in appendix B.
5.1 Implemented algorithms

5.1.1 locatefreqs
usage: locatefreqs(soundfile, filtertype, f1, f2) (see table 5.1).
Locatefreqs is used to check a sound file as a first step to locate any existing ENF. There is no output to the MATLAB™ workspace, but the function produces multiple plots that show the signal, its Fourier transform and a spectrogram before and after filtering (Fig. C.1 to C.2). A filter response such as Fig. C.3 is also shown.
5.1.2 STFT
usage: [fout] = est_stft(soundfile, filtertype, f1, f2) (see table 5.2).
The Short Time Fourier Transform estimation method (Chapter 4.5.2) is used in the
Variable     Type    Restriction    Default value
soundfile    input   .wav file      -
filtertype   input   filtertype¹    'remez'
f1           input   < f2           48
f2           input   > f1           52

Table 5.1: Locatefreqs inputs and outputs.
¹filtertype: one of 'remez', 'ellip', 'butter', 'cheby1' or 'cheby2'
Variable     Type    Restriction    Default value
soundfile    input   .wav file      -
filtertype   input   filtertype¹    'remez'
f1           input   < f2           48
f2           input   > f1           52
fout         output  -              -

Table 5.2: est_stft inputs and outputs.
¹filtertype: one of 'remez', 'ellip', 'butter', 'cheby1' or 'cheby2'
Variable     Type    Restriction    Default value
soundfile    input   .wav file      -
filtertype   input   filtertype¹    'remez'
f1           input   < f2           48
f2           input   > f1           52
aliasing     input   0/1            0
fout         output  -              -

Table 5.3: est_aryule inputs and outputs.
¹filtertype: one of 'remez', 'ellip', 'butter', 'cheby1' or 'cheby2'
est_stft function. It estimates frequency data from a sound file and saves frequency
data to MATLABTM workspace or to a file on disk.
5.1.3 Yule Walker
usage: [fout] = est_aryule(soundfile, filtertype, f1, f2, aliasing) (see table 5.3). Yule Walker estimation method (Chapter 4.6.1) is used in the est_aryule func-tion. It estimates frequency data from a sound file and saves frequency data to MATLABTM workspace or to a file on disk.
5.1.4 Recursive Least Square
usage: [fout] = est_arrls(soundfile, filtertype, f1, f2, aliasing) (see table 5.4). Recursive Least Square estimation method (Chapter 4.6.3) is used in the est_arrls function. It estimates frequency data from a sound file and saves frequency data to MATLABTM workspace or to a file on disk.
5.1.5 Database matching
usage: searchdb(database, soundcsv, method) (see table 5.5).
Database matching is done in the searchdb function. It finds matches between extracted sound data and a database, using different search methods.
Variable     Type    Restriction    Default value
soundfile    input   .wav file      -
filtertype   input   filtertype¹    'remez'
f1           input   < f2           48
f2           input   > f1           52
aliasing     input   0/1            0
fout         output  -              -

Table 5.4: est_arrls inputs and outputs.
¹filtertype: one of 'remez', 'ellip', 'butter', 'cheby1' or 'cheby2'
Variable     Type    Restriction    Default value
database     input   .db file       -
soundcsv     input   .csv file      -
method       input   method¹        'xcorr'

Table 5.5: searchdb inputs and outputs.
¹method: 'dist' or 'xcorr'
5.2 Filters implementation
usage: [b a y] = f_filtfunc(x, fs, f1, f2, ftype) (see table 5.6).
The different filters implemented (Chapter 4.3) share the same interface. The user is not required to call the filter functions directly, as they are called and used within other functions. The filter function takes a signal, a sample frequency, the frequencies to filter and a filter type. It returns the numerator and denominator of a system H(z) as well as the filtered signal. Fig. C.3 to Fig. C.7 show the magnitude and phase responses for the respective filters' default values when called from locatefreqs.m.
Variable     Type    Restriction    Default value
x            input   data vector    -
fs           input   > f2           -
f1           input   < f2           48
f2           input   > f1           52
ftype        input   ftype¹         'bandpass'
b            output  -              -
a            output  -              -
y            output  -              -

Table 5.6: Filter types inputs and outputs.