2004:003 CIV

MASTER'S THESIS

Speech Compression and Tone Detection in a Real-Time System

Kristina Berglund

MSc Programmes in Engineering
Department of Computer Science and Electrical Engineering
Division of Signal Processing

2004:003 CIV - ISSN: 1402-1617 - ISRN: LTU-EX--04/003--SE



To Andreas.


Abstract

During the night, encrypted spoken newspapers are broadcast over the terrestrial FM radio network. These papers can be subscribed to by persons with reading disabilities. The subscribers presently have a special receiver that decrypts the newspaper and stores it on a cassette tape. A project aiming to design a new receiver, using digital technology, was started in 2002. This report describes the parts of the project involving speech compression and tone detection. An overview of different compression techniques, with emphasis on waveform coding, is given, together with a detailed description of Adaptive Differential Pulse Code Modulation (ADPCM), the compression technique chosen for implementation on a Digital Signal Processor (DSP). ADPCM was first implemented on the ADSP-2181 DSP, with good results. In the final version of the digital receiver the ADSP-2191 DSP will be used, and hence the code was converted to fit this DSP. Due to some problems, this implementation could not be completed within the time frame of this thesis. The final part of this thesis consists of finding a method for detecting a tone inserted between articles in the spoken newspaper. The tone detection is composed of two parts. The first part reduces the amplitude of the speech while maintaining the amplitude of the tone; for this part a digital resonator was chosen and implemented both in Matlab and on the ADSP-2191 DSP. The second part decides whether the tone is present or not; this part was implemented only in Matlab.


Preface

This Master's thesis is the final work for my Master of Science degree. The work for this thesis was performed at the Division of Signal Processing at Luleå University of Technology, as part of a project aiming to design a digital receiver for a radio-transmitted encrypted spoken newspaper system. The main purposes of this thesis are to investigate various compression algorithms and to find a detection technique that can be used to detect tones inserted between articles in the spoken newspaper. I would like to thank my examiner James P. LeBlanc for giving me the opportunity to work in this project and for all his support and input along the way. I would also like to thank my project leader Per Johansson and my fellow colleagues Anders Larsson and Patrik Pääjärvi for their invaluable help and encouraging words.

Kristina Berglund
Luleå, January 2004


Contents

1 Introduction
  1.1 Project Overview

2 Basics
  2.1 Sampling
  2.2 Quantization
    2.2.1 Scalar Quantization
    2.2.2 Vector Quantization

3 Speech Compression
  3.1 Waveform Coding
    3.1.1 Pulse Code Modulation
    3.1.2 Differential Pulse Code Modulation
    3.1.3 Adaptive Differential Pulse Code Modulation
    3.1.4 Subband Coding
    3.1.5 Transform Coding
  3.2 Vocoders
    3.2.1 Linear Predictive Coding
  3.3 Hybrid Coding
    3.3.1 Code Excited Linear Prediction
  3.4 Discussion

4 Adaptive Differential Pulse Code Modulation
  4.1 The Encoder
  4.2 The Decoder
  4.3 Implementation and Results
    4.3.1 Used Tools

5 Tone Detection
  5.1 Finding the Tone
    5.1.1 Matched Filter
    5.1.2 Digital Resonator
  5.2 Implementation
    5.2.1 Results

6 Conclusions

Bibliography

Chapter 1 Introduction

This Master's thesis is part of a project involving myself, the M.Sc. students Anders Larsson and Patrik Pääjärvi and our project leader Per Johansson at Luleå University of Technology. The aim of the project is to design a digital receiver for a radio-transmitted encrypted newspaper system.

1.1 Project Overview

The radio-transmitted papers are spoken versions of daily newspapers, which reading-disabled persons can subscribe to. The newspapers are distributed during the night, between 2 a.m. and 5 a.m., on the radio channel SR P1. Different papers are broadcast in different regions of the country; today about 90 different daily newspapers have a spoken version broadcast each night [1]. Since every newspaper is limited to 90 minutes and one paper is sent in mono on each channel, a maximum of four newspapers can be broadcast in one region during the night. To prevent non-subscribers from listening to the paper, it is encrypted. To receive the newspaper, the subscribers have a receiver set to record one of the papers transmitted. The receiver first decrypts the newspaper and then stores it on a cassette tape. In order to listen to the paper, the subscribers must insert the tape in a regular cassette player. Between the articles in the paper a tone is inserted. When the listeners fast-forward the tape this tone is heard, indicating the start of a new article.

The project of designing a digital receiver was initiated in 2002 by Taltidningsnämnden, a government authority whose purpose is to improve the availability of spoken newspapers to the reading disabled. The reasons for designing a new receiver are mainly questions of cost and adaptability. The receiver of today uses analog technology and has high costs of maintenance and repair. The subscribers to spoken newspapers are often elderly, with difficulties in handling a cassette tape. Since the new receiver has a built-in speaker, nothing needs to be moved in order to listen to the paper. An advantage of the digital receiver is the ability to skip between articles by pressing a button. For this reason, the tones inserted between articles must be detected. Another advantage is the fact that additional features, for example improvements of the sound quality, can easily be added to the digital receiver by a change of software. Below, a brief description of the digital receiver is given. As seen in Fig. 1.1, the transmission is received by an FM receiver and analog-to-digital converted, before being passed along to the digital signal processor (DSP). In the DSP, the encrypted newspaper is decrypted, searched for tones between the articles, compressed and then stored on a memory chip connected to the DSP. During playback, the process is reversed: the newspaper is read from the memory, decompressed, digital-to-analog converted and sent out through a speaker. In order to decrypt the newspaper, the incoming transmission is sampled at 19 kHz. Each sample is represented in the DSP with an accuracy of 16 bits. Since one newspaper is 90 minutes long, about 205 MB of memory is required for storing the paper if no compression technique is used. If this receiver is to be commercialized, the costs must be kept small. Since the cost of the memory grows with its size, a small memory is desired. To fit the newspaper on a small memory, it must be compressed. The parts of the project described in this thesis are compression/decompression and tone detection.
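The 205 MB figure follows directly from the sampling parameters stated above (taking 1 MB as $10^6$ bytes):

$$19\,000 \ \tfrac{\text{samples}}{\text{s}} \cdot 16 \ \tfrac{\text{bits}}{\text{sample}} \cdot 90 \cdot 60 \ \text{s} \approx 1.64 \cdot 10^{9} \ \text{bits} = 205.2 \ \text{MB}.$$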

Figure 1.1: Simple block diagram of the digital receiver (recording path: FM receiver, A/D, then in the DSP decryption, tone detection and compression before writing to memory; playback path: read from memory, decompression, D/A). The data flows during recording and playback are indicated by the black and white arrows respectively.


Chapter 2 Basics

In this chapter, two basic concepts needed for the understanding of digital compression of speech waveforms are given.

2.1 Sampling

Only discrete-time signals can be digitally processed; continuous-time signals have to be sampled. Sampling is merely taking the value of the signal at discrete points, equally spaced in time, see Fig. 2.1. The discrete-time version of a continuous-time signal $s_c(t)$ is

$$s_d[n] = s_c(nT), \qquad (2.1)$$

where $n$ is an integer and $T$ is the sampling period, the time between each sample. The sampling frequency, $f_s$, is how often a signal is sampled; the relationship between the sampling period and the sampling frequency is

$$f_s = \frac{1}{T}. \qquad (2.2)$$

A bandlimited signal $s(t)$, i.e. a signal with no frequency components higher than some known limit, is uniquely determined by its samples $s[n]$ if the sampling frequency is high enough. The sampling theorem states that the sampling frequency must be at least twice as high as the maximum frequency component in the signal [2]. That is,

$$f_s > 2 \cdot f_{max}, \qquad (2.3)$$

where $f_{max}$ is the maximum frequency, for $s(t)$ to be uniquely determined and reconstructible.

Figure 2.1: Sampling of a time continuous signal.

The frequency $2 \cdot f_{max}$ is called the Nyquist rate. If the sampling frequency is below the Nyquist rate, aliasing may occur. Aliasing means that a frequency is misrepresented: the reconstructed signal will have a lower frequency than the original signal, as illustrated in Fig. 2.2. Aliasing can be avoided by bandlimiting the signal before it is sampled. This is done by applying a lowpass filter to the signal prior to sampling. The lowpass filter used must remove the frequency components that are higher than $f_s/2$.

2.2 Quantization

Every sampled signal is quantized, to be encodable with a finite number of bits. Quantization can be described as rounding off the value of the signal to the nearest value from a discrete set, see Fig. 2.3. The quantization can be performed on raw speech samples as well as on residuals, for example the difference between consecutive samples. If the quantization of one sample depends on previous samples, it is said to have memory. Quantization can be performed on one sample at a time, known as scalar quantization, or on several samples at a time, known as vector quantization.
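To make the rounding-off idea concrete, here is a minimal sketch of a uniform scalar quantizer. It is an illustration, not code from the thesis; the function name, bit width and test signal are arbitrary choices.

```python
import numpy as np

def uniform_quantize(x, bits=3, xmax=1.0):
    """Round each sample to the nearest of 2**bits equally spaced levels."""
    step = 2 * xmax / 2 ** bits            # constant step size
    q = step * np.round(x / step)          # round off to the nearest level
    return np.clip(q, -xmax, xmax - step)  # saturate out-of-range samples

t = np.arange(0, 1, 1 / 8000)              # 1 s of signal sampled at 8 kHz
x = 0.8 * np.sin(2 * np.pi * 50 * t)
xq = uniform_quantize(x)
print("max quantization error:", np.max(np.abs(x - xq)))  # at most step/2
```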

Figure 2.2: Original (dotted line) and reconstructed (solid line) signal, with the sampling frequency below the Nyquist rate.

Figure 2.3: Input (dotted line) and output (solid line) of a 3-bit (eight-level) uniform quantizer.

2.2.1 Scalar Quantization

All quantization methods introduce some distortion. For scalar quantization this distortion is called quantization noise, which is measured as the difference between the input and the output of the quantizer. How well a quantizer works depends on the step size, the distance between two adjacent quantization levels. Uniform quantizers have the same step size between all levels, unlike log quantizers (see Section 3.1.1), whose step size varies between different quantization levels. For uniform as well as log quantizers, the step size must be kept small to minimize quantization noise.

There is a tradeoff between how small the step size is and how many quantization levels are required. A way of keeping the quantization noise small is to allow the step size to vary over time, depending on the amplitude of the input signal. This is called adaptive quantization: the step size is increased in regions with a high variance in amplitude and decreased in regions with a low variance. The step size adaption can be based on future samples, called forward adaptive quantization, or on previous samples, called backward adaptive quantization. Adaptive quantization adds delay and additional storage requirements, but enhances the quality of the reconstructed signal. Forward adaptive quantization is shown in Fig. 2.4. Samples, including the current sample, are stored in a buffer and a statistical analysis is performed on them. Based on this analysis the step size is adjusted and the quantization is carried out.

Figure 2.4: Forward adaptive quantization (based on a figure from [3]).

Since the analysis is performed on samples not available at the receiver, the statistical parameters from the analysis must be sent as side information. There is a tradeoff between how sensitive the quantizer is to local variations in the signal and how often side information must be sent. If the buffer size is small, the adaption to local changes will be effective, but the side information must be sent very often. If many samples are stored in the buffer, the side information does not have to be sent as often, but the adaption might miss local variations. The more samples the buffer stores, the bigger the delay and the storage requirements. No side information has to be sent when using backward adaptive quantization; the buffering and the statistical analysis are carried out by both the transmitter and the receiver, see Fig. 2.5. Since the analysis is performed on the output of the quantizer, which contains quantization noise, the adaption to the variation of the signal is not as fine as with forward adaptive quantization. Some well-known methods that use scalar quantization are Pulse Code Modulation, Delta Modulation and Adaptive Differential Pulse Code Modulation; they are all described in Section 3.1.
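G.726's actual backward adaptation is described in Chapter 4. As a generic illustration of the principle, the sketch below uses a Jayant-style step-size multiplier update; the multiplier values, limits and 2-bit magnitude code are assumptions chosen for illustration only.

```python
import numpy as np

MULT = [0.9, 0.9, 1.25, 1.75]  # step multiplier per magnitude level (assumed)

def encode(x, step0=0.1):
    step, codes = step0, []
    for sample in x:
        code = int(np.clip(np.floor(abs(sample) / step), 0, 3))
        codes.append((code, sample < 0))
        step = float(np.clip(step * MULT[code], 1e-4, 1.0))  # adapt using the
    return codes                                             # code word only

def decode(codes, step0=0.1):
    step, out = step0, []
    for code, negative in codes:
        value = (code + 0.5) * step        # reconstruct at interval midpoint
        out.append(-value if negative else value)
        step = float(np.clip(step * MULT[code], 1e-4, 1.0))  # same update as
    return np.array(out)                                     # the encoder
```

Because the step size is recomputed from the transmitted codes alone, the decoder tracks the encoder exactly and no side information is needed, which is the point made above.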

Figure 2.5: Backward adaptive quantization (based on a figure from [3]).

2.2.2 Vector Quantization

In scalar quantization the input to the quantizer is just one sample. The input to a vector quantizer is a vector consisting of several consecutive samples. The samples can be pure speech samples as well as prediction residuals or some coding parameters. The general idea of vector quantization is that a group of samples can be encoded more efficiently together than one by one. Due to the inherent complexity of vector quantization, significant results within this area were not reported until the late 1970s [4]. In vector quantization, depicted in Fig. 2.6, the quantization consists of finding a codeword that resembles the input vector. Each incoming vector, s_i, is compared to the codewords in a codebook. The closest word, often measured in mean squared error, is selected for transmission.

Figure 2.6: Block diagram of a vector quantizer (based on a figure from [4]).

For a codebook that consists of L different codewords, the M-dimensional vector space is divided into L non-overlapping cells. Each cell has a corresponding centroid; these are the L codewords. This is shown for a two-dimensional case (M = 2) in Fig. 2.7. The incoming vector falls into a cell, C_n, and the centroid, ŝ_n, of the cell is the codeword chosen for transmission. Actually, what is transmitted is the index, u_n, of the chosen word, not the codeword itself.

The decoding process is the encoding reversed. The codebook, the same one the encoder has, is searched to find the codeword matching the index. When a match is found, the decoded word is the centroid corresponding to the index in question. Vector quantization is often used in hybrid coders, described in Section 3.3.

Figure 2.7: Cells for two-dimensional vector quantization, with each cell's centroid ŝ_n marked (based on a figure from [5]).
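A minimal sketch of this encode/decode step follows; the random stand-in codebook is an assumption (a real codebook would be trained, for example with the LBG algorithm).

```python
import numpy as np

rng = np.random.default_rng(0)
M, L = 2, 16                             # vector dimension and codebook size
codebook = rng.standard_normal((L, M))   # stand-in for a trained codebook

def vq_encode(s):
    """Return the index of the codeword closest to s in mean squared error."""
    return int(np.argmin(np.sum((codebook - s) ** 2, axis=1)))

def vq_decode(index):
    return codebook[index]               # the centroid of the chosen cell

s = rng.standard_normal(M)               # an incoming vector s_i
u = vq_encode(s)                         # only the index u_n is transmitted
print(u, vq_decode(u))
```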

Chapter 3 Speech Compression

Speech compression, or speech coding, is needed for many applications. One application is telecommunication, where a low bit rate is often the aim. Another is storage, where speech needs to be compressed to fit within a specific memory size. The goal of speech coding is to encode the signal using as few bits as possible while still achieving sufficient quality of the reconstructed speech. Sufficient quality can mean a variety of things, from speech with no artifacts and no difficulty in recognizing the speaker, to nearly unintelligible unnatural speech, all depending on the application. Another important issue is delay; in real-time systems the delay introduced by compression/decompression must be kept small. In this project the aim has been to find a low-complexity compression algorithm that reconstructs speech with high quality. Low complexity means that few instructions are needed to encode and decode the speech.

The field of speech coding can be divided into three major classes: waveform coding, vocoding and hybrid coding. Hybrid coding is not really a separate class, but rather a mixture of vocoding and waveform coding. In the following sections an overview of some basic compression techniques is given, and in Section 3.4 examples of the quality and complexity of algorithms from each class are given and discussed.

3.1 Waveform Coding

Waveform coders compress the speech signal without any consideration of how the waveform is generated. Waveform coding can be carried out both in the time domain and in the frequency domain. Pulse Code Modulation, Delta Modulation and Adaptive Differential Pulse Code Modulation are all examples of time-domain waveform coding, while Subband Coding and Adaptive Transform Coding operate in the frequency domain. The aim is to construct a signal with a waveform that resembles the original signal. This means that waveform coders also work fairly well for non-speech signals. Waveform coders generally produce speech with a high quality, unfortunately with quite a high bit rate as well. For more information on waveform coders than given in this thesis, the reader is referred to [3] and [6].

3.1.1 Pulse Code Modulation

Pulse Code Modulation (PCM) is the simplest form of scalar quantization. PCM uses uniform or logarithmic quantization. The samples are rounded off to the nearest value from a discrete set. For uniform PCM the set consists of a number of equally spaced discrete values; the step size between each quantization level is constant. The waveform is approximated by quantizing the input speech samples before transmission. PCM produces speech with a high quality; however, it also requires a very high bit rate. PCM can be made more efficient using a non-uniform step size. This type of quantizer is called a log quantizer. The difference between a uniform quantizer and a log quantizer is shown in Fig. 3.1.

Figure 3.1: (a) Uniform quantizer; (b) Log quantizer.

A-law and µ-law are both examples of log quantizers. These two techniques are widely used in many speech applications: µ-law is used in telephone networks in North America and in Japan, and A-law is used in Europe [7]. The performance of a 12-bit uniform quantizer can be achieved by a 7-bit log quantizer [4].
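As an illustration of log quantization, the standard µ-law compressor and expander can be written as follows; the formula is the usual one, while the 8-bit rounding step is only an assumed example.

```python
import numpy as np

MU = 255.0  # the value used in North American/Japanese telephony

def mulaw_compress(x):
    """Map x in [-1, 1] to [-1, 1] with fine resolution near zero."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mulaw_expand(y):
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

x = np.linspace(-1.0, 1.0, 9)
yq = np.round(mulaw_compress(x) * 127) / 127  # uniform 8-bit steps in y ...
print(np.abs(mulaw_expand(yq) - x))           # ... non-uniform steps in x
```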

3.1.2 Differential Pulse Code Modulation

Since successive speech samples are highly correlated, the difference between adjacent samples generally has a smaller variance than the original signal. This means that the signal can be encoded using fewer bits when encoding the difference than when encoding the original signal, while still achieving the same performance. This is the objective of Differential Pulse Code Modulation (DPCM). DPCM quantizes the difference between adjacent samples, instead of the original signal itself. Many DPCM schemes also use a short-time predictor to further decrease the bit rate; an example of this is Delta Modulation.

Delta Modulation

Fig. 3.2 shows a delta modulator. Based on previous samples, the current sample s[k] is estimated as se[k]. The prediction residual, e[k], is the difference between the input sample and this estimate. The residual is quantized and transmitted. The transmitter and the receiver use the same predictor P. At the receiver the prediction residual is added to se[k] to reconstruct the speech sample as ŝ[k].

Figure 3.2: Block diagram of a delta modulation encoder (left) and decoder (right) (based on figures from [8]).

The simplest form of the procedure described above is first-order, one-bit linear delta modulation.

The estimate in this method is based on one previous quantized sample, and only one bit is used to encode the difference. This means that the following sample can only be smaller or bigger than the previously encoded sample. Hence, adjacent samples always differ from each other by the step size. If adjacent input samples have the same value, they will still be encoded differently. This error is known as granular noise. When using a fixed step size, as this method does, another error, called slope overload, can occur. This happens when the sample-to-sample change of the signal is greater than the step size. Both granular noise and slope overload are depicted in Fig. 3.3. This method can be used for highly oversampled speech, where the correlation is strong.

Figure 3.3: Two types of quantization errors, granular noise and slope overload (based on a figure from [5]).
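A sketch of this first-order, one-bit scheme is given below; the step size and test signal are arbitrary choices.

```python
import numpy as np

def dm_encode(x, step=0.05):
    estimate, bits = 0.0, []
    for sample in x:
        bit = 1 if sample >= estimate else 0   # one bit: above or below
        bits.append(bit)
        estimate += step if bit else -step     # predictor: previous output
    return bits

def dm_decode(bits, step=0.05):
    estimate, out = 0.0, []
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return np.array(out)

t = np.arange(0, 0.04, 1 / 19000)              # highly oversampled input
x = np.sin(2 * np.pi * 50 * t)
xr = dm_decode(dm_encode(x))                   # slowly varying regions show
                                               # granular noise; a too-small
                                               # step causes slope overload
```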

3.1.3 Adaptive Differential Pulse Code Modulation

Adaptive Differential Pulse Code Modulation (ADPCM) is an extension of Delta Modulation. ADPCM has both an adaptive step size and an adaptive prediction. A detailed description of the ITU-T standard G.726 "40, 32, 24, 16 kbit/s adaptive differential pulse code modulation (ADPCM)" is found in Chapter 4.

3.1.4 Subband Coding

A subband coder divides the input signal into several frequency subbands, which are then individually encoded, as depicted in Fig. 3.4. The encoding process works as follows. Using a bank of bandpass filters, the coder divides the frequency band of the signal into subbands. Each subband is then downsampled to its Nyquist rate (see Section 2.1) and encoded, using for example one of the techniques described above. The downsampling is possible since the bandwidth of each subband is narrower than that of the input signal. Before transmission, the signals are multiplexed. Multiplexing means combining multiple signals in order to send them over one single channel. At the receiving end, the signals are demultiplexed, decoded and modulated back to their original spectral bands before they are added together to reproduce the speech. To reduce the number of bits needed to encode the signals, different numbers of bits are used for different subbands. For speech, most of the energy lies between 120 and 2000 Hz. Hence, more bits are often allotted to the lower frequency bands than to the higher ones.

Figure 3.4: Block diagram of a general subband coder (based on a figure from [6]).

3.1.5 Transform Coding

In Fig. 3.5, a simple block diagram of a transform coder is shown. A block of speech samples is unitarily transformed to the frequency domain by the encoder. The transform coefficients are quantized and encoded before transmission. The receiver decodes and inverse-transforms the coefficients to reproduce the block of speech samples. The purpose of transforming the speech is that the transformed signal is less correlated than the original signal. This means that most of the signal energy is contained in a small subset of transform coefficients. The Karhunen-Loève Transform (KLT), the Discrete Cosine Transform (DCT) and the Discrete Fourier Transform (DFT) are three examples of transforms used by transform coders. The KLT is the transform that gives the least correlated coefficients [9]. However, the DCT is more popular since it is nearly optimal and easy to compute using the Fast Fourier Transform.

As for subband coders, bit allocation is used to remove redundancies. Adaptive Transform Coding (ATC) works in a similar way as described above, except that it uses adaptive quantization and adaptive bit allocation. This leads to a higher quality of the reconstructed speech.

Figure 3.5: Block diagram of a transform coder (based on a figure from [4]).

3.2 Vocoders

Unlike waveform coders, vocoders (parametric coders) do not try to recreate the speech waveform. Vocoders use a mathematical model of how human speech is generated, creating synthesized speech instead of a reconstruction of the input. Through an analysis of the input speech, parameters corresponding to the vocal cords and the vocal tract are estimated and quantized before transmission. At the receiver, these parameters are used to tune the model in order to construct synthesized speech. Because vocoders operate by synthesizing speech, they perform poorly on non-speech signals. Vocoders use very few bits to encode the speech, giving very synthetic speech where the speaker is hard to identify. For more information on vocoding techniques than given below, the reader is referred to [4] and [5].

3.2.1 Linear Predictive Coding

A well-known example of Linear Predictive Coding (LPC) is the 2.4 kbit/s LPC-10. Due to the very low bit rate, the reconstructed speech sounds very unnatural. The main use for algorithms like this has been secure transmissions, where a big part of the bandwidth is needed for encrypting the speech. An LPC encoder divides the input signal into frames, typically 10-20 ms long. Each frame is then analyzed to estimate the parameters needed.

The linear prediction parameters, the decision whether the input is voiced or unvoiced speech, the pitch period and the gain are the parameters that are estimated and quantized. The quantized parameters are transmitted and a synthesis model, like the one shown in Fig. 3.6, is used to reconstruct the speech. For voiced speech a pulse generator is used for modelling the excitation signal; for unvoiced speech white noise is used. The excitation signal represents the flow of air from the lungs through the glottis, and the LPC parameters are used in the filter representing the oral cavity. The signal is multiplied by a gain to achieve the correct amplitude of the synthesized speech. The parameters are updated for each frame, i.e. every 10-20 ms. Between the updates, the speech is assumed to be stationary.

Figure 3.6: LPC synthesis model (from [5]).

3.3 Hybrid Coding

Hybrid coders contain features from both waveform coders and vocoders. They try to combine the low bit rate of vocoders with the high speech quality of waveform coders, to produce speech with good quality at a medium or low bit rate. Many hybrid coders can be classified as analysis-by-synthesis linear predictive coders, depicted in Fig. 3.7. In this system an excitation signal is chosen or formed. The excitation signal is then filtered by a long-term linear predictive (LP) filter AL(z) corresponding to the pitch structure of the speech. Then the short-term LP filter A(z), representing the vocal tract, is applied to the signal. This is the vocoding part of hybrid coders. The waveform part of these coders is the attempt to match the synthesized speech to the original speech. This is done by the perceptual weighting filter W(z), which shapes the error between the input speech s[n] and the synthesized speech ŝ[n], in order to minimize the Mean Squared Error (MSE). For more information on hybrid coding than given below, the reader is referred to [4] and [5].

Figure 3.7: Analysis-by-synthesis linear predictive coder (from [4]).

3.3.1 Code Excited Linear Prediction

Code Excited Linear Prediction, known as CELP, uses a codebook with different excitation signals. The perceptual weighting filter is applied both to the original signal and to the synthesized signal. A difference is calculated, and from this the excitation yielding the minimal error is chosen. As for the general analysis-by-synthesis coder described above, both a long-term and a short-term LP filter are applied to the signal for synthesizing the speech. An example of a standard using CELP is the Federal Standard FS 1016 CELP, a 4.8 kbit/s hybrid coder previously used in secure communications. Another example is the ITU-T standard G.728 "Coding of speech at 16 kbit/s using low-delay code excited linear prediction", known as LD-CELP. This standard gives a much higher quality of the synthesized speech than FS 1016, but the bit rate is higher as well.

3.4 Discussion

As mentioned in Section 1.1, it is desired to fit the newspaper on a small memory. The 90-minute paper is sampled at 19 kHz using 16 bits to represent each sample; thus, the uncompressed newspaper requires 205.2 MB of memory. In order to get the subscribers to accept a new receiver, the quality of the speech must be comparable to that of the present receiver; therefore ratings from subjective tests are a good thing to compare. Mean Opinion Score, or simply MOS, is a well-known subjective test used to determine the quality of reconstructed or synthesized speech. The scale in MOS tests ranges from 1 to 5 as follows:

1 - Unsatisfactory (Bad)
2 - Poor
3 - Fair
4 - Good
5 - Excellent

In the second column of Table 3.1 the result from a MOS test is given for five different compression techniques. The waveform coders PCM and ADPCM have high MOS scores, as does the hybrid coder LD-CELP. Many hybrid coders give synthesized speech of high quality, but they tend to have a very high complexity and they require a codebook. For these reasons a hybrid coder is not a good choice for the compression of the spoken newspaper. Since vocoders produce unnatural speech, they are not considered as candidates for use in the digital receiver. The class of speech coders that remains is the waveform coders. From Table 3.1 it is clear that if ADPCM is used to compress the spoken newspaper, it is possible to store the paper on a 64 MB memory chip. For this reason, and the fact that ADPCM gives a high quality of the reconstructed speech at a low complexity, this algorithm was chosen for the project. A detailed description of the algorithm is given in the next chapter.

Table 3.1: Coding algorithms and their performances (based on information found in [4]).

Algorithm     | MOS | CR(a) | MIPS(b) | Memory requirement(c) (MB)
------------- | --- | ----- | ------- | --------------------------
log PCM       | 4.3 | 2     | 0.01    | 102.6
ADPCM         | 4.1 | 4     | 2       | 51.3
LPC-10        | 2.3 | 53.33 | 7       | 3.85
FS 1016 CELP  | 3.2 | 26.67 | 16      | 7.7
LD-CELP       | 4.0 | 8     | 19      | 25.65

(a) CR = Compression Ratio: the ratio between the number of bits needed to represent one uncompressed sample and the number of bits needed to represent one compressed sample.
(b) MIPS = Million Instructions Per Second. These figures are only approximate; the complexity is dependent on the implementation.
(c) The amount of memory required to store a 90-minute encoded spoken newspaper.


Chapter 4 Adaptive Differential Pulse Code Modulation

The compression technique chosen for this project is Adaptive Differential Pulse Code Modulation (ADPCM). The reasons for choosing this particular algorithm are that the quality of the reconstructed speech is high and the complexity is low: about 600 instructions are needed to encode or decode one sample of the speech. Compressed with ADPCM, one newspaper requires approximately 51.3 MB of memory, as seen in Table 3.1. Using a 64 MB memory chip, space will be available for storing overhead information. An additional advantage is that ADPCM is free of charge [10]. ADPCM was approved as the international standard CCITT Recommendation G.721 "32 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM)" in October 1984, see [11]. The standard is now a part of ITU-T Recommendation G.726 "40, 32, 24, 16 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM)" [12]. The version used in this project is the 32 kbit/s ADPCM. This bit rate is valid if the input speech is sampled at 8 kHz with 16-bit accuracy, but the sampling frequency used in the digital receiver is 19 kHz. With a CR of 4 (see Section 3.4) the output bit rate becomes

$$\frac{19\,000 \cdot 16}{4} = 76\ \text{kbit/s}.$$

This has no effect on the calculations other than that they are performed at a higher rate. ADPCM makes use of redundancies in speech signals, such as the correlation between contiguous samples, to minimize the number of bits needed to represent the signal. The difference between the incoming sample and an estimate of that sample is calculated, and this difference is encoded instead of the actual input signal itself. This reduces the variance of the signal, and therefore fewer bits are required to encode it.

A simplified block diagram of the ADPCM encoder/decoder is shown in Fig. 4.1. The estimate of the input signal, se[k], is subtracted from the actual input signal s[k], and the resulting difference, d[k], is quantized and encoded as I[k] before transmission. The reconstruction of the speech starts by recreating the received word as dq[k]. The difference is then added to the estimate calculated by A(z) and B(z), to produce the reconstructed speech r[k].

Figure 4.1: Simplified block diagram of the ADPCM encoder/decoder (based on a figure from [11]).

G.726 is backward adaptive, meaning that the prediction of the input signal is made from previous samples. Hence, the prediction can be made at the receiver without any side information having to be sent. As shown in Fig. 4.1, the decoder can be regarded as a subset of the encoder.

4.1 The Encoder

The encoding process starts by converting the input signal from logarithmic PCM to uniform PCM, as seen in Fig. 4.2. Next, the difference between the input signal and its estimate is calculated, and this value is assigned four bits by the adaptive quantizer. This four-bit word, I[k], is the output of the encoder. I[k] is sent both to the decoder and to the inverse quantizer.

The output of the inverse quantizer is the quantized difference signal, which is added to the estimate of the signal to reconstruct the input. This reconstructed signal, as well as the quantized difference signal, is fed to the adaptive predictor for an estimation of the next input sample. Below follow descriptions of what each block in the encoder does.

Figure 4.2: Block diagram of the ADPCM encoder (from [12]).

Input PCM Format Conversion

The input to the ADPCM encoder is an A-law or µ-law pulse code modulated signal, with an accuracy of 8 bits. This log-quantized signal is converted to uniform PCM before compression.

Difference Signal Computation

The second step in the encoding process is to calculate the difference signal by subtracting the estimate of the input from the actual input signal:

$$d[k] = s_l[k] - s_e[k]. \qquad (4.1)$$

Adaptive Quantizer

ADPCM uses backward adaptive quantization, described in Section 2.2.1. The quantizer used is a 15-level non-uniform adaptive quantizer. Before the difference signal is quantized, it is converted to a base-2 logarithmic representation and scaled by the scale factor y[k]. The normalized input to the quantizer is then

$$\log_2 |d[k]| - y[k], \qquad (4.2)$$

which is quantized to create the encoder output I[k].

Inverse Adaptive Quantizer

The inverse adaptive quantizer constructs the quantized difference signal, dq[k], by first decoding I[k] and then adding the scale factor y[k]. Finally, the result is transformed back from the logarithmic domain.

Quantizer Scale Factor Adaptation

The scale factor y[k], used in the quantizer and in the inverse quantizer, is composed of two parts: one for fast-varying signals and one for slowly varying signals. The fast, unlocked, scale factor is calculated recursively from the resulting scale factor y[k]:

$$y_u[k] = (1 - 2^{-5})\, y[k] + 2^{-5}\, W(I[k]). \qquad (4.3)$$

W is a number from a discrete set, depending on which one of the fifteen quantization levels is used for the current sample, see [12]. The second part of the scale factor is the slow, or locked, factor yl[k], calculated as follows:

$$y_l[k] = (1 - 2^{-6})\, y_l[k-1] + 2^{-6}\, y_u[k]. \qquad (4.4)$$

The resultant scale factor y[k] is a combination of the fast and the slow scale factors:

$$y[k] = a_l[k]\, y_u[k-1] + (1 - a_l[k])\, y_l[k-1]. \qquad (4.5)$$

The controlling parameter al[k] is described below.
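The three recursions (4.3)-(4.5) translate almost directly into code. The sketch below is illustrative only: the W table is a placeholder (the real multipliers, as well as the fixed-point arithmetic of the standard, are found in [12]).

```python
W_TABLE = {i: 1.0 for i in range(16)}  # placeholder; real values are in [12]

class ScaleFactor:
    def __init__(self):
        self.yu = self.yl = 1.0        # arbitrary starting values

    def update(self, I, al):
        # (4.5): blend the previous fast and slow factors, steered by al
        y = al * self.yu + (1 - al) * self.yl
        # (4.3): fast (unlocked) factor, driven by the current code word
        self.yu = (1 - 2 ** -5) * y + 2 ** -5 * W_TABLE[I]
        # (4.4): slow (locked) factor, a heavily smoothed copy of yu
        self.yl = (1 - 2 ** -6) * self.yl + 2 ** -6 * self.yu
        return y
```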

Adaptation Speed Control

al[k] is a controlling parameter that ranges between 0 and 1. For speech signals this parameter approaches 1, forcing the quantizer towards the fast mode. For slow signals like tones, the parameter tends towards 0, driving the quantizer into the slow mode. For detailed information on how al[k] is constructed, see [12].

Adaptive Predictor and Reconstructed Signal Calculator

The estimate, se[k], of the input signal is calculated using two previously reconstructed samples sr and six previous difference signals dq:

$$s_e[k] = \sum_{i=1}^{2} a_i[k-1]\, s_r[k-i] + s_{ez}[k], \qquad (4.6)$$

where

$$s_{ez}[k] = \sum_{i=1}^{6} b_i[k-1]\, d_q[k-i]. \qquad (4.7)$$

The reconstructed signal sr is calculated by adding the quantized difference to the estimate of the signal:

$$s_r[k-i] = s_e[k-i] + d_q[k-i]. \qquad (4.8)$$

The coefficients ai and bi are found in [12].
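Structurally, (4.6)-(4.8) describe a two-pole, six-zero predictor. A sketch follows; the per-sample adaptation of the coefficients themselves is omitted here and left as specified in [12].

```python
from collections import deque

class Predictor:
    def __init__(self):
        self.a = [0.0, 0.0]                   # a_1, a_2 (adapted in G.726)
        self.b = [0.0] * 6                    # b_1 .. b_6
        self.sr = deque([0.0] * 2, maxlen=2)  # past reconstructed samples
        self.dq = deque([0.0] * 6, maxlen=6)  # past quantized differences

    def estimate(self):
        sez = sum(bi * d for bi, d in zip(self.b, self.dq))          # (4.7)
        return sum(ai * s for ai, s in zip(self.a, self.sr)) + sez   # (4.6)

    def update(self, se, dq_new):
        sr_new = se + dq_new                  # (4.8): reconstructed sample
        self.sr.appendleft(sr_new)            # newest first: s_r[k-1], ...
        self.dq.appendleft(dq_new)
        return sr_new
```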

Tone and Transition Detector

When the quantizer is in the slow, locked, mode and a stationary signal changes to another stationary signal, problems can occur. An example of this type of signal is tones from a frequency shift keying modem. The problems that might occur are prevented by the tone and transition detector. When a transition between different stationary signals is detected, the quantizer is forced into the fast, unlocked, mode by setting all the predictor coefficients equal to zero.

4.2 The Decoder

As can be seen in Fig. 4.3, the structure of the decoder is for the most part the same as that of the encoder. I[k], the four-bit word representing the difference between the input signal and its estimate, is fed to the inverse quantizer. The quantized difference signal is added to the estimate of the input and then converted from uniform PCM to either A-law or µ-law PCM.

Figure 4.3: Block diagram of an ADPCM decoder (from [12]).

The blocks common to both the encoder and decoder were described in Section 4.1; the blocks unique to the decoder are described below.

Output PCM Format Conversion

After the signal has been reconstructed, this block converts it back to A-law or µ-law PCM format from uniform PCM.

Synchronous Coding Adjustment

This block has been added to the ADPCM decoder to reduce the cumulative distortion that can appear from successive synchronous tandem codings, i.e., ADPCM to PCM to ADPCM to PCM to ADPCM etc. With this feature, any number of synchronous tandem codings is equivalent to one single coding, provided the channel is ideal (no transmission errors). Details about the synchronous coding adjustment are found in [12].

4.3 Implementation and Results

ADPCM was first implemented on the ADSP-2181 DSP, described in Section 4.3.1 below, using code obtained from Analog Devices. The speech compressed using this program had a high quality, as expected. Since the sampling frequency had to be set to exactly 19 kHz in order to decrypt the newspaper, the ADSP-2181 DSP was replaced by the ADSP-2191 DSP. Both DSPs are programmed using assembly language, and they were supposed to be compatible. However, some instructions were defined differently in the new DSP, and therefore the code had to be converted. Unfortunately, due to some problems, the implementation on the ADSP-2191 could not be completed within the time frame of this thesis.

4.3.1 Used Tools

Both the ADSP-2181 DSP and the ADSP-2191 DSP were used during the implementation of ADPCM. First the evaluation kit ADSP-21XX EZ-KIT Lite was used; some of its attributes are as follows:

• 16-bit fixed-point ADSP-2181 DSP
• Ability to perform 33 MIPS (Million Instructions Per Second)
• 80 kB of on-chip RAM
• AD1847 stereo codec
• Serial port connection

Due to the need to set the sampling frequency in steps of 1 Hz, the ADSP-2181 was replaced by the ADSP-2191. The evaluation board for this DSP is called ADSP-2191 EZ-KIT Lite and has the following attributes:

• 16-bit fixed-point ADSP-2191 DSP
• Ability to perform 160 MIPS
• 160 kB of on-chip RAM
• AD1885 48 kHz AC'97 SoundMAX codec
• USB version 1.1 connection
• Ability to set the sampling frequency in steps of 1 Hz

The attributes for both evaluation kits are taken from [13].

Chapter 5 Tone Detection

Between two articles in the spoken newspaper a 50 Hz tone is inserted, indicating the end of one article and the start of the next. Today the subscribers of spoken newspapers use a normal cassette player when listening to their recorded paper. When listening to the newspaper at normal speed the tone is not audible, partly because of its low frequency and partly because the amplitude of the tone is smaller than the amplitude of the speech, see Fig. 5.1. However, when the listeners fast-forward the tape, the frequency of the tone is increased and it can be heard as a beep. In the new digital receiver, it should be possible to skip between articles just by pressing a button, and for this reason the 50 Hz tone must be detected. The idea is to add a bookmark at the location of the tone. Pressing the forward button on the new receiver will then cause a jump to the next article, that is, the place where the next bookmark is located.

5.1 Finding the Tone

The tone is added to the newspaper by the person reading it out, which causes the duration of the tone to vary. A couple of tones have been examined, and they had durations between approximately 0.5 and 2.5 seconds. Due to the human factor, the tone is not always located between articles; sometimes it is found at the end of an article, hidden under the speech. Thus, the problem can be described as detecting a known signal with unknown duration in non-white, non-stationary noise (i.e. the speech).

Figure 5.1: Speech and 50 Hz tones.

Requirements for the tone detection:

• Due to the limited memory on the DSP, the solution to the tone detection problem should not require too much memory.
• The tone detection should be fast, i.e. not require many instructions.
• The detection must be robust, meaning both that all tones are detected and that the probability of false alarm is low. If the detector fails to detect one tone and the next one is detected, skipping forward means missing an article. A false alarm occurs when something other than a tone is believed to be the tone searched for.

The first step in detecting the tone is to reduce the amplitude of the speech while the amplitude of the tone is maintained or possibly increased. The second step is to decide whether the tone searched for is present in the signal or not.

5.1.1 Matched Filter

Since matched filters can be used to search for known signals in noise, a matched filter is a natural candidate for the first step in the process of detecting the 50 Hz tone. A matched filter is designed to maximize the output Signal-to-Noise Ratio (SNR) [14]; thus, it might be possible to use a matched filter to remove the speech. The aim of a matched filter is not to keep the waveform of the signal searched for unchanged, but to maximize the power of the known signal with respect to the power of the noise. The received signal can be described by

$$r(t) = s(t) + n(t), \qquad (5.1)$$

where s(t) is the known signal searched for (in this case the 50 Hz tone) and n(t) is the additive colored noise (the speech) corrupting the tone. If S(f) is the Fourier transform of the known signal s(t) and Pn(f) is the Power Spectral Density (PSD) [14] of the colored input noise, the transfer function of the matched filter is given by

$$H(f) = K \, \frac{S^*(f)}{P_n(f)} \, e^{-j 2 \pi f t_0}, \qquad (5.2)$$

where K is an arbitrary real nonzero constant and t0 is the sampling time. A proof of this is given in [14]. Usage of the filter described above requires knowledge of the PSD of the speech, so a simplification is needed. If the noise is assumed to be white, Pn(f) equals N0/2. This reduces (5.2) to

$$H(f) = \frac{2K}{N_0} \, S^*(f) \, e^{-j 2 \pi f t_0}. \qquad (5.3)$$

Hence, if the noise is white, the impulse response of the matched filter becomes

$$h(t) = C \, s(t_0 - t), \qquad (5.4)$$

where C is an arbitrary real positive constant. The proof of this equation is found in [14]. From (5.4) it is clear that the matched filter is a scaled, time-reversed version of the known signal itself, delayed by the interval t0, see Fig. 5.2. For the matched filter to be realizable it must be causal, i.e.,

$$h(t) = 0 \quad \text{if } t < 0, \qquad (5.5)$$

and for that reason t0 must be equal to the length of the known signal. The sampling frequency used in the DSP is 19 kHz; one period of the 50 Hz tone is therefore represented by 380 samples.
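Under the white-noise simplification (5.4), matched filtering amounts to convolving the input with a time-reversed copy of the tone template. A sketch, in which the two-period template length is an assumed choice:

```python
import numpy as np

fs = 19000.0
n = np.arange(2 * 380)                        # two periods of 50 Hz: 760 taps
template = np.sin(2 * np.pi * 50.0 * n / fs)  # the known signal s
h = template[::-1]                            # h[n] = C s(t0 - t), with C = 1

def matched_filter(r):
    """Peaks in the output indicate where the template is present in r."""
    return np.convolve(r, h, mode="valid")
```

Even this short template needs 760 stored taps, which already hints at the storage objection developed below.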

For the matched filter to be optimal, the duration of the tone must be known. As stated before, the duration of the tone varies, and hence an assumption about the duration must be made. The length of the matched filter thus becomes the number of samples needed to represent one period of the tone, multiplied by the number of periods chosen. However, this procedure requires a large amount of storage, since as many samples as the length of the filter must be stored.

Figure 5.2: (a) s(t) is the known signal; (b) h(t) is the matched filter.

A method for reducing the length of the filter is to downsample the input signal prior to the detection process, and thereby reduce the number of samples needed to represent one period of the tone. Downsampling cannot be done unless a lowpass filter is first applied to the input signal, or else aliasing (described in Section 2.1) might occur. For example, if it is desired to use a sampling frequency of 400 Hz, the lowpass filter must remove all frequencies above 200 Hz. If the filter is not steep enough, frequencies above 200 Hz will be represented with a lower frequency, possibly as 50 Hz. A steep lowpass filter is of high order, i.e. many samples must be stored, and this diminishes the gain of the downsampling. Therefore the matched filter does not seem to fulfill the storage requirements of the tone detection. The assumption that the noise is white is not completely correct either; this error, together with the large storage requirement, leads to the conclusion that another solution than a matched filter might be more suitable.

5.1.2 Digital Resonator

Another possible solution to the tone detection problem is to remove all frequencies in the input signal except those around 50 Hz. This can be done by applying a bandpass filter to the incoming signal. By testing whether the power of the output signal exceeds some threshold value, the presence of the tone can be detected.

Since most of the energy in human speech is limited to frequencies between 120 and 2000 Hz, the filter used to remove the speech must drop off sharply. Thus, the magnitude response of the filter should look like a very steep, narrow bandpass filter, like the ideal filter shown in Fig. 5.3.

Figure 5.3: Magnitude response of an ideal bandpass filter with the passband [49.5 50.5] Hz.

A filter known as a digital resonator [15] is a second-order digital filter with complex conjugated poles in ae^{±jω0}, where a is close to 1. ω0 is the digital resonance frequency, that is, the frequency of interest. In this case the resonance frequency is π/190, the digital frequency corresponding to 50 Hz. The digital frequency ranges between 0 and π rad/sample, with π corresponding to half the sampling frequency. A pole-zero plot of a general digital resonator is shown in Fig. 5.4.

Figure 5.4: Pole-zero plot of a general digital resonator. The poles and zeros are indicated by crosses and circles respectively.

To have a magnitude response resembling the one in Fig. 5.3, the filter should drop off sharply when ω moves away from ω0. This can be achieved using two zeros. By placing one zero at −1 the magnitude response will drop off sharply as ω approaches π; by placing the other zero at +1 the magnitude response will fall off sharply as ω approaches 0. The transfer function of the digital resonator is

$$H(z) = \frac{b_0 (z + 1)(z - 1)}{(z - a e^{j\omega_0})(z - a e^{-j\omega_0})}, \qquad (5.6)$$

where a is defined through

$$2(1 - a) = \text{bandwidth}, \qquad (5.7)$$

using an approximation of the 3 dB bandwidth of the filter. This approximation is derived in [15]. The magnitude response of a digital resonator with a bandwidth corresponding to 1 Hz is shown in Fig. 5.5; it corresponds well with the ideal magnitude response in Fig. 5.3.

Figure 5.5: Magnitude response of a digital resonator with passband [49.5 50.5] Hz. The normalized frequency 0.5π corresponds to 4750 Hz.

Since the resonator is a second-order filter, only 4 samples need to be stored, and thus the memory requirement of the tone detection given in Section 5.1 is fulfilled. The suitable magnitude response, the fact that a second-order filter is easy to implement and the small memory requirement make the digital resonator a good candidate for the tone detection.
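For reference, (5.6) and (5.7) expand into a direct-form difference equation with only two feedback and two feed-forward terms. The sketch below is illustrative; normalizing b0 for unit gain at the resonance frequency is one reasonable convention, not necessarily the one used in the thesis.

```python
import numpy as np
from scipy.signal import lfilter

fs, f0, bw_hz = 19000.0, 50.0, 1.0
w0 = 2 * np.pi * f0 / fs                 # pi/190, as in the text
a = 1 - (2 * np.pi * bw_hz / fs) / 2     # from 2(1 - a) = bandwidth    (5.7)

# H(z) = b0 (z + 1)(z - 1) / ((z - a e^{jw0})(z - a e^{-jw0}))          (5.6)
num = np.array([1.0, 0.0, -1.0])         # zeros at z = +1 and z = -1
den = np.array([1.0, -2 * a * np.cos(w0), a * a])
z0 = np.exp(1j * w0)
num /= abs(np.polyval(num, z0) / np.polyval(den, z0))   # b0: unit gain at w0

def resonate(x):
    """Second-order IIR filter; only four past values must be stored."""
    return lfilter(num, den, x)
```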

5.2 Implementation

The digital resonator was first implemented using Matlab from MathWorks. The output from a digital resonator with a bandwidth corresponding to 1 Hz, using the same input signal as depicted in Fig. 5.1, is shown in Fig. 5.6.

Figure 5.6: Output from the digital resonator with a bandwidth of 1 Hz.

From this figure it can be seen that the tone consists of four peaks instead of the three indicated in Fig. 5.1; the first peak was hidden by the speech. It is clear from the figure that the resonator efficiently removes the speech from the signal, as desired. Note that in this plot the scale of the amplitude axis is changed from that of Fig. 5.1. As mentioned in the beginning of Section 5.1.2, the power of the output signal must be observed in order to determine whether the tone is present or not. A measure of the power of the signal can be obtained by squaring the output of the resonator. This is shown in Fig. 5.7(a); in Fig. 5.7(b) the absolute value of the resonator output is shown. As can be seen from these two plots, the result from squaring and the result from taking the absolute value of the resonator output are quite similar. Since the DSP used for the real-time implementation contains a function for fast calculation of the absolute value, this method was chosen. By applying a lowpass filter after taking the absolute value of the output of the digital resonator, the result is smoothed and can be compared to a threshold value.

Figure 5.7: (a) Squared resonator output; (b) Absolute value of resonator output.

The smoothed output is shown in Fig. 5.8. From this figure a value between 20 and 30 seems like a good threshold. If the threshold 30 is chosen, the tone will be detected at the time 1.3 s, and from Fig. 5.1 it is clear that the first article is not quite finished at that time. A possible solution to this problem is to wait to set the bookmark until the value of the smoothed output is below 30 again, i.e., at the time 3.8 s. Since the resonator worked well in Matlab, it was implemented on the DSP.

Figure 5.8: Smoothed output of the digital resonator.
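A sketch of the decision chain just described (absolute value, lowpass smoothing, threshold comparison) is given below. The one-pole smoother and its time constant are assumptions, and the numeric threshold depends on the signal scaling; the thesis leaves its final value to be tuned.

```python
import numpy as np

def detect_tone(y, fs=19000.0, threshold=30.0, tau=0.1):
    """Return one boolean per sample: smoothed |y| above the threshold."""
    alpha = 1.0 / (tau * fs)          # one-pole lowpass, time constant ~tau s
    env = np.empty(len(y))
    acc = 0.0
    for k, v in enumerate(np.abs(y)):
        acc += alpha * (v - acc)      # leaky integrator as the smoother
        env[k] = acc
    return env > threshold
```

Following the text, a bookmark would then be set where the detector output falls back below the threshold, i.e. once the tone has ended.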

5.2.1 Results

The filter used in Fig. 5.6 was implemented on the ADSP-2191 DSP, described in Section 4.3.1. A waveform generator was used as input to the resonator, and the output was observed using an oscilloscope. The idea was that when increasing the frequency of the input, a peak in the amplitude of the output would be seen. Unfortunately, the result of this implementation bore no resemblance to the result obtained using Matlab; the resonator did not respond to any frequencies. A possible reason for this was believed to be quantization errors due to small filter coefficients. The DSP implementation required the coefficients in 1.15 format [16], and since some of the filter coefficients were approximately 1/5000, the quantization error amounted to roughly 12%. These quantized coefficients were used in Matlab on the same input file as in Fig. 5.6, generating the result shown in Fig. 5.9. From this plot it is obvious that the resonator implemented on the DSP cannot detect any 50 Hz tone.

Figure 5.9: Output of the narrow resonator using quantized coefficients.

A solution to the above-mentioned problem could be to use a resonator with a wider passband; this will increase the values of the coefficients and thereby reduce the quantization noise. The result obtained from a 9.6 Hz wide resonator implemented in Matlab is shown in Fig. 5.10(a). The corresponding result when using the quantized coefficients for the wider resonator in Matlab is shown in Fig. 5.10(b). As can be seen from these two plots, the difference between the results obtained with the two sets of coefficients is small. By broadening the passband of the digital resonator, the small coefficients became four times larger than before, and the quantization error of these coefficients was reduced to approximately 0.5%.
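The effect can be reproduced with a short Q15 rounding check; the exact percentages depend on the true coefficient values, so the figures below only illustrate the trend.

```python
def q15_relative_error(c):
    cq = round(c * 2 ** 15) / 2 ** 15    # nearest 1.15-representable value
    return abs(cq - c) / c

print(q15_relative_error(1 / 5000))      # small narrow-resonator coefficient
print(q15_relative_error(4 / 5000))      # four times larger: far smaller error
```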

[Figure 5.10: (a) Output of the resonator using the passband [45.2 54.8] Hz; (b) Output of the resonator using the passband [45.2 54.8] Hz with quantized coefficients.]
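For reference, the magnitude responses of the narrow and the wide resonator (compared in Fig. 5.11 below) can be reproduced with a few lines of Matlab, again using the assumed two-pole form from the earlier sketches rather than the actual implementation coefficients:

    % Magnitude responses of the 1 Hz and 9.6 Hz resonators (cf. Fig. 5.11),
    % assuming the same two-pole form as in the earlier sketches.
    fs = 19000; f0 = 50; w0 = 2*pi*f0/fs;
    figure; hold on;
    for bw = [1, 9.6]
        r = 1 - pi*bw/fs;                                  % pole radius
        [H, f] = freqz(1-r, [1, -2*r*cos(w0), r^2], 4096, fs);
        plot(f, 20*log10(abs(H)));
    end
    set(gca, 'XScale', 'log');        % logarithmic frequency axis
    xlabel('Frequency (Hz)'); ylabel('Magnitude (dB)');
    legend('1 Hz passband', '9.6 Hz passband');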

From Fig. 5.10 it can be seen that a small part of the speech, which was removed by the 1 Hz wide resonator, is beginning to show. However, since the amplitude of the speech is much smaller than that of the tones, this will have a very small effect.

When the 9.6 Hz wide resonator is implemented on the DSP, the result is very good for tones generated by the waveform generator. When the input frequency is varied from 20 Hz and upwards, a peak in amplitude is seen around 50 Hz. The amplitude of the output from the resonator is low when the frequency is low, increases as the frequency approaches 50 Hz, and decreases again when the frequency is increased further. This relationship can be seen in Fig. 5.11. For comparison, the magnitude responses of the corresponding Matlab implementation and of the 1 Hz wide resonator (from Fig. 5.5) are also plotted in the same figure. It is seen from this figure that the DSP implementation has a positive gain of approximately 4 dB when the frequency is around 50 Hz. This will cause no problem, since the amplitude of the tones is very low.

The same input file as used in Fig. 5.10 was applied to the resonator implemented on the DSP. In order to see the tones, i.e., to see a change of amplitude, the input volume had to be high; if the input volume was too low, no tones were seen. The main reason for needing a high input volume is the fact that the tone searched for is 50 Hz while the sampling frequency used is 19 kHz. Due to this, some of the resonator coefficients will be very small, giving a non-optimal result. An improvement would therefore be to downsample the input signal prior to the tone detection. As mentioned in Section 5.1.1, the downsampling of a signal requires a lowpass filter to avoid aliasing.

[Figure 5.11: Magnitude responses of digital resonators. Solid line: DSP implementation, passband [45.2 54.8] Hz. Dashed line: Matlab implementation, passband [45.2 54.8] Hz. Dotted line: 1 Hz resonator (unquantized coefficients).]

Applying a lowpass filter to the signal would of course increase the memory requirement and the number of instructions needed to carry out the tone detection, but since the digital resonator has a very small memory requirement, this would still be worthwhile in order to enhance the performance; a sketch of such a decimation step is given at the end of this section.

One thing that remains in order to complete the DSP implementation of the tone detector is the decision whether the 50 Hz tone is present or not. The way this is to be done was described in Section 5.2. In order to decide on a good threshold value, many newspapers must be analyzed.
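The suggested downsampling could be sketched in Matlab as follows, continuing the first sketch in this section (fs and x as defined there). The decimation factor 19, giving a 1 kHz rate, is an assumption made for the illustration, not a value chosen in this work.

    % Suggested improvement: lowpass filter and downsample the input
    % before the tone detection. M = 19 (19 kHz -> 1 kHz) is an assumed,
    % illustrative decimation factor.
    M  = 19;                              % decimation factor (assumption)
    h  = fir1(64, (fs/M/2)/(fs/2));       % FIR anti-aliasing lowpass, 500 Hz cutoff
    xl = filter(h, 1, x);                 % filter before discarding samples
    xd = xl(1:M:end);                     % downsampled signal at 1 kHz
    % At 1 kHz, w0 = 2*pi*50/1000 is far from zero, so the resonator
    % coefficients become larger and quantize well in 1.15 format.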

Chapter 6

Conclusions

This thesis is part of a project aiming to design a digital receiver for a radio-transmitted newspaper system. The newspapers are spoken versions of daily newspapers, subscribed to by persons with different types of reading disabilities. The spoken newspapers are encrypted to prevent non-subscribers from listening to them. Today the subscribers have a receiver that decrypts the paper and stores it on a cassette tape. When listening to the paper, the cassette tape must be moved to a regular cassette player. The digital receiver will have a built-in speaker, and hence nothing has to be moved in order to listen to the paper.

The newspaper will be stored on a small digital memory, and for that reason the received and decrypted transmission must be compressed. In this thesis several speech compression algorithms are presented. The criteria for the compression algorithm chosen for use in the digital receiver were high quality of the reconstructed speech, low complexity and a sufficient compression ratio. The algorithm that met the criteria best was Adaptive Differential Pulse Code Modulation (ADPCM). An additional advantage of choosing ADPCM is that it is free of charge. Other candidates for the compression were different hybrid coders, which give reconstructed speech of very high quality. However, they were ruled out since they require a codebook and tend to have a high complexity.

Between the articles in the spoken newspaper a 50 Hz tone is inserted, indicating the start of the next article. Since both the frequency and the amplitude of the tone are low, it is not heard when listening to the paper at normal speed. The tone is heard when the listener fast-forwards the tape and thereby increases the frequency. In the digital receiver it is desired to jump between articles just by pressing a button. This can be done if the location of the 50 Hz tone is known; thus it must be detected.

The first step in detecting the 50 Hz tone inserted between articles in the spoken newspaper is to reduce the amplitude of the speech but not the amplitude of the tone. As shown in the previous chapter, the use of a digital resonator is a good way to accomplish this. The digital resonator meets the criteria of small memory requirement and simplicity. Some problems were encountered during the real-time implementation; they stemmed from the fact that the tone searched for is 50 Hz while the sampling frequency is 19 kHz. This relationship gives small filter coefficients, which leads to large quantization errors. These errors were reduced by the use of a wider resonator, giving a good result. A possible improvement could be to downsample the signal prior to the tone detection. The second step of the tone detection, i.e., calculating the power of the signal after the digital resonator and comparing this value to a threshold, was implemented in Matlab.

Bibliography

[1] Allmänt om Taltidningsnämnden. http://www.presstodsnamnden.se/ttn.html (2003-01-18).

[2] Oppenheim, A.V., Willsky, A.S. and Nawab, S.H. (1997) Signals & Systems (2nd ed.). Upper Saddle River, New Jersey: Prentice-Hall, ISBN 0-13-651175-9.

[3] Shi, Y.Q. and Sun, H. (2000) Image and Video Compression for Multimedia Engineering: Fundamentals, Algorithms, and Standards. Boca Raton, Florida: CRC Press, ISBN 0-8493-3491-8.

[4] Spanias, A.S. (1994) Speech Coding: A Tutorial Review. Proceedings of the IEEE, vol 82, no 10, pp 1539-1582. ISSN 0018-9219.

[5] Goldberg, R. and Riek, L. (2000) A Practical Handbook of Speech Coders. Boca Raton, Florida: CRC Press, ISBN 0-8493-8525-3.

[6] Jayant, N.S. and Noll, P. (1984) Digital Coding of Waveforms (3rd ed.). Englewood Cliffs, New Jersey: Prentice-Hall, ISBN 0-13-211913-7.

[7] Gibson, J.D. (Ed.) (1997) The Communications Handbook. Boca Raton, Florida: CRC Press, ISBN 0-8493-8349-8.

[8] Gibson, J.D. (1980) Adaptive Prediction in Speech Differential Encoding Systems. Proceedings of the IEEE, vol 68, no 4, pp 488-525. ISSN 0018-9219.

[9] Zelinski, R. and Noll, P. (1977) Adaptive Transform Coding of Speech Signals. IEEE Transactions on Acoustics, Speech and Signal Processing, vol ASSP-25, no 4, pp 299-309. ISSN 0096-3518.

[10] ITU-T Patents Database. http://www.itu.int/ITU-T/dbase/patent/ (2003-01-19).

[11] Benvenuto, N., Bertocci, G., Daumer, W.R. and Sparrell, D.K. (1986) The 32-kb/s ADPCM Coding Standard. AT&T Technical Journal, vol 65, no 5, pp 12-22. ISSN 8756-2324.

[12] ITU-T Recommendation G.726 (1990) 40, 32, 24, 16 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM). Geneva: ITU.

[13] Analog Devices homepage. http://www.analog.com (2003-01-21).

[14] Couch, L.W. (2001) Digital and Analog Communication Systems (6th ed.). Upper Saddle River, New Jersey: Prentice-Hall, ISBN 0-13-081223-4.

[15] Chen, C. (2001) Digital Signal Processing. New York: Oxford University Press, ISBN 0-19-513638-1.

[16] Analog Devices (1995) ADSP-2100 Family User's Manual. Norwood, MA: Analog Devices, Inc. (Available in PDF format from http://www.analog.com).
