
Degree project, 15 credits (Examensarbete 15 hp)

October 2017

Development of a microcontroller based DLCT end-point device

Adam Myrén

(2)

Faculty of Science and Technology, UTH Division
Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0
Postal address: Box 536, 751 21 Uppsala
Telephone: 018 – 471 30 03
Fax: 018 – 471 30 00
Website: http://www.teknat.uu.se/student

Abstract

Development of a microcontroller based DLCT end-point device

Adam Myrén, Simon von Schmalensee

In this bachelor thesis, the possibility of implementing a fully functioning Digital Signal Processing system based on the ARM Cortex-M7 microcontroller from STMicroelectronics is investigated, and such a system is implemented. The microcontroller is equipped with a Floating Point Unit, which meant the filter calculations could be performed in floating-point arithmetic instead of fixed-point. The system is intended to be used for audio room correction with filter coefficients calculated by DLCT (Dirac Live Calibration Tool), a software tool distributed by the company Dirac Research. The main system components are a run-time where the audio is processed and a TCP/IP server for communication over Ethernet between the system and DLCT. The system is also able to play stimuli sounds on command from DLCT.

The final system is capable of executing the filter calculations required for room correction with the filter topology used. The communication between DLCT and the subsystem was not fully established but the TCP/IP server was implemented and is a good foundation if the project is to be resumed in the future.

The work showed that a modern microcontroller is able to perform real-time audio signal processing without the use of a dedicated digital signal processor, which is more expensive and has higher development costs.


Acknowledgements


Contents

1 Introduction
  1.1 Background
  1.2 Purpose
  1.3 Goal
2 Theory
  2.1 Binary representation of numbers
    2.1.1 Fixed point vs floating point in DSP-applications
  2.2 Sampling
  2.3 Reconstructing/Interpolation
  2.4 Digital Filters
    2.4.1 FIR filter
    2.4.2 IIR filter
  2.5 Microcontroller
  2.6 Communication protocols
    2.6.1 UART
    2.6.2 I2S
    2.6.3 TCP/IP
  2.7 DMA
  2.8 Room correction
    2.8.1 System requirements for room correction
  2.9 DLCT
3 Software structure
  3.1 Sections
  3.2 Audio loop section
    3.2.1 Collection of input data
    3.2.2 Signal Processing
    3.2.3 Outputting processed samples
  3.3 Stimuli playback
  3.4 Ethernet Communication
  3.5 GUI
4 Results and Discussion
  4.1 Signal Processing
  4.2 Software structure
  4.3 Storage
  4.4 Communication


Abbreviations & Explanations

ADC     Analog-to-Digital Converter
API     Application Programming Interface
AVR     Audio/Video Receiver
BSP     Board Support Package
CMSIS   Cortex Microcontroller Software Interface Standard
CODEC   Coder/Decoder
DAC     Digital-to-Analog Converter
DLCT    Dirac Live Calibration Tool
DSP     Digital Signal Processing
FATFS   File Allocation Table File System
FFT     Fast Fourier Transform
FIR     Finite Impulse Response
FPU     Floating Point Unit
GUI     Graphical User Interface
I2S     Inter-IC Sound
IFFT    Inverse Fast Fourier Transform
IIR     Infinite Impulse Response
ISR     Interrupt Service Routine
LTI     Linear Time Invariant
LWIP    Lightweight IP (TCP/IP stack)
MAC     Media Access Control
MCU     Microcontroller Unit
PHY     Physical Layer
RPC     Remote Procedure Call


1 Introduction

1.1 Background

Since sound systems' entry into the consumer market, technologies have been used to optimize the sound with respect to the acoustics of the environment. Earlier, this was only possible by constructing different types of analog filters, specified for a particular position in a room. With the development of the modern computer and the ability to digitize sound and process it digitally, new methods based on digital signal processing have been developed. This means that room correction can be done using software that creates tailored digital filters, so the same software can be used for different speakers and different environments. Dedicated signal processors (DSPs) have usually been used for this signal processing. Because systems based on DSPs usually also require an MCU for communication and interfacing with external devices, it would be desirable to perform the signal processing with the MCU alone. At present, there are several MCUs suitable for signal processing, which makes this possible. Another reason is that the development costs for MCU applications are lower than for the equivalent DSP solution.

Dirac Research is a company which develops algorithms and software for audio optimization of speaker systems. One of their products is a room correction software that is used to minimize the speaker's and room's coloring of the sound. The frequency response of the room and speakers is measured at different locations, and the measurements are then combined to create filters that compensate for this. At the moment, the music must be played through the same computer on which the program is installed. To reduce this limitation, it is desirable to be able to play music from other devices and easily switch between them. This can be achieved by implementing the filter on a stand-alone platform and using the computer only for measuring the room acoustics and calculating filter coefficients.

1.2 Purpose

The main purpose of this bachelor thesis is to investigate the possibility of implementing a DSP-system, powerful enough to be used for high-end real-time room correction on a small embedded platform based on an ARM Cortex-M7 microcontroller with a limited amount of processing power and memory capacity.

1.3 Goal


2 Theory

2.1 Binary representation of numbers

Digital electronics uses a base-2 number system called binary. A number is represented as a sequence of 1's and 0's where each individual element is called a bit. The sequence by itself does not represent a unique number; it depends on how the computer interprets it. There are two common ways to represent decimal numbers in a binary format, called floating point and fixed point. A floating point number consists of three parts: the sign, the exponent and the mantissa. The standard definition of a 32-bit floating point number is given by IEEE 754. A floating point number can be written as

(−1)^s · 2^(e−127) · (1 + m / 2^23)   (1)

where s represents the sign bit, e represents the exponent and m represents the mantissa [1].

Figure 1: structure of 32-bit floating point representation

Figure 1 shows how a floating point number is structured in a computer. This way of representing numbers makes it possible to have a wide dynamic range. With fixed point representation, one instead uses an integer which is scaled by a specific factor. The scaling factor cannot be changed during computation, which means that, unlike floating point, a specific fixed point data type always has the same number of digits after the radix point. A disadvantage with fixed point representation is that all parts of a computer program which process a given fixed point type must keep track of where in the binary number the radix point is located.

2.1.1 Fixed point vs floating point in DSP-applications


2.2 Sampling

To be able to represent an analog signal in a computer, the signal must go through a process called sampling. The process is done by taking measurements of the signal at distinct time points, as shown in figure 2. It is common that the time interval is uniform throughout the process. If the signal can be reconstructed from the samples without any errors, the process is considered well performed. An important theorem regarding sampling is the Nyquist theorem, which determines whether the sampled signal can be reconstructed without errors. It states that for a signal x(t) where

F{x(t)} = X(ω) = 0 for all |ω| > ω_m   (2)

x(t) is reconstructible from its samples x[nT] if the sampling frequency satisfies ω_s > 2ω_m. To be able to reconstruct a signal without errors, a sampling frequency of at least two times the largest frequency component occurring in the signal must be used; this rate is usually referred to as the Nyquist rate. If a signal is sampled below the Nyquist rate, a phenomenon called aliasing will occur, which makes perfect reconstruction impossible [3].

Figure 2: The analog signal and its digital representation with a sampling period of T


2.3 Reconstructing/ Interpolation

A continuous signal x(t) bandlimited to a frequency F which has been sampled with a sampling frequency of 2F can, according to the Nyquist theorem, theoretically be reconstructed without error by passing the samples through an ideal reconstruction filter, which in the frequency domain has the following form [3]

H(2πf) = A_s if |f| < F, 0 otherwise   (3)

This theoretical method of ideal reconstruction can not be implemented in a real system, due to the fact that the filter has an infinite impulse response and is not causal. An easy and common way to implement an interpolation method in a real system is to use the ZOH method [3]. An output signal x(t) from an interpolator using the ZOH model is given by

x(t) = Σ_{n=−∞}^{∞} x[n] · rect((t − T/2 − nT) / T)   (4)

Equation 4 shows that the output signal is obtained by convolving each sample with a causal rectangular pulse.

2.4 Digital Filters

LTI (Linear Time Invariant) filters are often characterized by their magnitude and phase response in the frequency domain and their impulse response in the time domain. All this information can be derived from the filter's transfer function H(z). The magnitude and phase relations between the transfer function, input and output can be expressed as

|Y[z]| = |H[z]| · |X[z]|   (5)

and

∠Y[z] = ∠H[z] + ∠X[z]   (6)

A discrete filter's impulse response can be obtained by taking the inverse z-transform of the transfer function. The two main types of digital filters are FIR (Finite Impulse Response) and IIR (Infinite Impulse Response) filters. These are described in the following two sections.

2.4.1 FIR filter

The main advantages of FIR filters are [4]:

1. They are always stable


These advantages come with the disadvantage that FIR filters require a higher order than IIR filters to accomplish a given roll-off. A block diagram of a FIR filter structure can be seen in figure 3, where the z^−1 blocks represent a delay by one sample and the b's are the individual samples of the impulse response, or filter coefficients. To compute an output sample from the filter, the input samples are shifted in from the left, multiplied by the respective impulse response sample and added to the output. In this example the output would be calculated as

y[n] = x[n]h[0] + x[n−1]h[1] + x[n−2]h[2] + … + x[0]h[n]   (7)

where h[0] = b0 and so on. This calculation is the discrete convolution of the input signal

x[n] with the impulse response h[n], and convolution is the standard method used for calculating outputs of FIR filters. The discrete linear convolution sum can be described mathematically by

y[n] = x[n] * h[n] = Σ_{k=0}^{n} x[k] h[n−k]   (8)

where n is the number of taps in the impulse response. k starts at zero because the filter is causal.

Figure 3: Block diagram of a FIR filter

Since FIR filters have a finite impulse response and the output only depends on a finite number of previous inputs, they are always stable. This can be shown with the BIBO (bounded-input, bounded-output) stability condition stating that a system is BIBO stable if the impulse response h[n] of the system satisfies

Σ_{n=0}^{∞} |h[n]| < ∞   (9)


Since FIR filters are frequently used but do not have great computational efficiency, different convolution methods exist. The convolution theorem states that convolution in one domain corresponds to pointwise multiplication in the other domain. Linear convolution in the time domain can be achieved either with the straightforward linear convolution sum or with FFT (Fast Fourier Transform) convolution. The linear convolution sum produces a linear convolution of the signal and the filter kernel, which is what is desired when convolving real-time or very long signals, where the longer signal needs to be divided into smaller blocks and the signals are not periodic.

The convolution sum has a quadratic computational complexity of approximately O(n²) and takes a lot of computational power when convolving large filters. With FFT convolution this complexity can be reduced to O(n log n) [5]. Because of this improvement, FFT convolution is often used when convolving large filter kernels and dealing with long signals.


Figure 4: Graphical interpretation of the overlap-add method [6]

The steps involved in the overlap-add method are:

1. Take L samples of the incoming signal x[n] and zero-pad to achieve a signal length of N samples. Zero-pad the filter kernel h[n] to the same length.

2. Do an N-point FFT of the zero-padded samples to produce the frequency-domain X[n] and H[n].

3. Do a pointwise multiplication of X[n] and H[n] to get the frequency-domain output Y[n] = X[n]H[n].

4. Take the IFFT of Y[n] to get the raw output y'[n].

5. Add the last M − 1 samples from the previous output block to the first samples of y'[n] to get y[n]. Save the last M − 1 samples from the current output block to be added to the next output block.

For computing efficiency the FFT of the zero-padded filter kernel is often computed once in the beginning of a program or application and saved for all filter operations.

2.4.2 IIR filter


and operations [7]. IIR filters are more sensitive to quantization and round-off errors. The transfer function of a second-order IIR filter can be expressed as:

H(z) = (b0 + b1 z^−1 + b2 z^−2) / (1 + a1 z^−1 + a2 z^−2)   (10)

where the a's and b's denote the filter coefficients.

When implementing an IIR filter for use with floating point arithmetic, the transposed direct form II structure is suitable since floating point arithmetic is not sensitive to overflow and the structure saves two memory locations in the state variables, compared to direct form I. [8] This structure is not suitable when fixed point arithmetic is used because of the wide dynamic range needed in the feedback state variables. Figure 5 shows a block diagram of the transposed direct form II structure. The input-output equation for this structure can be expressed as

y[n] = b0 x[n] + d1
d1 = b1 x[n] − a1 y[n] + d2
d2 = b2 x[n] − a2 y[n]   (11)

where x and y denote input and output respectively, the a's and b's are the filter coefficients, and d1 and d2 are the state variables saved for computing the next output sample. Note that with the denominator convention of equation 10, the feedback coefficients enter with a minus sign.


2.5 Microcontroller

The microcontroller used in this project was selected according to the required peripherals and the need for a CPU able to handle many calculations quickly. The peripherals initially required were I2S communication, an Ethernet MAC (Media Access Control), and some communication interface for an SD card. The microcontroller would also have to be equipped with an FPU to run all calculations in floating point arithmetic. One important factor was that the microcontroller should be available on a development board with an audio codec, audio input and output, an Ethernet PHY (Physical Layer) with RJ45 connector, and an SD card reader. The board should also have a programmer/debugger mounted on it to ease development. The development board chosen was the 32F746GDISCOVERY board from STMicroelectronics. The microcontroller on this board is the STM32F746NG, which has an ARM Cortex-M7 CPU core with FPU. The board is equipped with all the required devices, but also other devices used during the project: an external SDRAM and a capacitive touch display.

2.6 Communication protocols

2.6.1 UART

UART (Universal Asynchronous Receiver/Transmitter) is a basic asynchronous serial protocol that performs full-duplex communication with three signal lines. A UART device is equipped with pins for receiving packets (RX), sending packets (TX), and ground. Being an asynchronous protocol means that no common clock signal is required. Instead, the user must set up individual clocks for both units, configured to the same data-transmission frequency. For flawless UART communication it is therefore important that the clocks are accurate relative to the transmission frequency and stable over time and temperature. If this is not the case, there is a risk that the data being sent will be misinterpreted or completely missed.


shown in figure 6. It is most common to send the data bits with the LSB first. The parity bit, which is optional, can be used by the receiver for error checking.

2.6.2 I2S

I2S (Inter-IC Sound) is a communication protocol developed by Philips Semiconductors. The protocol is a serial interface used to transmit stereo PCM audio data. The bus consists of at least three lines, which are shown in figure 7. These are

• Word select (WS)
• Serial data (SD)
• Continuous serial clock (SCK)

The word select line indicates which channel is being transferred, the SD line transfers the audio as serial data, and SCK is a common clock shared between the two communicating units. I2S is a master/slave protocol where the master provides the shared clock.

Figure 7: A setup of two units communicating with I2S where the transmitter acts as master.

The serial data is transferred in two's complement with the most significant bit first. This gives flexibility to the transfer, because the transmitter and receiver may have different word lengths. If the receiver receives more bits than its word length, the bits after the least significant will be ignored. On the contrary, if the receiver receives fewer bits, the remaining bits will be set to zero [9]. The required clock frequency is given by the following equation

f_c = f_s · S · 2   (12)

where f_c is the clock frequency, f_s is the system's sampling frequency and S is the sample size in bits. For example, at f_s = 48 kHz and 16-bit samples, f_c = 48000 · 16 · 2 = 1.536 MHz.

2.6.3 TCP/IP


uses five layers of protocols. Together these protocols are often referred to as a protocol stack. Figure 8 shows the topology of the TCP/IP stack. On the transmitting side, the stack successively encapsulates the payload data with headers and trailers carrying information about framing, packet order and addresses. The task of the stack on the receiving side is then to unwrap the received message and, in each layer, examine, use and strip off the header information related to that specific layer.

1. Physical Layer: Contains the necessary functions to send and receive a bit stream over a physical medium.

2. Data Link Layer: Takes care of the encoding and decoding of the bit stream and its organization into frames.

3. Network Layer: Creates or disassembles the packets which are moved around the network. It uses IP addresses to associate the packets with a source and destination.

4. Transport Layer: Establishes a connection between applications on different hosts.

5. Application Layer: Where the actual payload data is generated.

Figure 8: Flow chart showing the flow of a message through the TCP/IP-stacks of transmitter and receiver


2.7 DMA

DMA, or Direct Memory Access, is a technique which lets peripherals access the main system memory directly without involving the CPU. This means that the CPU can perform other tasks while data is being transferred. The hardware which makes this possible is called a DMA controller. Figure 9 shows a basic layout of a DMA system. In a typical DMA transfer, a peripheral notifies the DMA controller that it wants to read/write from/to memory; the DMA controller then asserts a request signal to the CPU, asking for permission to take control over the data bus. The CPU stops driving the bus and returns an acknowledge signal to the DMA controller. The DMA controller now drives the data and memory buses as if it were the CPU. When the transfer between the peripheral and memory is done, the DMA controller signals the CPU that the transfer is finished. The DMA controller itself never processes any of the data being transferred; it only directs the data to the requested address.

Figure 9: DMA structure

2.8 Room correction

To optimize the performance of a loudspeaker in an arbitrary room, one must take into consideration that the sound coming directly from the loudspeaker will interfere with the sound reflected from the surrounding surfaces. This alters the sound, and the effect is referred to as unwanted convolution. To minimize this phenomenon, a process of deconvolution is performed [13]. The main goal of the deconvolution is to recreate the signal that existed before being altered by the room acoustics. In its most basic form, the deconvolution process can be understood by examining the LTI system in figure 10, where x(t) is the system input signal and z(t) is the output signal. H(ω) is the system transfer function, which is responsible for the unwanted convolution, and F(ω) is the deconvolution filter, also referred to as the inverse filter. A perfect deconvolution filter satisfies the following equation

F(ω)H(ω) = 1   (13)

Figure 10: LTI-system with a deconvolution/inverse filter

If assuming that both the loudspeaker and listener remain in the exact same location, the room can be regarded as a linear and time invariant system characterized by an impulse response h(t). Many acoustical parameters can be derived from the impulse response; it is therefore of great importance that the measurement made to obtain the impulse response is done with high accuracy. The most common approach to performing the impulse response measurement is to apply a known input signal and measure the output of the system. There exist many different choices when it comes to which input signal and deconvolution technique to use. The application described in this report uses the methods called Exponential Sine Sweep and pink noise. The pink noise method uses an input signal which has a power spectral density that is inversely proportional to the signal's frequency. This section will describe the sine sweep method very briefly; a more in-depth description can be found in [11]. The input signal used in this technique is based on the following equation

x(t) = sin( (ω1·T / ln(ω2/ω1)) · (e^{(t/T)·ln(ω2/ω1)} − 1) )   (14)

with ω1 = 2πf0 and ω2 = 2πfs, where f0 and fs are the start and stop frequencies of the sweep.


Figure 11: Representation of the logarithmic sweep used in the application. The graph is zoomed to better show how the frequency increases. The actual frequency interval is 10-24kHz

There are different approaches regarding the design of the inverse filter f(t). The most trivial approach is to first reverse and then delay the logarithmic sweep. The inverse signal must also be scaled to obtain a flat frequency response. The delay is necessary to make the inverse filter causal. After constructing the inverse filter, it can be used to obtain the impulse response of the room, which is then used to design a room correction filter [12]. As mentioned, this is a trivial approach; DLCT uses a more sophisticated method, which is described in [16]. To be able to do impulse response correction, a non-causal part of the filter is needed. For computational efficiency, an IIR filter would be ideal. The problem is that IIR filters have an infinite impulse response, and an infinite number of samples would have to be delayed to create the non-causal part with IIR. An IIR filter can, however, model the causal part of the filter in combination with an FIR filter that models the non-causal part [14]. This makes the filter more computationally efficient than if only an FIR filter was used.

2.8.1 System requirements for room correction


Figure 12: Table showing filter requirements per channel at a sample rate of 44.1kHz

2.9 DLCT

The Dirac Live Calibration Tool is software intended for calculating room correction filters. These filters are then applied to audio through the Dirac Audio Processor or a standalone device such as an AVR (Audio/Video Receiver). The measurement process involved when calculating a filter for a standalone device is divided into six pages that the user interacts with.

1. The first page contains information about the sound system supported by the device.

2. The second page is the mic configuration page. Here the user selects a calibrated microphone to be used during the measurement process.

3. On the third page the input and output levels can be tested to check that the volume is sufficient but not too loud.

4. On the fourth page the measurement process takes place. A stimulus is played at one speaker at a time, and DLCT records the output, on which the deconvolution then takes place.

5. On the fifth page the filter design takes place. Here the user can change what the resulting impulse response or magnitude response will look like.

6. On the sixth and last page the filter can be downloaded to the standalone device on which the filter will be applied to the audio played.


3 Software structure

3.1 Sections

All parts of the software were written in C and compiled with the SW4STM32 toolchain, which has a GCC-based compiler. The HAL library was used for all peripheral initializations. The HAL is a hardware abstraction layer written by ST to simplify development and portability across their devices. With the HAL, the developer of an application does not need to deal with the hardware registers of each peripheral; instead, each peripheral is configured through software structures. The initialization functions then take these structures as input and configure the peripherals as described.

The application is developed without any operating system. The software is structured in a branch-like manner with four main sections. The functionalities of these sections are:

1. Handling the input, output and signal processing of audio.

2. Handling the playback of the stimuli.

3. Communicating with the user and external software via the Ethernet interface.

4. Checking for GUI (Graphical User Interface) input and updating it.


Figure 13: Flowchart of the overall structure of the main function

Which of the functions responsible for the audio loopback and the playback of stimuli sounds will fully execute depends on input received from the Ethernet interface. Figure 14 shows the code structure for these functions.


3.2 Audio loop section

The part of the software which takes care of the audio loop and the signal processing is mainly developed around two libraries. The first is the STM32F7 BSP (Board Support Package), a library containing a set of APIs related to the external hardware components, such as the audio codec. The second is the CMSIS (Cortex Microcontroller Software Interface Standard) DSP library, which handles the computational work in the audio loop, such as conversion between data types and the signal processing. The code is a further development of an example provided by ST. The example showed how to record audio from the two microphones located on the board and output it through the 3.5mm AUX output jack. The main modification was to take the recorded audio from the 3.5mm AUX input instead of the microphones. This change introduced the possibility to route audio from an external device through the STM32F7 Discovery board.

3.2.1 Collection of input data

Collecting and storing samples is vital to be able to do any signal processing. The first thing that has to be done is to convert the incoming sound from analog to digital. This is done by the WM8994 audio codec's internal ADC (Analog-to-Digital Converter), which operates at a sample rate of 48 kHz. The codec and the MCU transfer the audio data over I2S. The I2S peripheral then stores the samples in a buffer located in the SDRAM. The transfer between the I2S peripheral and the memory is done with DMA. When one half of the buffer is filled, the DMA controller alerts the MCU by triggering an ISR, which collects the new samples that are ready to be processed. The full audio loop process is shown in figure 15. This technique of having the codec write to one part of the buffer while the other part is sent to the MCU for processing is referred to as double buffering. It helps prevent overwriting samples that have not yet been processed.


3.2.2 Signal Processing

The signal processing operates in blocks, which means it processes a block of data at a time. Block processing reduces the function call overhead to the DSP functions. Some algorithms, like the FFT, are by default implemented in such a way that block processing is the only option. The flow of the function is as follows:

1. Convert the processed samples from the previous iteration from float to int16 (signed 16-bit integer) and send the buffer to the SDRAM.

2. Collect the new samples from the SDRAM and convert them from int16 to float.

3. Split the samples into two separate buffers, one for each channel.

4. Do the filtering on each of the channels.

5. Interleave the samples from each channel back into a single buffer.

A block diagram of the process is shown in figure 16. The conversion between int16 and float is done because the samples collected from the SDRAM are 16-bit integers, whereas the DSP algorithms are written to work with floats. The argument for designing the code to work with floats is based on the statements in the theory section about floating point vs fixed point arithmetic. The conversion is made with functions from the CMSIS library. The channel splitting is necessary to be able to apply filters with different coefficients to the two channels. The filter function depends heavily on the CMSIS functions for FFT/IFFT, complex multiplication and IIR filtering. As mentioned earlier, the FFT/IFFT functions operate on blocks of data. A section performing overlap-add is implemented in the filter function; why this is important is covered in the theory section about FIR filters.


3.2.3 Outputting processed samples

The process of playing the processed samples from the 3.5mm output follows a similar structure as collecting the incoming signal. The processed samples are sent from the SDRAM to the I2S peripheral with DMA, and then to the codec. The codec uses its internal DAC (Digital-to-Analog Converter), running at a 48 kHz sample rate, to convert the samples to a continuous analog signal, which it then routes out through the 3.5mm AUX output.

3.3 Stimuli playback

The section in charge of outputting the stimuli is structured in a similar way as the audio loop, but instead of the input signal coming from an external audio source, the input data is fetched from a µSD card. The samples, which represent a frequency sweep and pink noise, are stored as individual WAV files on the µSD card. To have the MCU interface with a µSD card, a file system is required. The file system used is FATFS (File Allocation Table File System), an open source FAT/exFAT file system module targeted specifically at embedded platforms. The module is platform independent and very well documented, which made it a good choice for this application. The frequency sweep was generated from code provided by Dirac Research and the pink noise was generated in MATLAB.

The WAV file format is a good choice in this application because it can store uncompressed PCM data, so no extra decoding has to be done. WAV files store their bytes in little-endian format, which means the least significant byte comes first. The application reads data in big-endian order, so a conversion between the two formats is made in software.
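For 16-bit PCM samples, the little-to-big-endian conversion amounts to swapping the two bytes of each sample. A minimal sketch (the function name is illustrative, not taken from the thesis code):

```c
#include <stdint.h>

/* WAV files store 16-bit PCM samples little-endian; when the rest of the
 * pipeline expects the opposite byte order, each sample is byte-swapped. */
static inline int16_t swap16(int16_t v)
{
    uint16_t u = (uint16_t)v;
    return (int16_t)(uint16_t)((u >> 8) | (u << 8));
}
```

Applying the swap twice returns the original value, so the same helper works in both directions.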

Similar to the audio loop, the stimuli playback is based around two functions: one that initializes the codec and I2S and opens the desired file, and one that updates the output buffer from which the codec gets its data through the I2S peripheral. In the same manner as with the audio loop, the initialization function is called in the main loop, while the function responsible for updating the output buffer is called in an ISR (Interrupt Service Routine). The ISR is triggered when new samples must be sent to the SDRAM. The playback function handles the buffer with the same technique as the audio loop, i.e. double buffering.
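The double-buffering scheme can be sketched in plain C. This is an illustrative model, not the thesis code: on the STM32, the two notification functions below correspond to the DMA half-transfer and transfer-complete callbacks (e.g. `HAL_I2S_TxHalfCpltCallback` and `HAL_I2S_TxCpltCallback`), and the buffer size is arbitrary.

```c
#include <stdint.h>
#include <string.h>

#define HALF 256                    /* samples per half buffer (illustrative) */
static int16_t out_buf[2 * HALF];   /* the DMA streams this buffer circularly */
static volatile int free_half = 0;  /* which half the CPU may safely refill   */

/* Called from the DMA interrupts: as soon as the DMA has finished reading
 * one half, that half becomes free for the CPU to fill with new samples. */
void on_half_transfer(void) { free_half = 0; } /* DMA now reads the 2nd half */
void on_full_transfer(void) { free_half = 1; } /* DMA wrapped to the 1st half */

/* Copy the next block of samples into whichever half the DMA is not reading. */
void refill(const int16_t *src)
{
    memcpy(&out_buf[free_half * HALF], src, HALF * sizeof(int16_t));
}
```

Because the CPU only ever writes to the half the DMA is not currently reading, the output stream stays gap-free without any explicit locking.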

3.4 Ethernet Communication


The lwIP stack was chosen for its focus on embedded systems, its wide use and its rich documentation. The application was built from an application note written by ST targeting the STM32F4 microcontroller with another PHY chip that used MII communication instead of RMII. The main difference between the MII and RMII protocols is the number of data lines used. The internal registers of the two PHYs are very similar; the only registers that differ are vendor-specific ones, which were not needed. This made the application note a suitable starting point.

Figure 17: Flow of the lwIP implementation running in stand-alone mode.
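In stand-alone mode the lwIP raw API drives the server entirely through callbacks. A minimal server setup could look roughly as follows, assuming lwIP and the ethernet interface are already initialized; the port number and callback names are illustrative, and the payload parsing is omitted.

```c
#include "lwip/tcp.h"   /* lwIP raw (callback) API */

/* Called for every segment received on an accepted connection. */
static err_t on_recv(void *arg, struct tcp_pcb *pcb, struct pbuf *p, err_t err)
{
    if (p == NULL) {              /* remote side closed the connection */
        tcp_close(pcb);
        return ERR_OK;
    }
    tcp_recved(pcb, p->tot_len);  /* update the TCP receive window */
    /* ...parse the payload in p->payload here... */
    pbuf_free(p);
    return ERR_OK;
}

/* Called once for each accepted connection. */
static err_t on_accept(void *arg, struct tcp_pcb *newpcb, err_t err)
{
    tcp_recv(newpcb, on_recv);    /* register the receive callback */
    return ERR_OK;
}

void server_init(void)
{
    struct tcp_pcb *pcb = tcp_new();
    tcp_bind(pcb, IP_ADDR_ANY, 7);   /* port chosen arbitrarily here */
    pcb = tcp_listen(pcb);           /* returns a smaller listening pcb */
    tcp_accept(pcb, on_accept);
}
```

In stand-alone mode, the main loop must also call the lwIP timer and input-polling functions periodically, which is the flow shown in figure 17.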


3.5 GUI

The GUI was not part of the project from the beginning, but since the development board has a touch display, a simple GUI was implemented. The touch LCD is controlled from the MCU using functions in a BSP package from the STM32CubeF7 software. The GUI consists of a volume slider and a play/pause button that mutes and unmutes the audio when touched. The volume slider updates the bar level and the volume level when a touch is registered in the area of the bar. Since the play/pause button toggles state each time it is touched, a simple state machine was implemented to keep track of the button state. Figure 18 shows a picture of the GUI.

Figure 18: A picture of the simple GUI with the button in the playing state
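The reason a state machine is needed is that the touch controller is polled: without edge detection, a finger held on the button would toggle the state on every poll. A sketch of the idea (names are illustrative; the real implementation reads the touch state through the BSP functions):

```c
#include <stdbool.h>

typedef struct {
    bool playing;     /* current play/pause state          */
    bool was_touched; /* touch state seen on the last poll */
} play_button_t;

/* Poll the button: toggle only on a press edge, i.e. when the button is
 * touched now but was not touched on the previous poll. */
bool play_button_poll(play_button_t *b, bool touched_now)
{
    if (touched_now && !b->was_touched)
        b->playing = !b->playing;
    b->was_touched = touched_now;
    return b->playing;
}
```

The returned state can then be used to mute or unmute the codec.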

4 Results and Discussion

4.1 Signal Processing


The filtering is performed in the frequency domain. The method used in this application is the overlap-add method, presented in the theory section on FIR filters. Another method for handling the circular convolution, called overlap-save, could also have been used, but overlap-add fitted better with the rest of the code responsible for the signal processing.

Figure 19: Graphs showing the difference in execution time between two 899-tap 48 kHz FIR filters.

4.2 Software structure

The resulting overall software structure runs well. There were discussions about making the software more interrupt driven instead of polling in the main functions, since polling is generally seen as an inefficient method, but the inefficiency turned out not to be a problem in this application. Early in the project there were also ideas about using a real-time operating system, which would make it possible to use features like threads and scheduling. It would also have simplified implementing the TCP/IP stack, which took a great deal of time.

4.3 Storage

The system is able to store and read data on a µSD card using the FatFs file system, and it is also capable of playing WAV files stored on the card. This feature is used to play the stimuli signals intended as excitation signals during the deconvolution process for obtaining the impulse response of a room. The µSD card can also be used to store filter coefficients.

4.4 Communication


The official gRPC library for this kind of application would be implemented in C++, which would require some form of wrapping to make its functions callable from the code written in C. Another possibility is to use the gRPC core, which is written in C, but this approach requires a solid understanding of gRPC that was outside the scope of this thesis, and much of the functionality would be lost by using the C core alone. No investigation of the memory requirements for a fully functional gRPC system was made, so no conclusion could be drawn about the possibility of communicating with DLCT over gRPC. There are, however, ideas for avoiding the need to port gRPC to the application by instead using the implemented interface in combination with network sockets. This would require some type of middleware on the remote client side, responsible for parsing the messages sent between the application and DLCT.
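One way such a socket-based scheme could work is sketched below, assuming a simple length-prefixed framing over the TCP connection: each message is sent as a 4-byte big-endian length followed by the payload, and the client-side middleware unwraps the frames before handing the payload to DLCT. This framing is hypothetical, not the thesis's actual protocol.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Encode one frame: 4-byte big-endian length, then the payload.
 * Returns the total number of bytes written to `out`. */
size_t frame_encode(uint8_t *out, const uint8_t *payload, uint32_t len)
{
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)(len);
    memcpy(out + 4, payload, len);
    return 4 + len;
}

/* Decode the length field of a received frame header. */
uint32_t frame_decode_len(const uint8_t *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

A big-endian (network byte order) length field keeps the framing unambiguous regardless of the byte order of the MCU and the client machine.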

5 Conclusion

The MCU-based audio system is able to process two channels of audio data at a sample rate of 48 kHz, which eliminates the need for a dedicated digital signal processor in this application. Since the filter implementation was not tested with filter coefficients calculated by DLCT, no conclusion can be drawn about whether the implementation is flawless, but a solid foundation for further work on the project exists. The ethernet communication with TCP/IP works as expected, but more work has to be done if gRPC is to be used for the communication between the system and DLCT. Establishing the communication between DLCT and the application with gRPC could probably serve well as an independent thesis project. There are many possible approaches to the overall software structure, including whether or not to use a real-time operating system; all these factors make the project suitable for further work and investigation.

There were some problems with the development board. One issue was that the on-board debugger could not be used due to a silicon bug, so debugging was done using the on-board LED, GPIOs and UART. The development process would probably have been faster if the debugger had worked properly.


