Audio streaming on top of 802.11n in an IoT context

An implementation along with a literature study of wireless IoT standards

Ljudströmning över 802.11n i en IoT-kontext

En implementation samt en litteraturstudie kring trådlösa IoT-standarder

Johan Uttermalm

Faculty of Health, Science and Technology Computer science

15 hp

Abstract

The Internet of Things (IoT) is a concept that revolves around ordinary devices that are connected to the internet for extended control and ease of use. Altran, a company dealing in high technology and innovation consultancy, predicts a large growth in business opportunities in the IoT area in the coming years, and therefore wants to invest in knowledge about the Internet of Things.

Altran wanted a report that described popular wireless IoT communication technologies along with a proposal for a general IoT communication platform or base that could be used to implement many of these technologies. Additionally, an audio streaming application was to be implemented on the proposed platform to validate its viability.

List of Figures

Figure 2.1. Example of a mesh topology

Figure 2.2. A 3-bit quantized wave compared to a continuous wave

Figure 4.1. Raspberry Pi 2 single board computer

Figure 4.2. Overview of network design

Figure 4.3. Overview of the Controller node

Figure 4.4. Overview of the Speaker node

Figure 4.5. Overview of the bonding protocol

Figure 4.6. Controller application module overview

Figure 4.7. Overview of the C_Bonding software module

Figure 4.8. Controller bonding finite state automaton

Figure 4.9. Controller C_software module

Figure 4.10. Controller stream control finite state automaton

Figure 4.11. Speaker application module overview

Figure 4.12. Overview of the S_Bonding software module

Figure 4.13. Speaker bonding finite state automaton

Figure 4.14. Speaker S_software module

Figure 4.15. Speaker stream control finite state automaton

Figure 5.1. Bonding speaker list implementation

Figure 5.2. Streaming speaker list struct along with speaker_thread_relation struct

Figure 5.3. Control thread creation

Figure 5.4. Broadcast thread main listening loop

Figure 5.5. initialize_bonding function

Figure 5.6. Mutex lock preventing race conditions between Speaker and Controller nodes

Figure 5.7. Unicast thread mutex signalling

Figure 5.8. Speaker stream buffer element definition

Glossary

● IoT - Internet of Things.

● PHY - Physical layer. The lowest layer in the TCP/IP model.

● MAC - Medium access control. The second layer from the bottom in the TCP/IP model.

● NET - Network layer, the middle layer of the TCP/IP model.

● IEEE - Institute of Electrical and Electronics Engineers. A standardisation organisation dealing in many wireless standards.

● 802.11n - An IEEE standard for wireless local area networks.

● Z-Wave - A wireless communication protocol designed for IoT applications.

● ZigBee - Also a wireless communication protocol designed for IoT applications.

● sub-GHz - Frequency bands below 1GHz, typically around 800-900MHz.

● Sensor network - A network consisting of multiple connected sensor nodes relaying various kinds of data.

● G.9959 - An ITU standard for wireless communication.

● ITU - International Telecommunication Union, a UN agency developing wireless standards.

● AES - Advanced Encryption Standard, an encryption algorithm.

● Transceiver - A device capable of both transmitting and receiving wireless signals.

● GPIO - General purpose Input/Output, ports used for generic I/O on various hardware chips.

● Bluetooth HS - Bluetooth High Speed, version 3 in the Bluetooth protocol family.

● Bluetooth LE - Bluetooth Low Energy, version 4 in the Bluetooth protocol family.

● Frequency hopping spread spectrum - A transmission technique in which the sender transmits a small portion of data, then hops to a different frequency and repeats the process.

● Scatternet - An ad hoc network consisting of two or more ad hoc Bluetooth networks.

● WPA2 - A security suite that commonly operates on 802.11 based networks.

● MIMO - Multiple Input Multiple Output. MIMO is a key technology for increasing the throughput of a protocol. It utilizes multiple antennas to send parallel streams of data.

● RTP - Real-time Transport Protocol. A multimedia streaming protocol that typically runs on top of UDP.

1. Introduction

Altran is a global engineering and innovation consulting firm originally founded in France. Altran deals mostly in areas related to innovation and high technology consultancy. The Internet of Things[1] (IoT) concept means that ordinary devices are connected to the Internet for extended control and ease of use. IoT is one of the technology areas where Altran expects the largest growth in business opportunities in the next few years. Therefore, Altran wants to invest in knowledge in this area. For a student, knowledge in this area is advantageous for the same reason. Altran's forecasts estimate that the number of connected devices will reach between 30 and 70 billion by 2020.

1.1 Project Goals

Altran was interested in which IoT communication technologies existed, and in what situations these technologies could be applied effectively. The first task was therefore to find and document popular IoT communication technologies. The results of this documentation process are presented in Chapter 3.

Altran also wanted me to propose a base for a general IoT communication platform, which can be used to implement several different IoT communication techniques. The base was defined as a hardware platform and an operating system which can support the communications protocols presented in the literature study (see Chapter 3). This means that the base must be able to communicate using a variety of different physical and MAC layer protocols, and drivers must exist to facilitate such communications.

In order to validate the findings for one particular use case (audio streaming), I was to implement a wireless multimedia streaming protocol of my own design, with a focus on audio streaming, on top of the proposed base. Since implementing a

These goals can be summarized as:

1. Investigate which state-of-the-art communication techniques exist in an IoT context.

2. Propose a hardware and software base for a general IoT communication platform, which can be used to implement several different communication techniques.

3. Implement an audio streaming application for wireless audio streaming on this platform.

1.2 Project result summary

During the project a report on popular IoT communication technologies has been produced and handed over to Altran. The contents of this report can be seen in Chapter 3. A hardware and software base has been proposed for general IoT communications, which is described in Chapter 4. Additionally, an audio streaming application for wireless networks based on IEEE 802.11n[38] has been built upon this base.

1.3 Disposition

In this Section I present the general structure of this report and its contents.

Chapter 2 - Background

Chapter 2 describes general IoT properties, wireless networking principles, audio sampling techniques, multimedia networking, and finally distributed system concepts.

Chapter 3 - Literature study

Chapter 3 is a presentation of the contents of the literature study. It describes 6 different wireless IoT technologies, their technical specifics, advantages and disadvantages, along with areas of application.

Chapter 4 - Project Design

Speaker nodes. An in-depth design discussion of the network, Controller, and Speaker nodes is then presented.

Chapter 5 - Project Implementation

The implementation Chapter starts by providing an overview of the network, Controller, and Speaker implementation without delving into too much detail. This part deals with the general implementation choices made during the development of each component. Detailed implementation choices are thereafter presented in the same manner as in Chapter 4.

Chapter 6 - Project results and Evaluation

In Chapter 6 I present the results from the project, that is, the literature study on IoT communication technologies and the streaming application. I present which parts of the streaming application are implemented and which are not, along with some issues which further work should aim to correct.

Chapter 7 - Conclusions

2. Background

In this Chapter I first describe IoT, its characteristics, and potential use cases in Section 2.1. The Chapter then touches on the subject of wireless network technologies and how these are applied in an IoT context in Section 2.2. I present information about wireless and multimedia networking in Sections 2.3 and 2.4 in order to depict the problems with transmitting multimedia over wireless links and why such problems must be taken into account when designing a wireless streaming protocol. Finally, a part on distributed system time concepts in Section 2.5 clarifies why clock synchronization is important for multimedia streams.

2.1 Internet of Things

The Internet of Things (IoT) concept means that ordinary devices are connected to the Internet for extended control and ease of use[1]. It is in essence a network of interconnected entities which communicate and exchange data to improve efficiency or extend the functionality of the connected entities. Smart homes and smart cities are examples of IoT applications[1]. For example, in a smart home the temperature can be regulated autonomously by having the heating system communicate with temperature sensors placed throughout the building.

The “things” in IoT are somewhat loosely defined as any device with a network interface and some form of processing unit[9]. The “things” can be vehicles, devices, furniture, or even animals embedded with a computer chip[1].

2.2 Wireless IoT network protocols

IoT devices are mostly low power, low cost devices operating various sensors and some form of communications interface[2,3]. They are usually powered by batteries, but in some cases also by power outlets. Consequently, most IoT devices aim for low power consumption[3], and IoT communication interfaces therefore try to employ power conservative solutions in the form of low frequency and low bandwidth wireless technologies. Examples of low power protocols are Z-Wave[4], Bluetooth[5], and ZigBee[6]. The majority of these technologies have a throughput in the range of 100Kb/s (Z-Wave, ZigBee)[4,6], but some, like Bluetooth, have speeds up to 3Mb/s[5].

The reason for this low throughput is that most IoT applications require little data to be transferred over the network[3]. Signal coverage and low power consumption are more important properties[3]. We know from Friis transmission equation (see Equation 1)

$$P_r = P_t \, G_t \, G_r \left( \frac{\lambda}{4 \pi R} \right)^2$$

Equation 1. $P_r$ and $P_t$ are the power (W) at the receiver and transmitter antenna respectively, $G_r$ and $G_t$ are the gains of those antennas (specified in dBi and converted to linear ratios before use in this form), $R$ is the distance between them in meters, and $\lambda$ is the wavelength of the signal, also in meters,
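To make the effect of the carrier frequency concrete, the Friis transmission equation can be evaluated numerically. The following is a minimal sketch, not part of the project code: the function names, the 1 mW transmit power, the 0 dBi antennas, and the 100 m distance are illustrative assumptions. It compares a sub-GHz carrier (868 MHz) with a 2.4 GHz carrier under otherwise identical free-space conditions.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def dbi_to_linear(gain_dbi):
    """Convert an antenna gain in dBi to a linear ratio."""
    return 10 ** (gain_dbi / 10)


def friis_received_power(p_t_watt, g_t_dbi, g_r_dbi, freq_hz, distance_m):
    """Received power in watts according to the Friis transmission equation.

    Only valid for free-space, line-of-sight propagation.
    """
    wavelength = SPEED_OF_LIGHT / freq_hz
    path_factor = (wavelength / (4 * math.pi * distance_m)) ** 2
    return p_t_watt * dbi_to_linear(g_t_dbi) * dbi_to_linear(g_r_dbi) * path_factor


# Hypothetical link: 1 mW transmitter, 0 dBi antennas, 100 m apart.
p_sub_ghz = friis_received_power(1e-3, 0.0, 0.0, 868e6, 100.0)
p_2_4_ghz = friis_received_power(1e-3, 0.0, 0.0, 2.4e9, 100.0)

print(f"868 MHz: {p_sub_ghz:.3e} W")
print(f"2.4 GHz: {p_2_4_ghz:.3e} W")
print(f"ratio:   {p_sub_ghz / p_2_4_ghz:.2f}")
```

Since the received power scales with the square of the wavelength, the 868 MHz link receives roughly 7.6 times more power than the 2.4 GHz link at the same distance, which can be traded for either longer range or lower transmit power.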

Figure 2.1. Example of a mesh topology

Additionally, many IoT networks are organized as mesh or ad hoc networks[4,5,6,8], i.e. networks that do not require any existing infrastructure to forward and route packets. This is different from regular wireless technology, which requires a subset of the connected nodes to form a dedicated routing and forwarding infrastructure. In a mesh network all nodes actively participate in the distribution of data within the network[8]. Messages can be routed via intermediate nodes to reach destinations outside of the origin node’s coverage[11]. For example, Figure 2.1 depicts an arbitrary mesh network in which messages can be routed from F to C via E and B.
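Multi-hop routing of this kind can be sketched with a breadth-first search over the link graph. The adjacency list below is a hypothetical topology chosen only to be consistent with the route F, E, B, C mentioned above; it is not a transcription of the figure, and real mesh protocols use their own routing algorithms.

```python
from collections import deque

# Hypothetical links between nodes; each node lists the neighbours in radio range.
MESH = {
    "A": ["B", "E"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["A", "B", "F"],
    "F": ["E"],
}


def find_route(links, source, destination):
    """Return a shortest hop-count route from source to destination, or None."""
    previous = {source: None}  # also serves as the set of visited nodes
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            route = []
            while node is not None:
                route.append(node)
                node = previous[node]
            return route[::-1]
        for neighbour in links[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


print(find_route(MESH, "F", "C"))  # ['F', 'E', 'B', 'C']
```

Node F cannot reach C directly, but the intermediate nodes E and B forward the message, so no dedicated routing infrastructure is needed.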

A fundamental difference between wired and wireless networks is that wireless networks are prone to burst errors, i.e. spontaneous large amounts of interference over a short amount of time[12]. In a wired environment the amount of interference within the transmission medium is small, whilst in a wireless network the

protocols have stronger error checking techniques than wired MAC layer protocols[15].

Finally, wireless protocols can be partitioned into two different medium access control paradigms: Random Access and Fixed Assignment. In random access, anyone can attempt to transmit at any time, but must be prepared for collisions in the medium. Fixed assignment on the other hand assigns different parts of the network resource to different hosts. The network resources can be time slots, codes, or frequencies. In the case of a fixed assignment scheme using time slots, different users transmit at different times, thus ensuring no collisions.
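The difference between the two paradigms can be illustrated with a toy slotted model. This is a simplified sketch under stated assumptions: time is divided into slots, a collision occurs whenever more than one host transmits in the same slot, and in the random access case each host transmits independently with probability p. Real random access schemes such as CSMA/CA add carrier sensing and backoff, which are not modelled here.

```python
import random


def fixed_assignment_collisions(hosts, slots):
    """Time-slot based fixed assignment: slot i is owned by exactly one host."""
    collisions = 0
    for slot in range(slots):
        owner = hosts[slot % len(hosts)]
        transmitters = [host for host in hosts if host == owner]
        if len(transmitters) > 1:
            collisions += 1
    return collisions


def random_access_collisions(hosts, slots, p, rng):
    """Random access: every host may transmit in any slot with probability p."""
    collisions = 0
    for _ in range(slots):
        transmitters = [host for host in hosts if rng.random() < p]
        if len(transmitters) > 1:
            collisions += 1
    return collisions


hosts = ["A", "B", "C", "D"]
print("fixed assignment:", fixed_assignment_collisions(hosts, 1000))
print("random access:   ", random_access_collisions(hosts, 1000, 0.25, random.Random(1)))
```

The fixed assignment schedule never collides, at the cost of each host having to wait for its own slot, whereas the random access count grows with the number of hosts and their transmission probability.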

2.3 Audio Encoding and Sampling

Audio can be encoded in several different ways using many different techniques. Some of these encoding techniques also compress the data (mp3¹), whilst some store the audio data uncompressed (wav²). Depending on which encoding and compression techniques are used, the resulting audio file will require different bit rates. The typical bit rate for compressed audio is in the range of 100-300 Kb/s[16]. The reason audio is compressed is that without compression the bit rate would be too high to be transmitted over the internet, and the file size would be enormous compared to the compressed size.

¹ MPEG-1/MPEG-2 Audio Layer III, or MP3, is an audio encoding format.

To clarify, let us look at how analog audio is digitized. According to Nyquist, the analog audio signal needs to be sampled at at least twice the frequency of its highest frequency component[17]. Humans can hear frequencies up to about 20 kHz[18]. Thus, we need to sample at a minimum of 40 kHz to be able to perfectly reconstruct all frequencies the human ear can perceive. This follows from the Nyquist theorem, or the Nyquist-Shannon sampling theorem as it is also known. Additionally, the sampled signal must be digitized by a quantization function. A quantization function, as depicted in Figure 2.2, is a mathematical function mapping a large set of input values, in our case points on a sound wave, to a smaller set of output values. In our case these are binary representations of the levels of the sound wave at various points in time. This quantization process inevitably introduces some noise into the signal, as the continuous elements of the signal are stripped away in the discrete version. With more bits allocated to the quantization function, the original signal can be more closely replicated. This is called the precision of the quantizer.
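The effect of quantizer precision can be demonstrated with a uniform quantizer. The sketch below is an illustration only: the function names, the 1 kHz test tone, and the choice of evenly spaced levels over the range [-1, 1] are assumptions made for this example.

```python
import math


def quantize(sample, bits):
    """Map a sample in [-1.0, 1.0] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    index = round((sample + 1.0) / step)
    return -1.0 + index * step


def max_quantization_error(bits, sample_rate_hz=40_000, tone_hz=1_000, count=400):
    """Largest absolute error when quantizing a sampled sine tone."""
    samples = [
        math.sin(2 * math.pi * tone_hz * i / sample_rate_hz) for i in range(count)
    ]
    return max(abs(s - quantize(s, bits)) for s in samples)


for bits in (3, 8, 16):
    print(f"{bits:2d} bits: max error {max_quantization_error(bits):.6f}")
```

The error is bounded by half the distance between two adjacent levels, so every additional bit roughly halves the worst-case quantization noise.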

Figure 2.2. A 3-bit quantized wave compared to a continuous wave. By Hyacinth - Own work, CC BY-SA 3.0

There is, however, a tradeoff in the quantization: the more bits allocated for precision the larger the file will be and the higher the bit rate needs to be.

This relationship can be shown using Equation 2.

$$B = f \cdot q \cdot c$$

Equation 2. $B$ is the bit rate in bits per second, $f$ is the sampling frequency in Hertz, $q$ is the quantizer precision in bits, and $c$ is the number of channels. In the case of stereo $c$ is 2. If we have a sampling frequency of 40 kHz, 24-bit sampling precision, and 2 channels (stereo), we need a minimum of 40,000 × 24 × 2 = 1.92 Mb/s in playout throughput to play this file without time lag. Compare this to an mp3 bit rate of roughly 200 Kb/s and you can see why effective compression techniques are important.
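The arithmetic above can be checked with a few lines of code. This is a small illustrative helper, with a function name of my own choosing; the CD-quality parameters in the last line (44.1 kHz, 16 bits, stereo) are added for comparison and are not taken from the example in the text.

```python
def pcm_bit_rate(sampling_hz, bits_per_sample, channels):
    """Bit rate of uncompressed audio in bits per second (B = f * q * c)."""
    return sampling_hz * bits_per_sample * channels


uncompressed = pcm_bit_rate(40_000, 24, 2)
mp3_bit_rate = 200_000  # rough mp3 bit rate used in the text

print(uncompressed)                  # 1920000
print(uncompressed / mp3_bit_rate)   # 9.6
print(pcm_bit_rate(44_100, 16, 2))   # 1411200
```

The uncompressed stream thus needs almost ten times the throughput of the compressed one.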

2.4 Multimedia networking requirements

user. The bit rate varies depending on encoding and compression. For audio the requirements are usually in the range of 100-200 Kb/s, and for video usually up to 1-2 Mb/s.

The basic principle in streaming is that the network must be able to deliver data at the same rate as, or faster than, the receiver plays the data[19]. Otherwise, the receiver will experience lag or tearing because its playout buffer, i.e. the buffer in which the receiving application stores the multimedia playout data, empties faster than it can be refilled. Adaptive streaming technologies therefore aim to flood the buffer early on, giving the receiver a lot of data to play in the beginning, and thereafter adapt the sending rate according to how full the buffer currently is. In short: if the buffer is filling, reduce the send rate. If it is emptying, increase the send rate. A key observation here is that the network bandwidth must exceed the buffer playout speed for the adaptive streaming to work[20]. Recall that most IoT protocols are built for low power sensor networks and thus have low data throughput requirements. They are therefore ill suited for multimedia streaming.
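The buffer-driven rate adaptation can be sketched as a simple feedback loop. The following is a minimal model, not a specific adaptive streaming algorithm: the proportional rule, the one-second time step, the target buffer level, and all numeric values are assumptions chosen for illustration.

```python
def send_rate(buffer_bits, target_bits, playout_rate, gain=2.0):
    """Send faster than playout when the buffer is below target, slower when above."""
    error = (target_bits - buffer_bits) / target_bits
    return max(0.0, playout_rate * (1.0 + gain * error))


def simulate(steps, bandwidth, playout_rate=200_000, target_bits=1_000_000):
    """Return the playout buffer level (in bits) after each one-second step."""
    buffer_bits = 0.0
    history = []
    for _ in range(steps):
        wanted = send_rate(buffer_bits, target_bits, playout_rate)
        buffer_bits += min(wanted, bandwidth)          # the network caps delivery
        buffer_bits -= min(buffer_bits, playout_rate)  # the receiver plays out
        history.append(buffer_bits)
    return history


fast_link = simulate(30, bandwidth=1_000_000)
slow_link = simulate(30, bandwidth=150_000)
print(f"1 Mb/s link, final buffer:   {fast_link[-1]:.0f} bits")
print(f"150 Kb/s link, final buffer: {slow_link[-1]:.0f} bits")
```

With a link faster than the playout rate, the buffer fills quickly at first and then settles near the target level. With a link slower than the playout rate, the buffer is drained in every step and never builds up, which corresponds to the lag described above.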

Non-IoT protocols such as 802.11n Wi-Fi are deemed much more suitable for this area of applicability due to the higher throughput. However, we must still cope with the problem of burst errors which can seriously hamper the playout rate of a streamed multimedia file.

Burst errors are especially harmful to streaming protocols utilizing TCP. This has to do with the TCP flow and congestion control algorithms. Recall that TCP has 3 phases in its congestion control system: slow start, congestion avoidance, and fast retransmit. If a set number of duplicate ACKs are received during congestion avoidance, fast retransmit is initiated and a portion of the sending window is retransmitted. If there is a timeout, however, TCP goes back to the slow start phase and cuts its sending rate down considerably. In wired networks, timeouts are rare since they imply extreme congestion in the network. In a wireless environment, on the other hand, nodes can quite frequently lose connection for relatively long periods of time and can therefore induce timeouts in the TCP congestion avoidance algorithm. This can have dire consequences on multimedia streaming, which

aim to solve this problem. It should be noted, however, that UDP, contrary to popular belief, is not a better alternative for multimedia streaming[21].

2.5 The distributed nature of an IoT system

Another important aspect of IoT is that the system usually consists of many connected computers spread out over an area. This by definition makes it a distributed system and thus introduces problems related to the consistency and synchronization of the system. More specifically, the system lacks a global clock. Instead, various synchronization algorithms must be used to ensure that the clocks do not drift apart from each other. The process of clocks drifting apart is called clock drift, and the resulting difference between two clocks is called clock skew.

Synchronization of clocks is important since multimedia streaming protocols will often timestamp each packet to tell the receiving system in what order to play the contents[22]. Synchronization of clocks becomes particularly important in a multimedia system with more than one receiver, as is often the case with a wireless speaker system. If the clocks on the different speakers skew, multimedia content can be received and played out at different times. This becomes a problem since the speakers will play different parts of the music at the same given instant of time.
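To get a feeling for the magnitudes involved, the playout offset between two unsynchronized speakers can be estimated from their clock drift rates. This is a back-of-the-envelope sketch: the function names and the drift values of +20 and -20 parts per million are hypothetical and not measurements from the project hardware.

```python
import math


def playout_offset_ms(drift_ppm_a, drift_ppm_b, elapsed_s):
    """Offset in milliseconds between two clocks after elapsed_s seconds."""
    relative_drift = abs(drift_ppm_a - drift_ppm_b) * 1e-6
    return relative_drift * elapsed_s * 1e3


def seconds_until_offset(threshold_ms, drift_ppm_a, drift_ppm_b):
    """Time until the two clocks differ by threshold_ms without resynchronization."""
    relative_drift = abs(drift_ppm_a - drift_ppm_b) * 1e-6
    if relative_drift == 0:
        return math.inf
    return threshold_ms / 1e3 / relative_drift


# Two speakers whose clocks drift in opposite directions during a 3 minute song.
print(f"{playout_offset_ms(20, -20, 180):.1f} ms")      # 7.2 ms
print(f"{seconds_until_offset(1.0, 20, -20):.0f} s")    # 25 s
```

Under these assumptions the speakers are more than 7 ms apart by the end of a single song, which is why the clocks need to be resynchronized at regular intervals.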

2.6 Summary

3. Literature study

The aim of the literature study was to find and document popular IoT capable wireless protocols and identify which protocols are good candidates to build the base and audio streaming protocol around.

Six different protocols were investigated: 802.11n, 802.11ah[3], Bluetooth, Thread, ZigBee, and Z-Wave. These protocols were chosen based on popularity and functionality. I present the findings in Sections 3.1 - 3.5.

Special attention was paid to the PHY (physical) and MAC (medium access control/link) layer characteristics of the protocols, their area of application, and support on the Raspberry Pi 2, which was chosen as the hardware base for the project (see Section 4.1.1).

3.1 Z-Wave

Z-Wave was designed to be easily embedded in everyday electronics. Its purpose is to produce smart electronics out of otherwise “dumb” technology. Below are some technical characteristics.

● Physical and MAC layers are based on the G.9959 standard[23].

● Throughput is specified up to 100Kb/s.

● Z-Wave operates in sub-GHz unlicensed frequencies: 868.42MHz in Europe and 908.42MHz in the United States.

● It has mesh routing capabilities of up to 4 hops.

● Node-to-node range is about 100m in optimal conditions but it depends on sending power and propagation environment.

● It is secured with AES encryption.

3.1.1 Advantages

Carrier frequency

Since Z-Wave operates in sub-GHz frequency bands it will not interfere with 802.11n or Bluetooth transmissions.

Signal range and Mesh routing

Due to operating in lower frequencies, it has a relatively long signal range of 100m. This can be further enhanced using its mesh routing capabilities.

Power requirements

Due to its low power requirements and FL mode it can operate on small batteries for a very long time. Exactly how long depends on sleep time and transmission power.

Throughput

A throughput of 100Kb/s is sufficient for low power, low throughput applications.

3.1.2 Disadvantages

Carrier frequency

Z-Wave operates in unlicensed sub-GHz bands, which differ between Europe and the United States. Z-Wave operates at 908.42MHz in the United States and 868.42MHz in Europe. Consequently, different transceivers must be installed for American and European devices, and multiple transceivers must be installed if a device is to operate internationally.

The limitations of the mesh routing are also a concern. Z-Wave’s mesh routing is limited to 4 hops. With a node-to-node range of about 100m, using mesh routing will give you coverage of roughly 400m at best. In most industrial applications, however, you would want to be able to route to the entire network.

IP addressability

Z-Wave is not IP addressable, so for applications where internet access is required, a gateway solution must be used, which adds complexity.

Power requirements

Throughput

Finally, a low throughput of roughly 100Kb/s makes this protocol unsuitable for anything but low power low throughput applications.

3.1.3 Is Z-Wave supported on the Raspberry Pi 2?

Yes, with the use of the RaZberry[24] daughter board connected to the GPIO pins of the Raspberry Pi 2.

3.1.4 Areas of application

Z-Wave can be used for household automation as it was designed to easily be embedded into everyday electronics. Due to the lack of IP addressability, gateway solutions must be used for home applications requiring internet connectivity. Additionally, the limited mesh routing capabilities make Z-Wave ill suited for industrial applications.

3.2 Thread (6LoWPAN)

Thread is a new protocol based on the 6LoWPAN³ standard[25], which in turn is based on the IEEE 802.15.4⁴ standard[26], but IPv6 addressable. It is built for home automation and control. Its features include mesh routing with no single point of failure, self healing which allows the network to route around crashed or disconnected nodes, and low power consumption.

Thread is still in early development, which makes available documentation sparse. Membership in the Thread group, which is required for more detailed documentation, has a fee. Below is a summary of the data I was able to acquire.

● PHY, MAC, and NET(network) layers are based on 6LoWPAN which is based on IEEE 802.15.4.

● Throughput is roughly 250 Kb/s if operated at 2.4 GHz.

● Operates at 868 MHz in Europe, 2.4 GHz worldwide, and 915 MHz in the United States.

● Mesh routing and self healing, no single point of failure.

● Node range is unknown since information about that was unavailable.

● Secured with AES encryption.

³ 6LoWPAN is a project to make IPv6 compatible with 802.15.4 based networks.

⁴ IEEE 802.15.4 is an IEEE standard for low power wireless personal area networks. It defines the

3.2.1 Advantages

Carrier frequency

Since Thread operates in sub-GHz frequencies it will not interfere with the highly contested 2.4GHz unlicensed band. It can, however, be operated at 2.4GHz for an increase in throughput. Thread also builds upon the 802.15.4 standard, which is already widely deployed due to its use in other protocols.

If Thread is operated in sub-GHz frequencies it consumes less power or gains more transmission range, depending on what sending power you use. Thread has mesh routing capabilities and can self heal if intermediate nodes go down. There is no single point of failure within the network.

IP addressability

Thread is IP addressable, which means that it can communicate with the rest of the internet.

Power requirements

Since 6LoWPAN, which Thread is built upon, already has very good power saving mechanisms, the power consumption of Thread is also very low.

Throughput

Thread has a relatively high throughput of 250 Kb/s if using the 2.4 GHz band compared to other low power protocols.

3.2.2 Disadvantages

Carrier frequency

Due to operating in sub-GHz bands which differ in Europe and the United States, Thread will also require different transceivers based on where it is deployed. The ability to operate on the worldwide 2.4 GHz band somewhat mitigates this.

IP addressability

Throughput

The low throughput of 250 Kb/s makes Thread unsuitable for high throughput applications such as multimedia networking. For low throughput applications this throughput is sufficient.

3.2.3 Is Thread supported on the Raspberry Pi 2?

Not yet.

3.2.4 Areas of application

Thread operates in sub-GHz frequencies and has good power conservation capabilities thanks to its building block 6LoWPAN. This means sensor networks of various kinds can utilize it. The mesh topology and self healing characteristics also make it interesting for industrial automation purposes. Additionally, since it is IP addressable, it can be used for home automation applications where internet connectivity is required without requiring a gateway solution.

3.3 ZigBee

ZigBee was designed for low cost, low power systems. It is mesh routable and has native power conservation properties. It was built for wireless control and monitoring systems with a focus on low cost devices. Its name is derived from the waggle dance of a honey bee.

● PHY and MAC layers are based on IEEE 802.15.4[26].

● Throughput of 20-250 Kb/s depending on frequency domain.

● Operates in ISM⁵ bands (868 MHz in Europe, 2.4 GHz worldwide, and 915 MHz in the United States).

● Mesh routing in star, tree and generic mesh topologies.

● Secured with 128 bit symmetric encryption.

● Node range 10-100 m line of sight, depending on transmission power.

⁵ The ISM (industrial, scientific and medical) bands are a set of the available radio frequency spectrum

3.3.1 Advantages

Carrier frequency

ZigBee is versatile: it can operate in the less contested sub-GHz band if it requires low power consumption or increased range, or it can operate in the more heavily trafficked 2.4 GHz band if it requires a higher data throughput.

Mesh routing makes ZigBee an interesting candidate for industrial automation and sensor networks. The initially short signal range of ZigBee can be enhanced using the mesh routing properties of the standard.

Power requirements

The power consumption of ZigBee is low and it boasts native power conservation mechanisms to further decrease its consumption.

Throughput

The relatively low throughput of 20-250 Kb/s is sufficient for ZigBee’s intended purposes.

3.3.2 Disadvantages

ZigBee requires a central node, called a coordination device, to create the network. Due to the presence of this node, ZigBee networks contain a single point of failure during creation.

IP addressability

ZigBee is not IP addressable, so to be able to communicate with the rest of the internet, a gateway solution must be used.

Throughput

The low data rate of ZigBee is typical of low power network technologies, but can still be considered a disadvantage.

when deployed in industrial environments, ZigBee networks are robust and self healing. For home automation purposes, ZigBee is a valid candidate. Its small, low cost chips can be embedded in household equipment, connecting them to each other. Internet connectivity can, however, be a problem due to the lack of IP addressability.

3.4 Bluetooth (4 LE and 3 HS)

Bluetooth uses its own protocol stack with L2CAP[28], LMP[29], and SDP[30] as mandatory protocols. Bluetooth HS was designed for Body Area Networks (BAN), networks encompassing the body of the wearer, and can thus be seen in smart healthcare systems and wireless headsets/accessories. Bluetooth LE was designed for low power sensor networks, and therefore features longer signal ranges and lower power consumption.

● Throughput is 1-24 Mb/s for HS, and roughly 200 Kb/s for LE.

● Operates between 2.402GHz and 2.48GHz.

● Long range up to 1 km using Bluetooth LE, 10-30m using Bluetooth HS.

● Master-slave structure.

● Clock synchronized.

● Adaptive frequency-hopping spread spectrum.

● Dual-mode, Bluetooth LE for small amounts of data, HS for larger amounts.

● 128 bit AES encryption, asymmetric encryption used in pairing devices.

Both Bluetooth HS and LE use a master-slave structured star topology, where the master node can be connected to other nodes to form a scatternet. In Bluetooth HS the number of connected slaves is capped at 7. For Bluetooth LE that number is implementation dependent.

3.4.1 Advantages

Carrier frequency

Bluetooth HS is a body area network (BAN) technology which does not require long signal coverage. Bluetooth LE is more coverage oriented and has a good signal range of up to 1 km.

Power requirements

Bluetooth LE has native power conservation mechanisms and can function at very low power levels.

Throughput

The throughput of Bluetooth HS is very high, which makes it suitable for more throughput oriented applications. Meanwhile, Bluetooth LE is more centered towards low throughput, low power applications, and thus has a relatively low throughput, more apt for sensor networks.

3.4.2 Disadvantages

Carrier frequency

Due to Bluetooth’s frequency hopping transmission technique it can effectively act as a jammer for non frequency hopping applications operating within the hop spectrum.

Bluetooth HS has a very low signal range of only 10 m.

IP addressability

Neither Bluetooth protocol is IP addressable, which is a disadvantage when communication with the rest of the internet is required.

3.4.3 Is Bluetooth supported on the Raspberry Pi 2?

Yes, with the use of a Bluetooth adapter. On the Raspberry Pi 3 there is a Bluetooth chip built in.

extended range and improved power saving mechanisms. The lack of IP addressability should however be considered when implementing Bluetooth solutions.

3.5 IEEE 802.11n Wi-Fi

IEEE 802.11n is a standard in the IEEE 802.11 family of protocols. It uses the TCP/IP stack for upper level communications and is perhaps the most widely deployed wireless LAN technology in the world.

● Throughput: 1-600 Mb/s.

● Operates in 2.4 GHz or 5 GHz ISM bands.

● Range: 10-30m but depends on antenna configuration and transmitting power.

● Secured with WPA2 security suite.

3.5.1 Advantages

Carrier frequency

802.11n has a series of different channels in which data can be transmitted. Upon deployment, an 802.11n device will sense which channels are less trafficked, and position itself on one of these channels. This is advantageous because 802.11n will auto configure depending on spectrum occupation.

IP addressability

802.11n is IP addressable and can thus function with the rest of the internet.

Throughput

802.11n also boasts a high data throughput of up to 600Mb/s, which is good for data intensive applications such as multimedia.

3.5.2 Disadvantages

Carrier frequency

802.11n’s channel assignment function, which works as an advantage when deploying, can also be a disadvantage if deployed in a heavily trafficked area, or in the vicinity of strong frequency hopping transmitters.

Power requirements

802.11n does not have any native power conservation mechanisms. This fact makes 802.11n a poor choice for low power networks with low data requirements.

3.5.3 Is 802.11n supported on the Raspberry Pi 2?

Yes, if using a compatible 802.11n wifi adapter. On Raspberry Pi 3 there is a built in 802.11n chip.

Due to its high throughput 802.11n can be used for home networks and multimedia networks. In addition to these applications, 802.11 can be deployed in workplace networks due to its WPA2 security suite and acceptable signal range. The same signal range also makes it a viable option for home automation purposes where power requirements are not critical.

3.5.5 802.11ah

802.11ah[3], also called Wi-Fi HaLow, is an upcoming IoT-focused addition to the 802.11 family of protocols. 802.11ah aims to extend functionality to the IoT domain and address the shortcomings that make the regular 802.11 protocols unsuitable for IoT communications. Main features of 802.11ah include effective sleep and transmission scheduling, also called TIM and page segmentation, support for over 8000 connected entities, and transmission ranges of up to 1 km. 802.11ah is still in early deployment, and few chips support it. 802.11ah is not yet supported on the Raspberry Pi 2.

3.6 Discussion

I discovered that the majority of the protocols investigated were designed for low power sensor networks, so both their throughput and their power consumption were low. The only two on which audio could be streamed were Bluetooth and 802.11n, due to their higher throughput. From a link layer perspective both technologies looked equally attractive, apart from some MAC layer differences.


synchronization is an important factor, as discussed in Chapter 2.5. In this aspect Bluetooth has the advantage since it has a synchronized protocol stack. In Bluetooth, slave devices sync their clocks to the master device at regular intervals, thereby effectively minimizing clock skew. 802.11n utilizes a random access scheme which provides no clock synchronization and no guaranteed latency. A fixed assignment access scheme like the one used in Bluetooth provides both.

The other, more sensor-oriented protocols, ZigBee, Z-Wave, and Thread, all support multi-hop routing, which is attractive from an IoT perspective since there are often walls and other obstructions between the sender and the receiver. They also presented good power characteristics, all operating in sub-GHz frequencies and utilizing a sleep/wake scheme to conserve battery power. Thread is IP addressable, which Z-Wave and ZigBee are not. Thread can thus be more easily connected to the rest of the internet.

3.7 Summary


4. Project Design

In this Chapter I present a hardware and software base recommendation along with lower level networking design choices and general requirements for the streaming application in Section 4.1. These design decisions are based on the findings of the literature study presented in Chapter 3, and on previously acquired knowledge. Thereafter I present the general design idea of the wireless speaker system in Section 4.2 to give the reader a simple and easy to grasp image of the system design. I then delve deeper into the design and present detailed design decisions in Section 4.3. Section 4.3 will also prepare the reader for the contents of the implementation details in Chapter 5.

4.1 IoT Base and Streaming application

In this Section I will first make recommendations for a hardware and software base in Section 4.1.1. I will then present the requirements for the streaming application in Section 4.1.2. A discussion of the design decisions for the networking components of the streaming application follows in Section 4.1.3, and the hardware details conclude the Section in Section 4.1.4.

4.1.1 Hardware and Software Base recommendation


Figure 4.1 Raspberry Pi 2 single board computer, about the same size as a credit card. By Eben Upton - Own work, www.raspberrypi.org

The Raspberry Pi 2 also supports a wide variety of operating systems, ranging from Raspbian[33], which is the standard Linux-based distribution for the platform, to Windows 10 IoT Core[34] and Ubuntu Snappy[35]. This made it even more attractive as a base since it can be loaded with different operating systems suited for different tasks and environments.

The disadvantage of using a Raspberry Pi 2 is its high power consumption. A standard Raspberry Pi 2 Model B needs around 2 A to function properly, which is considerably more than more specialized chips require.

As a software base I recommend Raspbian Jessie for general purpose operations due to its compatibility with the base hardware, which is second to none. For more specific tasks, more specialized operating systems may exist. I did consider using Ubuntu Snappy as a software base, but the quality of the available documentation was subpar, which made me switch to Raspbian.


4.1.2 Streaming Application requirements

The requirements and priorities of the audio streaming application were formalized by my supervisor at Altran and me. They are as follows:

1. Structure
   1. Speaker
   2. Controller
2. Communication
   1. Unicast
   2. Broadcast
3. Identification (bonding)
   1. UDP broadcast
   2. Extract sender information
   3. Initialize TCP session
4. Power on
   1. Connect to Wi-Fi automatically
   2. Start program automatically
      1. Cannot find Controller, shut down
5. Streaming
   1. Stream control
   2. Stream buffer
   3. Stream data
6. Audio playout
   1. Single Speaker
   2. Multiple Speakers
7. Synchronization

Structure (Mandatory)

The system is to be based around two node types: Speakers and Controllers, each type running the Speaker and the Controller applications respectively.

The Speaker node is responsible for receiving audio streams from the Controller, buffering them, and later playing them out via some form of audio port.


user. On receiving such a stream, it sets up individual streams to all known Speaker nodes.

Communication (Mandatory)

The communication requirement states that the Controller and Speaker applications must be able to communicate via unicast and broadcast channels. The first priority is to implement standard unicast communication. Once that is working, broadcast communication should be implemented.

Identification (Mandatory)

Identification states that the Controller must be able to find and identify Speakers, and the Speakers must be able to find and identify the Controller. Since neither node knows the address of the other, a broadcast must be used. Thus, the first step is to implement a UDP broadcast originating from the Speaker. Once the broadcast is working, the Controller needs to be able to extract the sender information from received broadcasts. Once this is working, the Controller needs functionality to initiate a TCP bonding session with the Speaker, which is the final identification requirement.

Automatic execution (Mandatory)

Automatic execution means that the devices must be able to boot up and begin operations without any user intervention. The device must first automatically connect to the wireless network present in the area. Thereafter, the device must start the Controller or Speaker application, depending on what kind of node the device is. An optional sub-requirement is that a Speaker device must be shut down or restarted if no Controller node can be found.

Streaming (Mandatory)

The streaming requirement means that the Controller device must be able to stream data to the Speaker devices. First, control messaging is to be implemented in order to set the stream up. Next, a stream buffer at the Speaker device should be implemented to allow for smooth playout and also allow for the development and testing of the final requirement, the data stream. The data stream is to be


Audio playout (Optional)

Audio playout is an optional requirement since the time allotted might not be enough to implement audio playout functionality. The audio playout requirement states that the Controller application should first be able to stream to a single Speaker node, and that node should be able to play the received data. Once that functionality is working, the scope should be increased to incorporate streams to multiple Speakers.

Synchronization (Optional)

Synchronization is also an optional requirement since the ability to test it largely depends on requirement 6, audio playout. The synchronization requirement states that the applications on the Speaker and Controller nodes must be synchronized to within 10 ms. This requirement comes from the fact that audio played too far out of sync on multiple Speakers sounds distorted and is poorly perceived by the user.

4.1.3 Streaming Application Networking Details

During my literature study I found a fair amount of IoT capable protocols specifically designed for low power sensor networks, but not many existed for the purpose of audio streaming over a wireless link. The ones that did were Bluetooth and 802.11n, or Wi-fi as most call it.

MAC Layer

I looked at the protocol stacks of the two throughput-oriented protocols, 802.11n and Bluetooth HS. 802.11n uses the TCP/IP stack, which I have experience working with. Bluetooth HS uses its own protocol stack, which I have no experience working with. When comparing Bluetooth HS to 802.11n I found that Bluetooth has a much better battery profile. One paper[7] even estimated a battery lifetime of 12-14 years when the device was powered only by a coin cell battery. Additionally, Bluetooth solves the synchronization problems of multimedia streaming with its fixed


more widespread deployment. All these properties made 802.11n the better candidate to base my protocol on.

The decision was therefore to choose 802.11n for my project: although it lacks effective MAC layer synchronization, it allows for higher bit rates than Bluetooth and is less complex to work with, which would allow more work to be put into the actual project.

Physical Layer

802.11n operating at 2.4 GHz was selected as the physical layer. This was partly due to the already widespread deployment of 802.11n, but also due to more technical reasons. If 802.11n is to be used for transmitting audio, why not use the 5 GHz band? This is a good question. Due to the higher frequency, the amount of information that can be encoded increases, thus increasing throughput. An organization called WISA actually used the 5 GHz band for audio streaming in their proposed standard[36]. They used the 5 GHz band to transmit uncompressed audio data to wireless Speakers and it worked fine.

The downside of the 5 GHz band is that it requires more energy than a 2.4 GHz transmitter and that its signals do not propagate as well, in accordance with the Friis transmission equation illustrated in Equation 1. Thus, I decided to use the 2.4 GHz band. Since I will be transmitting encoded and compressed audio, the bandwidth will be sufficient.
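As a rough illustration of the propagation argument, the sketch below compares the free-space received power at the two carrier frequencies using the Friis relation P_r/P_t = G_t G_r (λ/(4πd))². The function names, the unity antenna gains, and the nominal carrier frequencies of 2.4 GHz and 5 GHz are assumptions chosen for illustration; they are not measurements from this project.

```c
#include <assert.h>

#define SPEED_OF_LIGHT 299792458.0          /* metres per second */
#define PI             3.14159265358979323846

/* Friis free-space ratio P_r / P_t for isotropic antennas (G_t = G_r = 1).
 * Both parameters are illustrative; real links add antenna gain and losses. */
double friis_power_ratio(double freq_hz, double distance_m)
{
    assert(freq_hz > 0.0 && distance_m > 0.0);
    double wavelength = SPEED_OF_LIGHT / freq_hz;
    double term = wavelength / (4.0 * PI * distance_m);
    return term * term;
}

/* Received power at 5 GHz relative to 2.4 GHz at the same distance. */
double relative_power_5ghz_vs_2_4ghz(double distance_m)
{
    return friis_power_ratio(5.0e9, distance_m) /
           friis_power_ratio(2.4e9, distance_m);
}
```

Under these assumptions the ratio is (2.4/5)² ≈ 0.23 at any distance, which corresponds to roughly 6.4 dB less received power at 5 GHz for the same transmit power.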

Network and Transport Layers

Since I will be using 802.11n at 2.4GHz I will use the standard TCP/IP stack implemented in the Linux kernel for the network and transport layers.

4.1.4 Hardware Details

I chose to develop my streaming application on the recommended hardware base because of my experience working with it.


power and are therefore hard to power on batteries alone. However, with the future development of better batteries this might change. I will however not focus on the power aspects more than keeping support for lower power solutions open.

Communications chip

As 802.11n was chosen, the communications chip only needs to be able to communicate in 802.11n Wi-Fi and be compatible with the Raspberry Pi 2. The official Raspberry Pi WiPi Wifi adapter was a good choice.

About a month into the project the Raspberry Pi 3 was released. One of its main features was a built in 802.11n wireless chip. I purchased and tested a unit and it was compatible with the code I had written. It is therefore possible to run the code on both Raspberry Pi 2 and 3 systems.

4.2 Design Overview

In this Section I will present the general design of the network architecture in Section 4.2.1 and of the streaming application in Sections 4.2.2, 4.2.3, and 4.2.4.


The design of the network, as depicted in Figure 4.2, revolves around two types of nodes: Speakers and Controllers. Speakers will be covered in detail in Sections 4.2.3 and 4.3.2, while Controller nodes will be more thoroughly explained in Sections 4.2.2 and 4.3.1. The nodes in the network are connected via 802.11n Wi-Fi links to a central infrastructure node, which is a standard 802.11n 2.4 GHz router. All the nodes automatically associate with the router. When the Controller node is online it associates with all the Speakers in a process dubbed “bonding”.

When the user wants to stream audio content to the Speakers, he/she sends an audio stream to the Controller node via the router. When the Controller node receives such a stream, it contacts each bonded Speaker and begins a stream session with each.

4.2.2 The Controller


The role of the Controller node, depicted in Figure 4.3, is to receive instructions from the user and to stream content to the various Speaker nodes, which are depicted in Figure 4.4. Additionally, the Controller keeps track of the identities of the Speaker nodes. The Controller application running within the Controller node consists of a few different modules, listed below.

Stream Listen's role is to passively listen for multimedia and control data from the user's streaming application.

C_Bonding is the module responsible for listening and bonding with the Speaker nodes using the bonding protocol illustrated in Figure 4.5.

The C_Software module is responsible for establishing reliable unicast connections to the bonded Speaker nodes in which multimedia data and control messages can flow. It also maintains a list of bonded Speakers.

4.2.3 The Speaker

Figure 4.4 Overview of the Speaker node


The S_Bonding module is the Speaker equivalent of C_Bonding in the Controller. Its role is to facilitate the bonding process. On activation, it broadcasts a query for nearby Controllers. It will continue querying for Controllers until it either finds one, or powers down after failing to bond with any Controller for a set amount of time.

S_Software is the part of the Speaker node responsible for receiving the multimedia and control data from the Controller.

4.2.4 Bonding Protocol

Figure 4.5 Overview of the bonding protocol

Figure 4.5 illustrates the bonding protocol responsible for bonding a Speaker to a Controller. The Controller sends a “c” back because the contents of the message are unimportant: the Speaker only needs to receive something from the Controller in order to continue the bonding process.

In the bonding protocol the Speaker broadcasts a query for Controllers on startup. Controller nodes have a dedicated module listening for these queries. Once the Controller hears a request for a Controller it will reply with its networking information. The Speaker can then choose to bond with this Controller, on which it sends a


Speaker states that it does not wish to bond, the Controller will close the connection and choose not to reply to further requests from that particular Speaker.

This bonding protocol assumes that a proper security suite is active on the network. It has no method of authenticating Speaker and Controller nodes and could therefore be classed as insecure. Further work on this project should add some method of authentication between the Controller and Speaker, possibly also encryption.

4.3 Detailed Design

In this Section I will describe detailed design choices for the Controller node in Section 4.3.1. I will then describe the corresponding design decisions for the Speaker node in Section 4.3.2.

4.3.1 Controller software details

Figure 4.6 Controller application module overview

The Controller application consists of several different software modules written in C, as illustrated in figure 4.6. It has a C_Bonding module for handling Speaker bonding, and a C_Software module for handling data streams to the Speakers. These


C_Bonding

Figure 4.7 Overview of the C_Bonding software module

The C_Bonding module, illustrated in figure 4.7, contains a C_Broadcast module and a C_Unicast module in addition to its own functionality. C_Bonding utilizes functionality from both C_Broadcast and C_Unicast but also extends that functionality with its own features. These features include maintaining a list of bonded Speakers, and creating and managing the various bonding threads.

C_Broadcast is responsible for listening at a predefined socket for Controller queries originating from Speakers. Once a Speaker broadcast is received, C_Broadcast will extract the source IP address of the broadcast and return it to the parent C_Bonding module.


Figure 4.8 Controller bonding finite state automaton

The Controller bonding finite state automaton starts in the Contacted state. In this state the Controller has received a Speaker broadcast and sends a reply via unicast. The Controller enters the wait_for_speaker_reply state immediately after the reply has been sent. In the wait_for_speaker_reply state the Controller listens for a reply from the Speaker. If a positive reply is received, the Controller adds the Speaker to its bonded Speakers list and sends an ACK to the Speaker to indicate that its reply has been properly received. The Controller then moves to the wait for ACKACK state. If a negative reply is received, the Controller exits the state machine and closes the connection. In the wait for ACKACK state the Controller listens for the TCP ACK of the plain text ACK it transmitted. When the TCP ACK is received the Speaker is considered bonded. The Controller enters the bonded state and exits.
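The transitions described above can be sketched as a small transition function. This is only an illustrative sketch: the enum names, the event set, and the explicit C_CLOSED state for a rejected bonding are assumptions, and the actual implementation may structure the automaton differently.

```c
#include <assert.h>

/* States of the Controller bonding automaton as described in the text. */
typedef enum {
    C_CONTACTED,
    C_WAIT_FOR_SPEAKER_REPLY,
    C_WAIT_FOR_ACKACK,
    C_BONDED,
    C_CLOSED            /* assumed terminal state after a negative reply */
} c_bond_state;

/* Hypothetical events that drive the automaton. */
typedef enum {
    EV_REPLY_SENT,      /* unicast reply to the Speaker broadcast was sent */
    EV_POSITIVE_REPLY,  /* Speaker wishes to bond */
    EV_NEGATIVE_REPLY,  /* Speaker declines */
    EV_TCP_ACK          /* TCP ACK for the plain text ACK arrived */
} c_bond_event;

/* Returns the next state; events that do not apply leave the state unchanged. */
c_bond_state c_bond_step(c_bond_state state, c_bond_event ev)
{
    assert(state >= C_CONTACTED && state <= C_CLOSED);
    switch (state) {
    case C_CONTACTED:
        return ev == EV_REPLY_SENT ? C_WAIT_FOR_SPEAKER_REPLY : state;
    case C_WAIT_FOR_SPEAKER_REPLY:
        if (ev == EV_POSITIVE_REPLY) return C_WAIT_FOR_ACKACK;
        if (ev == EV_NEGATIVE_REPLY) return C_CLOSED;
        return state;
    case C_WAIT_FOR_ACKACK:
        return ev == EV_TCP_ACK ? C_BONDED : state;
    default:
        return state;   /* C_BONDED and C_CLOSED are terminal */
    }
}
```

Keeping the transition logic in a pure function of this kind separates it from the socket handling, so it can be exercised without a network.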


C_Software

Figure 4.9 Controller C_software module

C_software, illustrated in figure 4.9, contains two modules: C_stream_control and C_stream_data. As in previous modules, C_software utilizes functions from both C_stream_control and C_stream_data but also extends the functionality with its own features. C_stream_control is responsible for a control connection to the Speaker. The messages exchanged over this control connection are feedback from the Speaker. Depending on what kind of feedback is received from the Speaker, the C_stream_control module can choose to increase or decrease the sending rate of the audio stream. The purpose of C_stream_data is to implement data transfer with adjustable rates. The C_software module also maintains a list of bonded Speakers which it uses when establishing control and data connections to bonded Speaker nodes.

Since the Controller node requires a high degree of parallelization, C_software needs to be of a multi-threaded nature, just like C_bonding. Furthermore,


Figure 4.10 Controller stream control finite state automaton

The Controller stream control finite state automaton describes the process of initializing and performing a streaming session to a Speaker node. The Controller starts in the waiting_for_audio_stream state. When an audio stream is received, the Controller transitions to the contact_speakers state and sends a “prepare for data” message to the Speakers. The Controller then waits for all the Speakers to reply with a “prepared” message before it starts the stream.

To initialize the stream, the Controller must establish a data connection between itself and the bonded Speaker nodes. First, a control session must be established. Once that has happened, a data channel is created to facilitate the stream.


4.3.2 Speaker Software details

Figure 4.11 Speaker application module overview

The Speaker application, like its Controller counterpart, has various software modules that implement its functionality. These modules are depicted in Figure 4.11. The design of these modules will be described in detail in this Section.

S_Bonding

Figure 4.12 Overview of the S_Bonding software module

S_Bonding depicted in Figure 4.12 is the module responsible for bonding the


Along with the S_Broadcast module, S_Bonding also has an S_Unicast module to handle unicast bonding communications. S_Unicast also implements the Speaker variant of the bonding protocol using a finite state automaton. This automaton is depicted in Figure 4.13.

The shutdown module is responsible for shutting the platform down if no Controller can be found.

Figure 4.13 Speaker bonding finite state automaton

The Speaker starts in the Listen state. In the Listen state the Speaker passively waits for a Controller to establish a TCP connection with it. When a connection is established, the Speaker proceeds to the Contacted state. In the Contacted state, the Speaker first extracts the sender information from the message it received in the Listen state. Once the sender identity has been determined, the Speaker sends a positive reply if it wishes to bond, or a negative reply if it does not. If a positive reply is sent, the Speaker transitions to the wait_for_ACK state. If a negative reply is sent, the Speaker enters the Listen state again. In the wait_for_ACK state, the Speaker waits for an ACK sent from the


S_Software

Figure 4.14 Speaker S_software module

S_software, depicted in figure 4.14, is the Speaker equivalent of C_Software. Its purpose is to implement streaming and control functionality at the Speaker's end. S_software includes a set of sub-modules which it uses to extend its functionality. These modules are s_stream_control, s_stream_data, and s_buffer_alloc.

s_stream_control implements control message functionality for the data stream. These control messages include user control messages such as start stream, stop stream, and pause stream, but also system control messages such as increase sending rate and decrease sending rate. The system control messages are used to implement adaptive streaming functionality. A dedicated stream data channel to handle the data stream is an advantage since it will allow for out-of-band control messaging.
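The message types listed above, and the way a sender might react to the two system control messages, could be sketched as follows. The enum names, the rate limits, and the step size are all assumptions made for illustration; the text does not specify the wire format or the rate adaptation policy.

```c
#include <assert.h>

/* Control messages named in the text; the identifiers are hypothetical. */
typedef enum {
    MSG_START_STREAM,
    MSG_STOP_STREAM,
    MSG_PAUSE_STREAM,
    MSG_INCREASE_RATE,
    MSG_DECREASE_RATE
} stream_ctrl_msg;

/* Assumed rate limits and step size in kbit/s. */
#define RATE_MIN_KBPS   64u
#define RATE_MAX_KBPS  320u
#define RATE_STEP_KBPS  32u

/* Returns the new sending rate after one control message.
 * Steps that would leave the allowed range are ignored. */
unsigned adjust_rate(unsigned rate_kbps, stream_ctrl_msg msg)
{
    assert(rate_kbps >= RATE_MIN_KBPS && rate_kbps <= RATE_MAX_KBPS);
    if (msg == MSG_INCREASE_RATE &&
        rate_kbps + RATE_STEP_KBPS <= RATE_MAX_KBPS)
        return rate_kbps + RATE_STEP_KBPS;
    if (msg == MSG_DECREASE_RATE &&
        rate_kbps >= RATE_MIN_KBPS + RATE_STEP_KBPS)
        return rate_kbps - RATE_STEP_KBPS;
    return rate_kbps;   /* user control messages do not change the rate */
}
```

With these assumed values, a stream at 128 kbit/s moves to 160 kbit/s on an increase message, while a stream already at the upper limit stays where it is.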

s_stream_data implements the data stream of the Speaker application. This is where the audio data is received and unpacked from the network. This module also has a reference to the s_buffer_alloc module described shortly.


Streaming control is implemented as a finite state automaton, which is illustrated in figure 4.15.

Figure 4.15 Speaker stream control finite state automaton


4.4 Summary

In this Chapter I have provided a detailed design discussion regarding the choice of the hardware and software base, and the choice of communication protocol, based on the findings of the literature study described in Chapter 3. Thereafter, I described the general application software and network topology designs in order to introduce the general idea behind the system to the reader. Finally, I described the detailed design decisions for the software components of the Controller and Speaker nodes. The purpose of the detailed design Section is to prepare the reader for the contents of the implementation Chapter. A good


5. Project Implementation

In this Chapter I have tried to mirror the layout of Chapter 4 as closely as possible. Therefore, I start by discussing the general overview of the implementation in Section 5.1. Once the general implementation has been explained, I discuss detailed implementation choices in Section 5.2.

5.1 Implementation Overview

In this Section I discuss the streaming application implementation from an overview perspective. First, I briefly discuss the network implementation in Section 5.1.1. Then, I discuss the general Controller implementation in Section 5.1.2 and the Speaker implementation in Section 5.1.3.

5.1.1 Network implementation overview

The basic network functionality is implemented using the built in TCP/IP stack in Linux. Both node types use the Linux TCP/UDP socket implementations for communicating. DHCP is used for addressing within the local network.

Wpa_supplicant, a system daemon available on Linux, is used to establish a WPA2-protected session with the router. IEEE 802.11n Wi-Fi adapters are used to provide lower level connectivity.

5.1.2 Controller implementation overview

The Controller is implemented in C using a hierarchical module based approach. The root module is the Controller module, which in turn includes functionality from the C_Bonding and C_Software modules. Further down the hierarchy the functionality becomes more and more basic. For example, c_broadcast can only set up a broadcast socket and listen to it, whilst c_bonding, c_broadcast’s parent module, can perform more complex tasks such as coordinating thread and Speaker resources.

Since the role of the Controller is to identify and coordinate multiple Speakers, multi-threaded functionality was critical. With the help of POSIX threads (pthreads), the Controller application can handle multiple Speaker bondings


Concepts from the functional programming paradigm were borrowed when developing the program, such as having deterministic functions that depend only on their input parameters instead of mutating functions operating on global data. Since POSIX threads have separate stacks, this decision has simplified the thread synchronization process. There were, however, some situations in which global variables were required, but these have been made thread safe by allowing only one thread at a time to work on the same piece of data.

5.1.3 Speaker implementation overview

The Speaker is also implemented using C, in a similar hierarchical manner as the Controller. The root module is the Speaker, which includes functionality from the Speaker counterparts of C_bonding and C_software, called S_bonding and S_software.

The Speaker node is much simpler than the Controller since the Speaker does not need to keep track of all the other Speaker nodes. It only needs to know the identity of the Controller node and the router to function properly. Therefore, the Speaker node could be programmed much more sequentially than the Controller. The Speaker only requires a couple of non-colliding threads to function. These include a broadcast thread, a bonding thread, a stream control thread, and a stream data thread.

5.1.4 Bonding protocol implementation overview

The bonding protocol can be divided into two parts: a unicast part and a broadcast part. The broadcast part is relatively straightforward: Speakers broadcast their identity using a dedicated broadcast thread. Meanwhile, the Controller listens for broadcast messages. Once the Controller hears such a message, it creates a dedicated unicast thread for that Speaker node and the bonding shifts to the unicast part.


In the Speaker node, the broadcast thread is active during the bonding, but is killed by the unicast thread once a successful bonding has been performed. The reason is simple: once the Speaker is bonded, it knows the identity of the local Controller node and does not need to broadcast any more queries.
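The broadcast part on the Speaker side could look roughly like the sketch below, which sends a single UDP datagram to the limited broadcast address. It is a minimal sketch under stated assumptions: the function names and the payload are hypothetical, the port is left as a parameter, and a real broadcast thread would repeat the query until bonding succeeds.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Builds the IPv4 limited-broadcast destination for the given UDP port. */
struct sockaddr_in make_broadcast_addr(uint16_t port)
{
    assert(port != 0);
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    dst.sin_addr.s_addr = htonl(INADDR_BROADCAST);
    return dst;
}

/* Sends one Controller query; returns 0 on success and -1 on error. */
int send_controller_query(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;  /* broadcasting must be enabled explicitly on the socket */
    if (setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in dst = make_broadcast_addr(port);
    const char query[] = "speaker";  /* hypothetical payload */
    ssize_t sent = sendto(fd, query, sizeof query, 0,
                          (struct sockaddr *)&dst, sizeof dst);
    close(fd);
    return sent == (ssize_t)sizeof query ? 0 : -1;
}
```

The Controller does not need the payload to identify the Speaker, because the sender address of the datagram is available to the receiver.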

5.2 Implementation Details

In this Section I will describe detailed implementation choices for the Controller application in Section 5.2.1 and for the Speaker application in Section 5.2.2.

5.2.1 Controller details

The Controller's function is to coordinate and synchronize multiple Speaker nodes. First, it needs to identify the Speaker nodes present in the network and associate with them.

The Controller must also listen for user input in the form of a data stream containing audio data. The Controller node needs to implement stream multiplexing functionality in order to split the single user data stream into multiple identical streams for the Speakers.

Bonding

The Controller’s first duty is to determine the identity of the Speaker nodes. This is done using the bonding protocol I designed, see figure 4.5. The bonding protocol is implemented as a broadcast thread, which listens for broadcast messages on the specified broadcast port 9115, and a series of unicast threads, one of which is created by the broadcast thread each time a Speaker broadcast is received. Each unicast thread then establishes a TCP unicast session with its Speaker at the specified unicast port, 9104, and continues with the unicast part of the bonding protocol, described in figure 4.8.
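The listening side of the broadcast thread can be sketched as below: a UDP socket bound to the broadcast port, and a blocking receive that yields the sender's address. The function names are hypothetical and the error handling is reduced to a minimum; only the port number 9115 is taken from the text.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define BROADCAST_PORT 9115   /* broadcast port named in the text */

/* Opens a UDP socket bound to the given port on all interfaces. */
int open_broadcast_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in local;
    memset(&local, 0, sizeof local);
    local.sin_family = AF_INET;
    local.sin_port = htons(port);
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(fd, (struct sockaddr *)&local, sizeof local) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Writes the sender's IPv4 address as dotted text; 0 on success, -1 on error. */
int format_sender_ip(const struct sockaddr_in *from, char *out, socklen_t len)
{
    assert(from != NULL && out != NULL);
    return inet_ntop(AF_INET, &from->sin_addr, out, len) ? 0 : -1;
}

/* Blocks until a datagram arrives, then reports who sent it. */
int wait_for_speaker_broadcast(int fd, char *ip_out, socklen_t ip_len)
{
    char payload[64];
    struct sockaddr_in from;
    socklen_t from_len = sizeof from;
    if (recvfrom(fd, payload, sizeof payload, 0,
                 (struct sockaddr *)&from, &from_len) < 0)
        return -1;
    return format_sender_ip(&from, ip_out, ip_len);
}
```

A broadcast thread built on this sketch would call open_broadcast_listener(BROADCAST_PORT) once and then loop on wait_for_speaker_broadcast, handing each new address to a unicast thread.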


Speaker list

The Controller also maintains a list of bonded Speakers. This list is referred to as the Speaker list. Two versions of this list exist: one that the bonding module uses for bonding purposes, and one that the streaming software module uses for stream connections. The bonding Speaker list is implemented using a static array of length MAX_SPEAKERS and can be seen in figure 5.1.

Figure 5.1 Bonding Speaker list implementation
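A static array of this kind could look roughly like the following sketch. Only the constant name MAX_SPEAKERS comes from the text; its value, the struct layout, and the helper functions are assumptions made for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

#define MAX_SPEAKERS 8   /* assumed value; the text only names the constant */

/* One slot of the bonding list (hypothetical layout). */
typedef struct {
    struct in_addr address;  /* IPv4 address of the bonded Speaker */
    int in_use;              /* 0 = free slot, 1 = occupied */
} bonded_speaker;

typedef struct {
    bonded_speaker slots[MAX_SPEAKERS];
} bonding_speaker_list;

void bonding_list_init(bonding_speaker_list *list)
{
    assert(list != NULL);
    memset(list, 0, sizeof *list);
}

/* Returns the slot index of addr, or -1 if that Speaker is not bonded. */
int bonding_list_find(const bonding_speaker_list *list, struct in_addr addr)
{
    for (int i = 0; i < MAX_SPEAKERS; i++)
        if (list->slots[i].in_use &&
            list->slots[i].address.s_addr == addr.s_addr)
            return i;
    return -1;
}

/* Adds addr if absent; returns its slot index, or -1 when the array is full. */
int bonding_list_add(bonding_speaker_list *list, struct in_addr addr)
{
    int existing = bonding_list_find(list, addr);
    if (existing >= 0)
        return existing;
    for (int i = 0; i < MAX_SPEAKERS; i++) {
        if (!list->slots[i].in_use) {
            list->slots[i].address = addr;
            list->slots[i].in_use = 1;
            return i;
        }
    }
    return -1;
}
```

A fixed-size array keeps lookups simple during bonding, at the cost of a hard upper limit on the number of Speakers.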

The streaming Speaker list is similar, but must also support additions while a stream is ongoing, thus the decision was made to implement it as a linked list.

The linked list is implemented using structs and struct pointers, as can be seen in figure 5.2.


Once a Speaker is bonded, its IP address is copied into the s_address field of the speaker_thread_relation structs located in the speaker_list_element list. In order to copy the addresses, an element must first be created in the Speakers list. Since element addition and removal is easy in linked lists, adding a speaker to the list simply consists of appending a node to the list tail, which is the last element of the list, making the added node the new tail.
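The tail append described above can be sketched as follows. The names speaker_thread_relation, speaker_list_element, and s_address are taken from the text; the remaining fields, the separate list header with head and tail pointers, and the function name are assumptions.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdlib.h>

typedef struct {
    struct in_addr s_address;  /* field named in the text */
    int control_fd;            /* assumed: control connection descriptor */
    int data_fd;               /* assumed: data connection descriptor */
} speaker_thread_relation;

typedef struct speaker_list_element {
    speaker_thread_relation relation;
    struct speaker_list_element *next;
} speaker_list_element;

/* Hypothetical list header that tracks both ends of the list. */
typedef struct {
    speaker_list_element *head;
    speaker_list_element *tail;
} speaker_list;

/* Appends a new element for addr and makes it the new tail.
 * Returns the element, or NULL if memory could not be allocated. */
speaker_list_element *speaker_list_append(speaker_list *list,
                                          struct in_addr addr)
{
    assert(list != NULL);
    speaker_list_element *e = calloc(1, sizeof *e);
    if (e == NULL)
        return NULL;
    e->relation.s_address = addr;
    e->relation.control_fd = -1;   /* no connection established yet */
    e->relation.data_fd = -1;
    if (list->tail == NULL)
        list->head = e;            /* first element in an empty list */
    else
        list->tail->next = e;
    list->tail = e;
    return e;
}
```

Because each element is allocated separately, a pointer to one element can be handed to a thread without exposing the rest of the list to it.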

The speaker_thread_relation struct bears further explanation since at first glance no thread reference exists. The idea behind the speaker_thread_relation is that each control thread, which is described in the Multithreading Section, is passed a reference to one element in the speaker_list linked list on creation (see figure 5.3).

Figure 5.3 Control thread creation. Temp_thread is a reference to a control thread linked list element, and temp_speaker is a reference to a speaker_list_element in the speaker linked list.

The control thread then creates a local pointer reference to that specific speaker_list element. From this reference the Speaker address can be extracted, along with references to the socket file descriptors for the data and control threads. The structs containing the connection details for the data and control connections are also kept separate by these references. This solution ensures that each thread is unaware of any Speaker linked list element other than the one it has been allotted. Since each element in the Speaker linked list lives in its own separate region of memory, collisions should not occur.

Multithreading

Since the Controller will handle multiple Speakers in parallel, it must have the capacity to handle a set of parallel connections simultaneously. I used the pthread library to implement such functionality. The threads can be divided into two major types: bonding threads and streaming threads. The bonding threads consist of the


Broadcast and Unicast threads

The bonding protocol requires the Controller to perform two things at the same time. It must always listen for Speaker broadcasts, and at the same time perform Speaker bondings. This is the purpose of the two bonding thread subtypes. The broadcast thread will perform the listening operation. Once it hears a broadcast message, it will create a unicast thread to perform the actual bonding procedure. The unicast thread will then exit when the bonding procedure is complete.

In the current implementation, there exists only one broadcast thread, but multiple unicast threads. Unicast threads are created dynamically by the broadcast thread every time a broadcast is heard from a Speaker that is not bonded. This process is illustrated in figures 5.4 and 5.5.

Figure 5.4 Broadcast thread main listening loop. listen_for_speaker is a blocking function which will only return once a Speaker broadcast has been heard.

Figure 5.5 initialize_bonding function. It will copy the address given by the broadcast main loop, store it in a thread_sock_binding struct described in figure 5.1, and create the unicast thread.


Data and Control threads

The Controller software module’s streaming requires two types of threads: data threads and control threads. A data thread is responsible for sending the audio data to its corresponding Speaker, while a control thread sets up and manages the stream of its allotted Speaker.

It should be noted that I made the design choice to allocate one thread pair (control and data) per connected Speaker. I based this decision on the expectation that it would make thread coordination and isolation easier to implement, and that it would increase the amount of parallelism in the stream multiplexing.

The control threads are created when a user audio stream is detected, or in the current build’s case, simulated. The creation process can be seen in figure 5.3. Data threads, on the other hand, are created by the control threads after a control session has been established. This gives the Speaker time to set up its data thread socket, which must be up and listening by the time the Controller attempts to establish a data connection.
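The creation order described above can be illustrated with a small sketch in which the control thread starts the data thread only after its own session setup has finished. The struct stream_session, its fields and the function start_stream are assumptions made for this example; the control handshake and the audio transfer are replaced by placeholders.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical per-Speaker stream state used only in this sketch. */
struct stream_session {
    int control_established;         /* set by the control thread */
    int data_started_after_control;  /* recorded by the data thread */
    int chunks_sent;                 /* stand-in for audio chunks sent */
};

/* Data thread: sends the audio data to one Speaker. */
void *data_thread(void *arg)
{
    struct stream_session *s = arg;
    /* pthread_create orders the control thread's earlier write before
     * this read, so the flag is already visible here. */
    s->data_started_after_control = s->control_established;
    for (int i = 0; i < 3; i++)
        s->chunks_sent++;            /* placeholder for a send() call */
    return NULL;
}

/* Control thread: sets up the control session, then starts the data
 * thread. Returns its argument on success and NULL on failure. */
void *control_thread(void *arg)
{
    struct stream_session *s = arg;
    pthread_t data;

    /* The control handshake would go here. Once it completes, the
     * Speaker is assumed to have its data socket listening. */
    s->control_established = 1;

    if (pthread_create(&data, NULL, data_thread, s) != 0)
        return NULL;

    /* The control thread would keep managing the stream here. */
    pthread_join(data, NULL);
    return s;
}

/* Start the control/data thread pair for one Speaker and wait for it.
 * Returns 0 on success and -1 on failure. */
int start_stream(struct stream_session *s)
{
    pthread_t control;
    void *result = NULL;

    assert(s != NULL);
    memset(s, 0, sizeof *s);
    if (pthread_create(&control, NULL, control_thread, s) != 0)
        return -1;
    pthread_join(control, &result);
    return result == s ? 0 : -1;
}
```

Letting the control thread own the creation of its data thread keeps the ordering local to one Speaker, so no cross-thread signalling is needed to guarantee that the control session comes first.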

5.2.2 Speaker details

The Speaker's primary purpose is to implement audio buffering and playout, but also to initiate bonding with a Controller node. Along with the bonding functionality, the Speaker node must therefore implement a stream buffer and an adequate control messaging structure.

Bonding procedure

The first duty of the Speaker node is to set up a broadcast socket and start


state. Since the Controller node will only attempt a unicast connection once it has received a Speaker broadcast, I solved the race condition by introducing a mutex lock; see figures 5.6 and 5.7.

Figure 5.6 Mutex lock preventing race conditions between Speaker and Controller nodes

Figure 5.7 Unicast thread mutex signalling to allow the broadcast thread to start

The unicast thread will proceed to set up the socket for listening. Once the setup is complete it will signal the bonding condition and unlock the bonding lock. This allows the main thread to create and start the broadcast thread, thus ensuring that there will be no race conditions between the bonding components of the Controller and Speaker nodes.
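The signalling pattern can be sketched with a pthread mutex and condition variable as follows. This is an illustration under assumptions rather than the code shown in figures 5.6 and 5.7: the struct bonding_sync, its fields and the function start_bonding are hypothetical, and the socket setup is replaced by a flag.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical synchronisation state for the Speaker bonding startup. */
struct bonding_sync {
    pthread_mutex_t lock;          /* the bonding lock */
    pthread_cond_t ready;          /* the bonding condition */
    int unicast_listening;         /* 1 once the unicast socket listens */
    int broadcast_saw_listening;   /* what the broadcast thread observed */
};

/* Unicast thread: sets up its socket, then signals the main thread. */
void *unicast_thread(void *arg)
{
    struct bonding_sync *b = arg;

    /* socket(), bind() and listen() for the unicast socket would go here. */
    pthread_mutex_lock(&b->lock);
    b->unicast_listening = 1;
    pthread_cond_signal(&b->ready);
    pthread_mutex_unlock(&b->lock);

    /* accept() and the bonding handshake would follow. */
    return NULL;
}

/* Broadcast thread: by construction it starts only after the unicast
 * socket is ready, so a Controller reply always finds a listener. */
void *broadcast_thread(void *arg)
{
    struct bonding_sync *b = arg;

    pthread_mutex_lock(&b->lock);
    b->broadcast_saw_listening = b->unicast_listening;
    pthread_mutex_unlock(&b->lock);
    /* Sending the broadcast messages would go here. */
    return NULL;
}

/* Main thread logic: start unicast, wait for the signal, start broadcast.
 * Returns 0 on success and -1 if a thread could not be created. */
int start_bonding(struct bonding_sync *b)
{
    pthread_t unicast, broadcast;

    assert(b != NULL);
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->ready, NULL);
    b->unicast_listening = 0;
    b->broadcast_saw_listening = 0;

    if (pthread_create(&unicast, NULL, unicast_thread, b) != 0)
        return -1;

    /* The loop guards against spurious wakeups and against the signal
     * having been sent before this thread started waiting. */
    pthread_mutex_lock(&b->lock);
    while (!b->unicast_listening)
        pthread_cond_wait(&b->ready, &b->lock);
    pthread_mutex_unlock(&b->lock);

    int rc = pthread_create(&broadcast, NULL, broadcast_thread, b);
    pthread_join(unicast, NULL);
    if (rc != 0)
        return -1;
    pthread_join(broadcast, NULL);
    return 0;
}
```

Checking the unicast_listening flag in a loop, rather than relying on the signal alone, means the main thread proceeds correctly regardless of which thread reaches the lock first.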

The stream buffer
