Voice over IP for Sony Ericsson Cellular Phones


Master Thesis

Software Engineering Thesis no: MSE-2005:16 October 2005


Petter Theander, Thomas Hultgren

School of Engineering

Blekinge Institute of Technology Box 520


This thesis is submitted to the School of Engineering at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Software Engineering. The thesis is equivalent to 2 x 20 weeks of full time studies.

Contact Information:

Author(s):
Petter Theander, E-mail: di00pth@student.bth.se
Thomas Hultgren, E-mail: di00thu@student.bth.se

External advisor(s):
Tobias Åkesson
Company/Organisation: Sony Ericsson Mobile Communications AB
Address: Nya Vattentornet, SE - 221 83 Lund
Phone: +46 46 193 986

Pär Olsson
Company/Organisation: Sony Ericsson Mobile Communications AB
Address: Nya Vattentornet, SE - 221 83 Lund
Phone: +46 46 212 67 03

University advisor(s):
Håkan Grahn, School of Engineering, BTH

School of Engineering, Blekinge Institute of Technology, Box 520
Internet: www.bth.se/tek
Phone: +46 457 38 50 00
Fax: +46 457 271 25


Abstract


Introduction

This master thesis work was undertaken to investigate the possibilities of introducing a new communication technology into an already established communication interface. As new communication technologies are emerging more rapidly today than a couple of years ago, the need to merge them is also growing. The general trend among emerging technologies is that they are more or less exclusively developed to fulfill the needs of voice communication in an IP-based packet-switched network such as the Internet. Such technologies are commonly known as Voice over IP (VoIP). Traditional telephony technologies, like the Public Switched Telephone Network (PSTN), were on the other hand designed to work in circuit-switched networks.

The motivation for undertaking this investigative work was a general disappointment we saw over the fact that a new communication technology often meant that one, as a user, was forced to use a computer, with no really good alternatives. Thus, there was a need for a solution that made it possible to use the emerging technologies in a more comfortable way, for example through a cellular phone.

To us, this shortcoming was a major drawback, and probably one of the factors that cause problems when introducing a new communication technology. These observations led to the initial solution proposal presented in chapter 3. This proposal was sent to Sony Ericsson Mobile Communications (SEMC), and earned us the opportunity to undertake more extensive research into what is actually needed in order to introduce support for a new communication technology in a cellular phone.

This report presents an investigation of the possibilities for introducing a new communication technology, like VoIP, into a Sony Ericsson cellular phone. The investigation is based on the following research questions:

1. Will Bluetooth be able to handle the communication between the cellular phone and the base unit in accordance with what is seen as "normal" response times and quality in traditional telephony?

2. Is it possible to integrate IP-telephony support into a cellular phone based on the Sony Ericsson architecture?

3. Is it possible to use any pre-existing techniques from the Sony Ericsson mobile phone architecture in order to ease the implementation?

4. Is it possible to integrate support for more communication technologies based on the selected communication protocols and the Sony Ericsson mobile phone architecture?


Chapter 2

A Need For New Communication Technologies

In order to understand why new voice communication technologies are introduced, when there in fact already exists a working and well accepted system, one must understand the main differences between the traditional Public Switched Telephone Network (PSTN), which is circuit-switched, and the new IP-based technologies, which are used in packet-switched networks. For this reason, this chapter gives a short summary of the most important aspects of both circuit-switched and packet-switched networks, along with their respective benefits and drawbacks.

2.1 Circuit-switched Networks

There exist different types of circuit-switched networks. The first, and probably the simplest one, is a dedicated cable between two users. This system is however not very flexible when it comes to adding more users, as each user would need a dedicated cable to every other user. This would in fact mean that the number of cables in the network would grow quadratically with the number of users (n users need n(n-1)/2 cables) [1]. To solve this issue a switch could be introduced. Adding a new user then only requires connecting the new user to the switch. In the simplest case one could say that the task of the switch is to form a connection between two users, and in this way attach the two as if they were actually connected to the same dedicated cable [1].
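The scaling argument can be made concrete with a short calculation. The sketch below compares the number of cables in a network of dedicated cables with the number needed when every user connects to a single switch; the function names are illustrative only and do not come from [1].

```python
def full_mesh_links(users: int) -> int:
    """Cables needed when every user has a dedicated cable to every other user."""
    return users * (users - 1) // 2


def switched_links(users: int) -> int:
    """Cables needed when every user is connected to one central switch."""
    return users


# Print the number of users, the mesh cable count, and the switched cable count.
for n in (2, 10, 100):
    print(n, full_mesh_links(n), switched_links(n))
```

Already at 100 users the difference is large, which is why even the simplest switch makes the network far easier to extend.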

Although simplified, this is the main concept of a circuit-switched network; the network simply allocates resources along a path, between two or more end users, to form a dedicated line [1]. Over the years this paradigm has of course been refined and developed. Today’s circuit-switched networks use, e.g., Frequency Division Multiplexing (FDM), digital transmission, and Time Division Multiplexing (TDM) to better utilize the capacity of their bearers (cables) [1]. The main task of the circuit-switched telephony network is still the same, i.e., to set up and manage dedicated paths and resources between end users, without regard to what is actually being sent over the connection. This means that much of the intelligence in the system resides in the network, as it is the network that decides how to set up the path and manages it throughout an entire call session [1].

2.2 Packet-switched Networks

Packet-switched networks were designed with focus on data transmission, i.e., with care taken to the bursty nature (the amount of data sent during a session is not constant) of data transmissions [1].

In a packet-switched network a packet of data is created by one node in the network, and the address of the receiver is attached to the packet. The packet is then sent to the first network node, or router as it is called in packet-switched networks. The router examines the packet’s address field and passes the packet on to the next appropriate node in the network. When the packet arrives at its destination the data in the packet is processed [1]. One could say that a packet-switched network operates in much the same way as the traditional postal service.
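The hop-by-hop forwarding described above can be sketched in a few lines. The topology, the node names, and the `forward` function are hypothetical and only serve to show that each router looks at nothing but the destination address.

```python
# Hypothetical topology: each router maps a destination address to its next hop.
ROUTING_TABLES = {
    "A": {"D": "B"},
    "B": {"D": "C"},
    "C": {"D": "D"},
}


def forward(packet: dict, first_router: str) -> list:
    """Return the nodes a packet visits until it reaches its destination."""
    path = [first_router]
    node = first_router
    while node != packet["dst"]:
        # The router inspects only the address field, never the payload.
        node = ROUTING_TABLES[node][packet["dst"]]
        path.append(node)
    return path


packet = {"dst": "D", "payload": b"hello"}
print(forward(packet, "A"))
```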


no resources will be wasted when sending bursty data, as is the case in circuit-switched networks. The fact that most packet-switched networks do not offer any QoS means that a client using the network cannot assume that a sent packet is actually received by the recipient. It therefore becomes the client’s responsibility to handle the QoS aspects of a session [1]. This is however only true when using UDP and not TCP, as TCP adds transport control functionality to handle these issues.

2.3 The Internet

The Internet was designed as a dumb network whose sole purpose is to provide connectivity between senders and receivers, no matter what type of data is carried [1]. The Internet is constructed as a packet-switched network with the Internet Protocol (IP) as its base for addressing and routing. The structure of the Internet is therefore independent of the actual bearer of the data, as long as the endpoints of each network support the IP paradigm.

As the Internet is a dumb network, and only provides unreliable transmissions, it is left to the sender and receiver of the data to handle retransmissions, flow control, error detection, etc. The network (Internet) itself is almost stateless and does not care whether the packets sent actually arrive [1]. This very fact makes the network itself robust against failures: if one node in the network malfunctions, this is only perceived by the receiver as a loss of packets, and a resend can be issued. The packets can this time take another way through the network [1]. This is a great step away from what is seen as normal conditions in traditional circuit-switched networks, such as the PSTN, where QoS is central. However, the better utilization of resources in a packet-switched network, the size to which the Internet has grown, and the fact that it is more or less free to use have together led voice communication to shift towards solutions for packet-switched networks [1].
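The division of responsibility described above, where the endpoints rather than the network recover from packet loss, can be illustrated with a minimal sketch. The channel below is a hypothetical stand-in that drops the first transmission of every packet; real transport protocols such as TCP are considerably more elaborate.

```python
def send_with_retransmit(packets, channel, max_attempts=5):
    """Deliver each packet, resending whenever the channel reports a loss."""
    delivered = []
    attempts = 0
    for seq, payload in enumerate(packets):
        for _ in range(max_attempts):
            attempts += 1
            if channel(seq, payload):  # True means the receiver acknowledged it
                delivered.append(payload)
                break
        else:
            raise RuntimeError(f"packet {seq} was lost {max_attempts} times")
    return delivered, attempts


def flaky_channel():
    """Hypothetical channel that drops the first transmission of every packet."""
    seen = set()

    def channel(seq, payload):
        if seq in seen:
            return True
        seen.add(seq)
        return False

    return channel


data, tries = send_with_retransmit(["a", "b", "c"], flaky_channel())
print(data, tries)
```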


Chapter 3

The Initial Idea

3.1 Background

In this chapter the initial idea will be presented. This idea was used as reference material when we applied for a master thesis project at SEMC. As stated, this is the initial idea, and as can be seen throughout this report there will be adaptations and modifications to it. That the final solution deviates from this initial idea is quite natural, as the idea presented in this section was not derived from any pre-study, but rather from creative thinking and logical reasoning. In short, it was quite clear to us from the very start that this material would mostly be used as a means of describing one potential solution for implementing IP-telephony in cellular phones. The idea was thus derived without any insight into what possibilities were available in the SEMC architecture. For now we will leave it at this, and describe the idea which earned us a position within SEMC to investigate the true possibilities for IP-telephony within their architecture.

The source of the idea was that we felt dissatisfied with the fact that one was more or less forced to either buy a new phone or get stuck in front of a computer in order to use a new communication technology, such as IP-telephony. This in turn means that one has to change phones depending on which communication technology one would like to use. The fact that a new communication technology imposes the need to use new physical equipment is in our opinion one of the main obstacles when introducing new technologies, as people are often reluctant to change their behavioral patterns [bok].

3.2 Vision

To address the problems described above, we concluded that it would be a good idea to gather all communication technologies under one physical interface. In order to overcome the problem with people’s reluctance to change, it was decided that a cellular phone could be a good hardware interface for all different technologies. This decision was based on the fact that the cellular phone is already a well accepted way to handle communication, both voice and video. It also has the advantage, compared to other solutions, that it is mobile. This means that one would always be free to choose among the supported communication technologies, independent of the physical location.

The freedom to choose communication technology and the possibility to fairly easily support new technologies, without changing the physical equipment, would also lead to economic benefits. This would be true for both companies and home users, as they can easily shift to the most cost-effective communication technology. The greatest economic gains would of course be for large companies, due to their larger traffic volumes.


3.3 The Basic Idea

The general idea, which can be seen in figure 3.1, revolves around a cellular phone (1), which is connected via Bluetooth (2) to a base unit (3), which in turn is connected to an appropriate bearer for that specific media type (4).

Figure 3.1: An overview of the basic idea

3.3.1 Making an Outgoing Call

When a call is made using this solution, the cellular phone first checks whether it is within coverage of the base unit. If it does not have coverage it initiates the call as a normal call for a cellular phone, i.e., using GSM, UMTS, etc. If the cellular phone does have coverage from a base unit, it passes the connection information to the base unit, which in turn selects the most appropriate bearer based on the connection information given. The base unit then sets up the call between the cellular phone and the intended recipient.

3.3.2 Handling Incoming Calls

When the base unit receives an incoming call on one of the connected bearers, one of the following things can happen. If the cellular phone is within coverage of the base unit, the base unit sets up the call with the specific cellular phone. If the cellular phone is not within the coverage area of the base unit, the call could for instance be connected to the reception or forwarded to, e.g., an answering machine.
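The two call flows of section 3.3 amount to a pair of simple decisions, which the following sketch spells out. All function names, return values, and the example bearer-selection rule are assumptions made for illustration; the initial idea does not prescribe them.

```python
def place_outgoing_call(destination, base_unit_in_range, select_bearer):
    """Outgoing call: fall back to the cellular network without base unit coverage."""
    if not base_unit_in_range:
        return ("cellular", destination)  # e.g. GSM or UMTS
    return (select_bearer(destination), destination)


def route_incoming_call(callee, phones_in_range, fallback="answering machine"):
    """Incoming call: connect to the phone if it is covered, otherwise forward."""
    if callee in phones_in_range:
        return ("bluetooth", callee)
    return ("forward", fallback)


def example_bearer_rule(destination):
    """Hypothetical rule: SIP addresses use IP-telephony, everything else PSTN."""
    return "IP-telephony" if destination.startswith("sip:") else "PSTN"


print(place_outgoing_call("sip:alice@example.com", True, example_bearer_rule))
print(place_outgoing_call("+46457385000", False, example_bearer_rule))
print(route_incoming_call("phone-1", {"phone-1", "phone-2"}))
print(route_incoming_call("phone-3", {"phone-1", "phone-2"}))
```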

3.4 Technical Requirements


3.4.1 The Cellular Phone

The main requirement for the cellular phone, in this solution, is that it has Bluetooth capabilities. This is quite natural as Bluetooth is the bearer for all data traffic between the cellular phone and the base unit. However, the exact Bluetooth requirements are not fixed. There are some alternative ways to solve the actual data transfer over Bluetooth. One of these is to let the cellular phone implement the Bluetooth profile normally used for headsets. With this solution the base unit can communicate with the cellular phone in the same way as if it were just sending audio to an ordinary headset. This solution would however also require that the cellular phone is able to communicate the connection information using one of the Bluetooth profiles for data communication. The second alternative is to simply handle all communication, i.e., control information and voice packets, using normal data communication, without separating the two. Apart from the requirement already mentioned there will of course also be requirements for codec support, coverage handling, etc.

3.4.2 The Base Unit

The base unit could almost be seen as a router between different bearers and communication technologies. This means that the main purpose of the base unit is to redirect and repack the data received. This further means that there are real-time requirements when handling these packets, in order not to introduce unacceptable delays. The handling and repacking of voice data must also be done without any noticeable loss of sound quality.

In order to make the base unit as flexible as possible, a modular design is suggested. This will mean that the base unit could support new communication technologies just by adding a software module. Figure 3.2 describes the module-based base unit.

Figure 3.2: Overview of the module-based base unit

Bluetooth Interface. This part of the base unit represents the communication interface towards the cellular phone, and is used when receiving and sending data. This data could be both control and voice packets.

Packet Handling. This layer is used to filter the incoming packets, which are received on the Bluetooth interface, according to their type, i.e., control and audio packets. These packets are then forwarded to the appropriate module. The packet handling layer is also responsible for repacking the data received by the base unit into the correct Bluetooth packet type, before forwarding it to the Bluetooth interface.

Communication Logic. This module is responsible for handling connection logic, i.e., the logic needed for setting up and maintaining the connection between the incoming and outgoing interface. This means that it is this module that handles the selection of which bearer to use and manages the connection with the cellular phone. The choice of which bearer to use is based on the connection information given. The intention is to make it possible to manually configure the routing table that governs this choice.


Bearer Packing. These modules are represented in figure 3.1 as "PSTN", "IP-telephony" and "...". Modules of this type are used to repack between the intermediate audio format and the format expected by the specific bearer. This means that it is these modules that decide which communication technologies and protocols are supported. The intention is to make this module layer easy to expand, and thereby introduce support for new technologies. It should also be mentioned that care must be taken when choosing the intermediate format, in order to maintain flexibility.
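To illustrate how the layers described above could fit together, the following is a minimal sketch of a module-based base unit. Every class, method, and format used here is a hypothetical illustration and not part of any actual base unit implementation; in particular, the byte prefixes merely stand in for real bearer formats.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    kind: str       # "control" or "audio"
    payload: bytes


class BearerModule:
    """Bearer packing: repacks the intermediate audio format for one bearer."""
    name = "generic"

    def pack(self, audio: bytes) -> bytes:
        raise NotImplementedError


class PstnModule(BearerModule):
    name = "PSTN"

    def pack(self, audio):
        return b"PSTN:" + audio  # placeholder for a real PSTN format


class IpTelephonyModule(BearerModule):
    name = "IP-telephony"

    def pack(self, audio):
        return b"IP:" + audio    # placeholder for a real IP-telephony format


class BaseUnit:
    def __init__(self, routing_table):
        self.modules = {}
        # Manually configurable routing table: (address prefix, bearer name).
        self.routing_table = routing_table
        self.active = None

    def add_module(self, module):
        """New technologies are supported by registering another module."""
        self.modules[module.name] = module

    def select_bearer(self, destination):
        """Communication logic: the first matching prefix decides the bearer."""
        for prefix, bearer in self.routing_table:
            if destination.startswith(prefix):
                return self.modules[bearer]
        raise LookupError(f"no bearer configured for {destination}")

    def handle(self, packet):
        """Packet handling: filter packets by type and forward them."""
        if packet.kind == "control":
            self.active = self.select_bearer(packet.payload.decode())
            return None
        if packet.kind == "audio":
            if self.active is None:
                raise RuntimeError("audio received before a call was set up")
            return self.active.pack(packet.payload)
        raise ValueError(f"unknown packet type {packet.kind!r}")


unit = BaseUnit([("sip:", "IP-telephony"), ("", "PSTN")])
unit.add_module(PstnModule())
unit.add_module(IpTelephonyModule())
unit.handle(Packet("control", b"sip:alice@example.com"))
print(unit.handle(Packet("audio", b"\x01\x02")))
```

The empty prefix in the last routing entry matches every destination and therefore acts as a default route, so the order of the entries matters.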


Chapter 4

Investigating the Options

In order to understand the problem domain and the options, the first thing undertaken was a series of interviews with people who have insight into the current phone architecture and the future development of the cellular phones at SEMC. Interviews were a quite natural means of obtaining initial knowledge about the capabilities offered by today’s phone architecture at SEMC, as we had no previous personal knowledge about the internal architecture of their cellular phones. This lack of previous knowledge means that the ideas presented so far in this report will be modified quite a bit. However, it is our opinion that the initial idea presented previously may be of interest, as it presents at least our visions for the project, and this was in fact what earned us the possibility to conduct this master thesis at SEMC. This said, it should be pointed out that many of the ideas presented in the initial proposal will be possible to implement using the technology we finally decided to use. In the rest of this section the main focus will be on the options offered by the SEMC architecture, i.e., which parts of the architecture can be used in order to implement a solution that fulfills the vision for this master thesis.

4.1 Interview Methodology

We had no previous knowledge at all of what the SEMC architecture offered, and this influenced the way the interviews were conducted quite a bit. It made us decide to use an iterative interview process to investigate the options offered by the architecture. The first interviews were therefore conducted with SEMC personnel who had a fairly good system overview, but did not possess detailed knowledge about all parts of the system. These initial interviews gave us the needed initial knowledge of the architecture. After we had gained an initial understanding of which parts of the architecture could be of interest, the interviews entered a new phase. As the architecture is quite complex, this phase more or less lasted throughout the entire project. The interviews in this new phase had the goal of gaining in-depth knowledge about different capabilities offered by the architecture. For this reason, the interviews were conducted with different persons, depending on who would be most likely to have the needed information. As some parts of the needed architecture are developed abroad, some interviews were conducted using telephone conferences, or when people from the concerned sections were visiting.

4.2 Interview Results

After conducting the initial interviews it became apparent that the solution for implementing IP-telephony in SEMC’s cellular phones was to be closely connected to SEMC’s IP Multimedia Subsystem (IMS) architecture. In fact, it became quite clear that this was the best, and maybe only option, if we were to implement a working IP-telephony prototype within the time frame for the master thesis project.


4.3 Investigating the Current Architecture

Even though the indications from the initial interviews were quite unanimous, i.e., IMS was the way to go, we still decided to look into the phone architecture first hand. The reason for doing so was twofold: one reason was to investigate the options, and the other was to familiarize ourselves with the phone architecture. The insight gained was also used to direct the interview process and the questions in its next phases.

This investigation proved to be quite valuable for two reasons. First and foremost we learned how applications in a cellular phone are generally designed and implemented. This may seem trivial, but the truth is that the internal architecture of a phone differs quite a bit from what is seen as normal application development. In a Windows based environment, for instance, one does not really need to care about process registration and process intercommunication in the same way as in an embedded system.

The other reason was that we became certain that IMS really was the only option, i.e., with the time frame in mind. This became clear as the architectural investigations found no good support for redirecting and managing voice calls in a packet-switched manner. The reason for this was that there simply was no design support in the current base architecture for manipulating, or even getting hold of, audio streams in a satisfying manner. The investigations also showed that there was no sufficiently good native support for media protocols that could be used for transporting media data over IP-connections.

These facts meant that if we were to implement a solution with only the support found in the current base architecture, we would first of all have to make modifications to the current architecture, and secondly develop, or at least implement, a whole new protocol stack. As this would have shifted the attention away from the initial goals, and would have taken too long to actually realize, the focus from then on was on further investigating IMS and the capabilities offered by the SEMC IMS architecture.

4.4 IP Multimedia Subsystem

IP Multimedia Subsystem (IMS) is a term used for merging Third Generation (3G) mobile cellular networks with the Internet [2]. IMS is in fact one of the first steps away from the traditional circuit-switched domain. Although there have been data and Internet capabilities in the circuit-switched networks, like PSTN and the mobile 2G networks, these networks are optimized for handling voice transmissions, and only offer custom data capabilities by the use of a modem. IMS, on the other hand, follows the current trend, and makes use of the packet-switched capabilities in the third generation networks [2]. It should be noted that it is not the IMS that brings packet-switched capabilities to the phone, as this is a feature of the third generation network. The IMS is rather a term used for a system managing the QoS, billing, and mobility aspects that are needed in addition to the packet-switched capabilities of the third generation network, in order to make it appealing to both network operators and end users. In short, IMS is a system for making use of the IP protocol in a mobile network.

4.4.1 The SEMC IMS Architecture


Session Description Protocol. In the SEMC IMS architecture there will also be support for the Session Description Protocol (SDP) [4], which is used in combination with SIP. SDP is actually carried in a SIP message, and is used to describe the media that is going to be used once the session has been established with the help of the SIP signaling. For more information about SDP, see appendix B.
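As an illustration of what such a media description looks like, the sketch below assembles a minimal SDP body for a single audio stream. The helper function, the user name, the address, and the port are made up for the example, and the SEMC IMS architecture provides its own SDP support; only the line types (v, o, s, c, t, m, a) and the static RTP payload types 0 (PCMU) and 8 (PCMA) come from the published specifications.

```python
def build_sdp_offer(user, address, port, codecs):
    """Build a minimal SDP body; codecs maps RTP payload type to 'NAME/clock-rate'."""
    payload_types = " ".join(str(pt) for pt in codecs)
    lines = [
        "v=0",                                      # protocol version
        f"o={user} 0 0 IN IP4 {address}",           # origin of the session
        "s=-",                                      # session name (unused here)
        f"c=IN IP4 {address}",                      # where media should be sent
        "t=0 0",                                    # unbounded session time
        f"m=audio {port} RTP/AVP {payload_types}",  # one audio stream over RTP
    ]
    lines += [f"a=rtpmap:{pt} {name}" for pt, name in codecs.items()]
    return "\r\n".join(lines) + "\r\n"


offer = build_sdp_offer("alice", "192.0.2.10", 49170,
                        {0: "PCMU/8000", 8: "PCMA/8000"})
print(offer)
```

A body like this would be carried as the payload of a SIP message, for example an INVITE, with the content type `application/sdp`.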


Chapter 5

Design of the VoIP Prototype

This chapter describes the design of the VoIP prototype that needs to be created. First there will be a description of how the protocols investigated in the pre-study (appendix A, B, and C) can be used in order to fulfill the goals for this project. After this there will be a detailed description of the VoIP prototype and its relation to the SEMC architecture. In order to illuminate the design, a set of scenarios showing the interaction between the different parts (VoIP UI, VoIP-server, IMS SL, etc.) are described in the last section of this chapter.

5.1 Solution Design

As could be seen by the initial investigation, there were some architectural restrictions that narrowed the options for implementing a working IP-telephony prototype within the given time frame. As a result, the focus shifted towards making use of the capabilities offered to us by the SEMC IMS architecture, i.e., SIP, SDP, and RTP. This said, it is however our opinion that the capabilities offered by these protocols are really powerful and would be one of the best solutions for the prototype implementation, even if there would have been other options to consider.

One of the goals of this project was to investigate and make use of the possibilities offered by the SEMC architecture. The main option offered is to use the IMS architecture. The other goals were to have a solution that was flexible and could easily be adapted to make use of new communication technologies. The chosen solution should furthermore be able to use Bluetooth as the communication interface. These are all capabilities offered by the initial idea, which is not very strange as the initial idea proposal was constructed to really stress these capabilities.

In the remainder of this chapter there will be a presentation of the possibilities offered by SIP, SDP and RTP, and how these protocols can be used to fulfill the goals of this project. The solutions presented will be put in contrast to what was proposed by the initial idea. This is done in order to show that a solution, which is based on the IMS capabilities, can really fulfill the goals for this project, and to some extent even surpass the visions we had for this project.

When reading this chapter it is assumed that the reader is familiar with the capabilities offered by the SIP, SDP, and RTP protocols. The needed background information can be obtained by reading appendix A, B, and C.

5.1.1 Maintaining Flexibility and Modularity using SIP

As will be shown in this section, it is fully possible to maintain the modularity concept presented in the initial idea, by the use of SIP. In fact almost all aspects of the base unit presented in the initial idea can be constructed by the facilities provided by a normal SIP solution. The main difference from the initial idea would be that instead of having one central base unit with many different capabilities, there would in a SIP solution be a "virtual" base unit, with the same capabilities, but these would be distributed among the different servers found in a normal SIP network, i.e., registrar, proxy and gateway servers.


In short, by using SIP, there will be the possibility to add new technologies by adding a new type of gateway to the network. In fact, the SIP solution allows for the separation of the different servers and gateways in a network, and thus there is much better load balancing, reliability and flexibility than was actually the case with the initial idea.

5.1.2 Using SIP and SDP for Negotiating the Media Format

The initial idea used a fixed intermediate format for communication between the user interface and the base unit, and then translated this intermediate format into the bearer-specific media format and protocol. With a SIP/SDP solution one can simply skip this translation, as SDP and SIP allow for communication and negotiation of which media format and protocol to use. This is done by the parties of the call telling each other their capabilities and matching these. This means that when communicating there is no need for intermediate processing of the media format or protocol, as in the initial idea. This is of course only true if the recipient is also connected to a technology capable of handling SIP and SDP. If, e.g., the recipient is using PSTN, the actual SIP and SDP communication takes place between the user interface (in this case a cellular phone) and the PSTN-gateway, and the gateway handles the conversion between SIP/SDP, with its negotiated format, and the PSTN.
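The matching of capabilities can be sketched as follows. This is a deliberate simplification of the offer/answer procedure used with SDP, in which the answer may contain several accepted formats; the function name and the codec lists are illustrative assumptions.

```python
def negotiate(offered, supported):
    """Return the caller's most preferred codec that the callee also supports.

    offered is ordered by the caller's preference; supported is the callee's
    set of codecs. None is returned when the two parties share no codec.
    """
    for codec in offered:
        if codec in supported:
            return codec
    return None


print(negotiate(["AMR", "PCMU", "PCMA"], {"PCMA", "PCMU"}))
print(negotiate(["AMR"], {"PCMU"}))
```

When the result is `None` the parties share no media format, and the call cannot be set up without a gateway or transcoder in between.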

5.1.3 Bluetooth with IP Capabilities

SIP is an IP-based protocol. This means that in order to have direct communication between the cellular phone and the recipient using SIP, there is also a need to have an IP-connection between the cellular phone and the rest of the network. This requirement makes it quite obvious that the best protocol, or Bluetooth profile, to use would be one that allows for normal IP-based communication over Bluetooth, i.e., a Bluetooth connection which tunnels IP-based traffic. In the IMS-based solution we decided on using the Bluetooth Network Access Profile (NAP) in order to provide the needed connection.

5.1.4 Overview of the SIP Solution

As can be seen in figure 5.1, the entity depicted as the base unit in the initial idea is now represented by several network connected servers. It should however be noted that the solution still offers the same possibilities as the initial idea. There is for instance still the option to initiate a call to and from different communication technologies, through the use of gateways. The fact that communication bridging between technologies is done through the use of special purpose gateway servers actually has some benefits that did not exist in our initial idea. First and foremost there will be even greater flexibility for new technologies, as there is actually no requirement to add a gateway for the new technology in one’s own domain (or base unit). That is, the only requirement is that the service is offered by someone connected to the Internet, and that access to this service is allowed. Another benefit is that this enables better load balancing than was offered by the initial solution.


Figure 5.1: Overview of the SIP solution

5.2 Prototype Design and IMS Relationship

The VoIP prototype is a client-server based solution, i.e., there is one application running as a server, the VoIP-server. The client, or user interface, interacts with the VoIP-server to get information about incoming calls as well as to initiate calls. It is the VoIP-server that in turn interacts with and uses the SIP capabilities offered by the IMS Service Layer (SL).

As time was limited and as the purpose was to create a prototype rather than a finished product, the main focus was on the VoIP-server. This means that no great effort was made to implement a neat user interface. The VoIP-server, however, offers support for a client, and there should thus be little work in integrating a user interface at a later stage.

This section will describe the internal structure of the VoIP prototype. First, there will be a description of the IMS architecture and how it generally interacts with its clients and vice versa. This is needed in order to better understand the other design descriptions in this section.

5.2.1 SEMC IMS Client Interaction

The SEMC IMS architecture parts that are of interest for the VoIP-solution can be split into two categories: the IMS SL (service layer) and RTP. The IMS SL is the part of the underlying architecture that supplies the VoIP server with support for handling SIP sessions. This means that it is quite easy to make SIP requests like register and invite; the only thing needed is to set the SIP-specific parameters and call the specific functionality in the IMS SL. In the same easy manner, by implementing the IMS SL callback interfaces, the VoIP server will be notified by the IMS SL when incoming SIP requests, like invites and byes, are received and will therefore be able to act accordingly.


5.2.2 IMS SL and the VoIP Server

The VoIP-server can be split up into two parts: the VoIPCore, which is the actual running application, and the VoIPMediaHandler, which handles the media sessions. The VoIP server uses the IMS SL for all SIP requests and responses. As said, the IMS SL also helps the overlying application to set up the negotiated media session. Figure 5.2 shows that the VoIPCore component uses the IMS SL to handle SIP requests. Incoming SIP requests are received by the VoIPCore as events sent by the IMS SL.

Figure 5.2: Interaction between the VoIP Server and the SEMC IMS Architecture

Figure 5.2 also shows that the IMS SL uses the VoIPMediaHandler component. This is done using the IMS specific interfaces implemented by the VoIPMediaHandler. The VoIPMediaHandler’s responsibility is to set up the actual media sessions. This is done by using other parts of the IMS architecture, mainly the RTP and CStreamingMedia. Once the connections between the two peers have been established using RTP, it becomes the VoIPMediaHandler’s responsibility to make sure that data is being recorded and sent as well as received and played.

The actual recording and playback of data is done by using the StreamingMedia component. This component allows for recording and playback to and from a memory buffer, which is a requirement for this solution. The StreamingMedia component is also specified to support full-duplex audio, i.e., simultaneous recording and playback. In practice this turned out not to hold completely, as discussed in the implementation chapter.

5.2.3 The VoIPCore Component

This component is the part of the VoIP-server solution that is the actual running server application, i.e., it is the component that a user of the VoIP-server, such as a VoIP-client (GUI), uses to make outgoing calls and to receive incoming calls. Therefore, a public interface called IVoIPCore was created, which defines the functionality needed by a client, e.g., registering with a SIP registrar or ending a VoIP-call. All of the methods defined by the IVoIPCore interface are asynchronous, which means that a public callback interface is needed in order for the client to learn the outcome of its requests (function calls). Another reason why the callback interface is needed is that the VoIPCore component must notify the client when incoming calls are received. The VoIP callback interface is further explained in section 5.2.5.


SL. Figure 5.3 shows what interfaces the VoIPCore component implements and also some of its methods.

Figure 5.3: The main functionality of the VoIPCore component

5.2.4 The VoIPMediaHandler Component

The VoIPMediaHandler component handles the media sessions. This means that it is responsible for sending and receiving the voice data that is transmitted between the peers using the Real-time Transport Protocol. To do this, the VoIPMediaHandler uses a utility component offered by the SEMC IMS, called RTP.

Besides making sure that data is sent and received, the VoIPMediaHandler component also has the responsibility of recording as well as playing this data. This is accomplished using the StreamingMedia component, which is able to record as well as to play streaming media.

In order for the VoIPMediaHandler to be able to do all this, it first needs to be informed by the VoIPCore that a new session is about to start. Therefore, the VoIPMediaHandler component implements the IVoIPMedia interface. Using the functionality provided by the IVoIPMedia interface, the VoIPCore component can allocate (and deallocate when needed) the resources that are required before the media session is started. The design of the VoIPMediaHandler interface, along with the functionality that should be offered by implementing the IVoIPMedia interface, can be seen in figure 5.4.

Figure 5.4: The main functionality of the VoIPMediaHandler Component

5.2.5 The VoIP Callback Interface

In order for the VoIP-server to notify a client about the progress of ongoing SIP requests, as well as about incoming SIP requests, the client needs to implement the ICBVoIP interface. This is because the functionality that the VoIP-server offers the client is asynchronous. The reason for this design is that the client UI should not be locked while it is waiting for a specific function to complete. Therefore, the results of such an operation are provided using a callback interface, in this case ICBVoIP. The functionality that ICBVoIP offers can be seen in figure 5.5.
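The asynchronous request and callback pattern can be illustrated with a minimal Python sketch. The class and method names below (`VoIPCallback`, `VoIPCoreSketch`, `on_register_result`, `on_incoming_call`) and the queue-based event dispatch are assumptions made purely for illustration; they are not the actual IVoIPCore or ICBVoIP definitions.

```python
from abc import ABC, abstractmethod
from collections import deque


class VoIPCallback(ABC):
    """Stand-in for ICBVoIP; the method names are hypothetical."""

    @abstractmethod
    def on_register_result(self, status_code: int) -> None: ...

    @abstractmethod
    def on_incoming_call(self, caller: str) -> None: ...


class VoIPCoreSketch:
    """Stand-in for IVoIPCore: calls return at once, results arrive later."""

    def __init__(self, callback: VoIPCallback) -> None:
        self._callback = callback
        self._pending = deque()  # events waiting to be delivered to the client

    def register(self, registrar: str) -> None:
        # Returns immediately. A real server would wait for the SIP response;
        # this sketch simply assumes a 200 response for the given registrar.
        self._pending.append(("register", 200))

    def simulate_incoming_invite(self, caller: str) -> None:
        # Test helper standing in for an event raised by the underlying layer.
        self._pending.append(("invite", caller))

    def dispatch_events(self) -> None:
        # Deliver queued results through the callback interface.
        while self._pending:
            kind, payload = self._pending.popleft()
            if kind == "register":
                self._callback.on_register_result(payload)
            else:
                self._callback.on_incoming_call(payload)


class LoggingClient(VoIPCallback):
    """Minimal client that records every notification it receives."""

    def __init__(self) -> None:
        self.log = []

    def on_register_result(self, status_code: int) -> None:
        self.log.append(f"register:{status_code}")

    def on_incoming_call(self, caller: str) -> None:
        self.log.append(f"incoming:{caller}")
```

The point of the sketch is that `register` returns before any result is known, so a client UI calling it is never blocked; the outcome only reaches the client when events are dispatched.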


5.3 Scenarios

This section will show, with help of scenarios, how the VoIP-server interacts with the rest of the system, and vice versa, in its most crucial parts. Each scenario contains a sequence diagram and a descriptive text explaining the scenario.

5.3.1 Registering with a SIP Registrar

In order to send and receive invites (make a call and receive a call) it is necessary to first have registered with a SIP server. Figure 5.6 is a sequence diagram of the register scenario.

1. When the register method in the VoIPCore component is called, it sets up the register parameters needed for a successful SIP registration.

2. After this setup has been completed, the register method is called, and upon a response from the SIP server (or a network error) a response code is received. The user of the VoIPCore component is notified with a callback method.

Figure 5.6: The VoIPCore component uses the IMS SL to perform a SIP registration

5.3.2 Sending a SIP Invite Request

Having registered, it should be possible to send and receive invitations to media sessions via SIP. This section describes what happens when a SIP invite is sent to another user that accepts the invitation.

Figure 5.7: The VoIPCore component uses the IMS SL to initialize a SIP invite request

1. When the Invite method is called in the VoIPCore component, it sets up the invite parameters needed for a SIP invite request.


Figure 5.8: The IMS SL uses the VoIPMediaHandler component to create the SIP invite and to setup the media streams

3. The IMS SL uses implemented functionality in the VoIPMediaHandler component to both create the SDP part of the SIP invite (GetSupportedMedia), as well as to prepare the to-be media session by creating and opening sockets (OpenMediaSockets).

4. After the invite has been sent and a response has been received from the remote end, the IMS SL uses the VoIPMediaHandler to determine which media sessions matched (CompareMedia). Using that information, the IMS SL closes the sockets that will not be used (CloseMediaSockets), and completes the setup of the media session sockets (SetConnectionInfo).

5. The IMS SL notifies the VoIPCore component about the status of the sent SIP invite request, and the status is forwarded to the user of the VoIPCore component.
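The order of the calls in steps 3 and 4 can be sketched as follows. This is a toy model only: sockets are simulated as port numbers, the codec names are placeholders, and the Python method names merely mirror the functionality named in the text (GetSupportedMedia, OpenMediaSockets, CompareMedia, CloseMediaSockets, SetConnectionInfo); the real signatures are not given in this report.

```python
class MediaHandlerSketch:
    """Toy stand-in for the VoIPMediaHandler; codec names are examples only."""

    def __init__(self, codecs):
        self.codecs = list(codecs)
        self.open_sockets = {}  # codec -> local port (simulated socket)
        self.remote = {}        # codec -> (remote host, remote port)

    def get_supported_media(self):
        # Offer every codec we can handle (the SDP part of the invite).
        return list(self.codecs)

    def open_media_sockets(self, first_port=40000):
        # One simulated socket per offered codec, on consecutive even ports.
        for i, codec in enumerate(self.codecs):
            self.open_sockets[codec] = first_port + 2 * i

    def compare_media(self, remote_codecs):
        # Keep our own preference order, drop what the remote end did not accept.
        return [c for c in self.codecs if c in remote_codecs]

    def close_media_sockets(self, matched):
        # Release the sockets that belong to codecs that were not matched.
        for codec in list(self.open_sockets):
            if codec not in matched:
                del self.open_sockets[codec]

    def set_connection_info(self, matched, host, ports):
        # Record where the matched media streams should be sent.
        for codec in matched:
            self.remote[codec] = (host, ports[codec])


def negotiate(handler, answer_codecs, answer_host, answer_ports):
    """Call order as described for the IMS SL in steps 3 and 4."""
    offer = handler.get_supported_media()
    handler.open_media_sockets()
    matched = handler.compare_media(answer_codecs)
    handler.close_media_sockets(matched)
    handler.set_connection_info(matched, answer_host, answer_ports)
    return offer, matched
```

Opening sockets for every offered medium before the answer arrives, and closing the unused ones afterwards, is what allows the connection information to be completed as soon as the matching media are known.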

5.3.3 Starting the Media Session

After an invite has been sent (or received) and it has been accepted, all of the preconditions are met (i.e., the correct sockets for sending and receiving data have been set up) to finally start having a conversation. A normal phone call using either a cellular phone, a standard PSTN-connected phone, or a VoIP-phone is usually full-duplex, i.e., it is possible for both participants to talk at the same time. Because of the current limitations in the architecture mentioned in chapter 6, we have been restricted to half-duplex conversations, i.e., only one participant may talk at a time.

Figure 5.9: Preparing the VoIPMediaHandler for the actual media session

1. When an invite-process has been successfully completed, the VoIPCore component calls the StartSession method in the VoIPMediaHandler in order to get it ready to either start listening or talking.


5.3.4 Requesting to Talk

When the user wants to say something to the other participant, he must make a talk "request".

Figure 5.10: Interaction between the different components when requesting to talk

1. Once the request talk has been received by the VoIPMediaHandler component, it requests an audio channel used for recording.

2. When the request has been approved (happens immediately unless some other part uses that channel) and thus opened, the recorder is configured.

3. Once a successful configuration of the recording has been completed, a message representing a request-talk is sent to the remote end.

4. When an ack from the remote end is received, the recorder is started and the VoIPCore component’s user is notified.

5. Every time there is new data available to send to the remote end, an RTP packet is created and sent. This continues until a request talk is received from the remote end, signaling that it is time to start listening instead (see the incoming request talk scenario).
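The steps above amount to a small state machine for the half-duplex conversation. The following Python sketch is one possible reading of it; the state names, the `send_control` callable and the message strings are illustrative assumptions, and the audio channel handling of steps 1 and 2 is left out.

```python
from enum import Enum


class TalkState(Enum):
    IDLE = "idle"
    WAITING_FOR_ACK = "waiting_for_ack"
    TALKING = "talking"
    LISTENING = "listening"


class HalfDuplexSession:
    def __init__(self, send_control):
        self.state = TalkState.IDLE
        # Callable that delivers a control message to the remote end.
        self._send_control = send_control

    def request_talk(self):
        # Steps 1-3: the recorder channel is opened and configured (omitted
        # here), then the remote end is asked for permission to talk.
        if self.state is TalkState.TALKING:
            return
        self._send_control("request-talk")
        self.state = TalkState.WAITING_FOR_ACK

    def on_ack(self):
        # Step 4: the remote end is ready to listen, so recording may start.
        if self.state is TalkState.WAITING_FOR_ACK:
            self.state = TalkState.TALKING

    def on_remote_request_talk(self):
        # The remote end wants to talk: stop recording, prepare playback
        # (omitted here) and acknowledge.
        self._send_control("ack")
        self.state = TalkState.LISTENING

    def may_send_audio(self):
        # Step 5: RTP packets are only sent while this side holds the floor.
        return self.state is TalkState.TALKING
```

Keeping the permission to send audio tied to a single state makes it easy to check that the two ends never record at the same time.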

5.3.5 Incoming Request Talk


Figure 5.11: Interaction between the different components when a "request talk" is received

1. When a request-talk message is received, the current recording is stopped (if there is a current recording) and an audio channel used for playback is requested.

2. Once the request has been approved (happens immediately unless some other part uses that channel) and thus opened, the player is configured.

3. Once the playback has been successfully configured, a message representing an ack is sent to the remote end.

4. When the first data packet (RTP) arrives, a buffer holding temporary RTP packets is created. The data from the packet is unpacked and sent to the player for playback.

5. Every time that a new RTP packet is received it is put in the buffer holding the temporary packets.

6. Whenever the player runs out of data, the next packet is retrieved from the buffer holding the temporary packets, unpacked, and sent to the player.
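Steps 4 to 6 describe a simple first-in, first-out buffer between the network and the player. A minimal Python sketch under stated assumptions is given below: the `play` callable stands in for the player, the helper `make_rtp` exists only to produce test packets, and header extensions and padding are ignored. Only the 12-byte fixed RTP header and the CSRC count follow RFC 3550 [5]; everything else is hypothetical.

```python
import struct
from collections import deque


def rtp_payload(packet: bytes) -> bytes:
    """Strip the 12-byte fixed RTP header plus any CSRC entries."""
    if len(packet) < 12:
        raise ValueError("too short to be an RTP packet")
    csrc_count = packet[0] & 0x0F  # low four bits of the first byte
    return packet[12 + 4 * csrc_count:]


class PlaybackBuffer:
    def __init__(self, play):
        self._play = play           # callable that hands audio data to the player
        self._packets = deque()     # buffer holding the temporary RTP packets
        self._player_busy = False

    def on_rtp_packet(self, packet: bytes) -> None:
        if not self._player_busy:
            # Step 4: the first packet is unpacked and played directly.
            self._player_busy = True
            self._play(rtp_payload(packet))
        else:
            # Step 5: later packets wait in the buffer.
            self._packets.append(packet)

    def on_player_needs_data(self) -> None:
        # Step 6: feed the next buffered packet to the player, if there is one.
        if self._packets:
            self._play(rtp_payload(self._packets.popleft()))
        else:
            self._player_busy = False


def make_rtp(seq: int, payload: bytes) -> bytes:
    # Version 2, no padding, extension or CSRC; payload type 96 and the
    # SSRC value are arbitrary choices for this example.
    return struct.pack("!BBHII", 0x80, 96, seq, seq * 160, 0x1234) + payload
```

The sketch does not reorder packets or compensate for loss; it only shows how the buffer decouples packet arrival from the pace at which the player consumes data.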

5.3.6 Incoming SIP Invite Request


Figure 5.12: The interaction between the IMS SL and the VoIP-server when a SIP invite is received

1. When an incoming SIP invite request is received by the underlying architecture it notifies the VoIPCore component, which in turn notifies its user.

2. Should the user accept the incoming invite, this is forwarded to the underlying architecture, which sets up the media session sockets in a manner very similar to the one shown in the Invite scenario above. Once this is completed, the StartSession method is called in the VoIPMediaHandler (see Start media session above), and the VoIPCore component’s user is notified of the result.

3. If the user chooses to reject the incoming SIP invite, this is merely forwarded to the underlying architecture, which notifies the VoIPCore component when it is completed. This result is forwarded to the user of the VoIPCore.

5.3.7 Sending a SIP Bye Request

Whenever the user feels that the conversation is over, he may terminate the media session. There is also a possibility that the remote user terminates the conversation, but that is covered in the next scenario (Incoming bye request).

Figure 5.13: The interaction between the VoIP-server and the IMS SL when sending a SIP bye

1. When the VoIPCore component receives a terminate request from its user, it simply forwards this request to the underlying architecture.

2. The IMS SL makes sure that all the media-specific sockets are closed by calling implemented functionality in the VoIPMediaHandler component.


5.3.8 Incoming Bye Request

This scenario describes what happens when a SIP bye request is received from the remote end.

Figure 5.14: The interaction between the VoIP-server and the IMS SL when a SIP bye is received

1. When an incoming SIP bye request destined for VoIPCore is received, the StopSession method in the VoIPMediaHandler is called in order to de-allocate resources, and the VoIPCore component’s user is notified.


Chapter 6

Prototype Implementation

This chapter gives a brief presentation of what was implemented in order to make a working prototype. The aim is simply to give some insight into the more important things that had to be implemented in order to make the prototype a reality. The focus is thus on the most important aspects and issues that were encountered during the implementation.

6.1 Bluetooth Connectivity

One of the first things done, after having realized that the IMS would be one of the key factors for making our VoIP prototype a reality, was to look into what requirements the IMS had on the data connection it should use. This was especially important as one of the goals of the master thesis was to see if Bluetooth would suffice as data carrier for the chosen solution.

The investigation of what the IMS needed in order to use a certain data carrier soon revealed that it could handle any type of normal data account, like GSM/UMTS based packet-switched and circuit-switched accounts. However, as was indicated during the investigative interviews, there proved to be no current support for handling and managing Bluetooth based accounts. This obstacle was remedied by implementing a module that created Bluetooth based accounts for every paired device in the vicinity that provides a service for network access. After the accounts had been created, the focus shifted towards modifying the "connection manager", which is a module used for setting up the connection described by the data accounts. When this module had been altered to support Bluetooth accounts, there were no longer any obstacles to using a Bluetooth connection in the same way as any other connection. The IMS could use it, as could any other service on the phone, e.g., the web browser.

6.2 The VoIP Prototype

As has been seen in the design chapter, the actual VoIP solution is implemented as two major blocks, i.e., the VoIPCore and the VoIPMediaHandler. As could also be seen in the design, these parts interact with the SEMC IMS in order to get things done. This section describes the issues revealed during the implementation of these components.

6.2.1 Changes in the Underlying Architecture


6.2.2 No Support for Full-duplex Audio

Another thing that was revealed during the implementation was that there was no actual support for full-duplex audio in the base platform. This meant that it would only be possible to either record or play back audio, but not both at the same time. The fact that this limitation resided in the base architecture of the platform meant that there was very little that could be done about it, as the base platform is developed by a third-party company. As the goal for this thesis was to investigate and develop a prototype to prove the possibilities for supporting new communication technologies with the cellular phone as the interface, this was obviously a major drawback, as it limits the scope to half-duplex solutions only.

However, it was our opinion, after having implemented large parts of the VoIP-solution, that when this lack in the architecture is removed there will be no problem handling full-duplex audio conversations. In order to temporarily avoid the problem, and still be able to provide some form of proof that a VoIP-solution with the cellular phone as the interface will still be possible, we shifted towards a half-duplex solution.

We decided that the simplest way would be to pass an application specific token between the recipient and the caller, using the RTCP-protocol. This solution was chosen as there is good support for this kind of token passing through the use of RTCP. In fact, the RTCP-protocol already provides the possibility to create and pass application specific data with different subtypes.
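As an illustration of how such a token could be carried, the sketch below builds and parses an application-defined RTCP packet (packet type 204) following the layout in RFC 3550 [5]: a 5-bit subtype, a length field counted in 32-bit words minus one, an SSRC, and a four-character ASCII name. The subtype values and the name `VOIP` are assumptions chosen for the example; the values used by the prototype are not stated in this report.

```python
import struct

RTCP_APP = 204  # packet type for application-defined RTCP packets


def build_app_packet(subtype: int, ssrc: int, name: bytes, data: bytes = b"") -> bytes:
    if not 0 <= subtype <= 31:
        raise ValueError("subtype is a 5-bit field")
    if len(name) != 4:
        raise ValueError("name must be exactly four ASCII characters")
    if len(data) % 4:
        raise ValueError("application data must be a multiple of 32 bits")
    first_byte = (2 << 6) | subtype           # version 2, no padding
    length_words = (12 + len(data)) // 4 - 1  # length in 32-bit words minus one
    header = struct.pack("!BBHI4s", first_byte, RTCP_APP, length_words, ssrc, name)
    return header + data


def parse_app_packet(packet: bytes):
    first_byte, packet_type, length_words, ssrc, name = struct.unpack(
        "!BBHI4s", packet[:12]
    )
    if first_byte >> 6 != 2 or packet_type != RTCP_APP:
        raise ValueError("not an RTCP APP packet")
    end = 4 * (length_words + 1)
    return first_byte & 0x1F, ssrc, name, packet[12:end]


# Hypothetical subtypes for the talk token.
SUBTYPE_REQUEST_TALK = 1
SUBTYPE_ACK = 2
```

With two subtypes, one for the request and one for the acknowledgement, no application-dependent payload is needed, which keeps each token packet at the minimum size of 12 bytes.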


Chapter 7

Evaluation of the Prototype

In this chapter we will look back at the initial research goals and see what was actually concluded. During the evaluation of the possibilities for IP-telephony in SEMC cellular phones, we have also come across a topic that we feel might need further investigation. This topic will also be presented below.

7.1 Answers to the Research Questions

This section presents the answers to the research questions. As will be seen, most of the questions have been tested and answered with the help of the VoIP-prototype developed, while other questions have been answered by logical derivation, i.e., from the capabilities offered by SIP and related concepts already available on the market.

7.1.1 Reasonable Response Times

Question: Will Bluetooth be able to handle the communication between the cellular phone and the base unit in accordance to what is seen as "normal" response times and quality in traditional telephony?

Answer: With the help of the prototype, the Bluetooth connection has been empirically tested to see whether it provides acceptable latency for voice communication. However, as has been stated before, our current prototype only operates with half-duplex audio, i.e., audio in only one direction at a time. This means that the studies do not actually test whether Bluetooth is able to handle real VoIP communication. To provide a likely answer to the question, we refer to what is normally seen as acceptable latencies when dealing with real-time voice communication. The general opinion is that latencies below 400 ms will be acceptable for the parties of a conversation; however, latency below 150 ms is recommended [6].

The Bluetooth connection has been empirically tested with respect to this criterion. This was done by measuring the one-way latency between the cellular phone and the PC providing the IP-connection, i.e., the latency imposed on the data when traveling over the actual Bluetooth connection.

The results presented below are from tests with two different packet sizes. These were chosen with respect to the real sizes of the data packets traversing the link during normal conversation, i.e., best and worst quality when using the codec in question.


Packet Size (bytes)   Distance (m)   Average (ms)   Max. (ms)   Min. (ms)
194                   1              36             65          17
424                   1              40             61          35
Diff.                 -              -4             4           -18
194                   4              34             61          19
424                   4              40             63          32
Diff.                 -              -6             -2          -13
194                   8              33             62          18
424                   8              39             62          24
Diff.                 -              -6             0           -6
194                   12             37             75          22
424                   12             44             72          32
Diff.                 -              -7             3           -10

Table 7.1: Bluetooth latency measurement results

As can be concluded from table 7.1, packet size and distance seem to have little impact on latency. This is however only true up to a certain point, but what is actually shown is that the Bluetooth connection in itself should be able to handle voice communication, even for the larger packets, in a "normal" open office environment.
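The comparison against the latency limits cited from [6] can be made explicit with a short calculation over the values in table 7.1. The sketch below is only arithmetic on the measured figures; the variable and function names are chosen for the example. Note that the limits apply to the whole one-way path, whereas the Bluetooth hop is only one part of it, so the resulting headroom is what remains for the rest of the network.

```python
# (packet size in bytes, distance in m, average, max, min latency in ms),
# copied from table 7.1
MEASUREMENTS = [
    (194, 1, 36, 65, 17), (424, 1, 40, 61, 35),
    (194, 4, 34, 61, 19), (424, 4, 40, 63, 32),
    (194, 8, 33, 62, 18), (424, 8, 39, 62, 24),
    (194, 12, 37, 75, 22), (424, 12, 44, 72, 32),
]

RECOMMENDED_MS = 150  # recommended one-way latency [6]
ACCEPTABLE_MS = 400   # upper bound regarded as acceptable [6]

worst_average = max(row[2] for row in MEASUREMENTS)
worst_max = max(row[3] for row in MEASUREMENTS)


def average_diff(distance):
    """Average latency of 194-byte packets minus that of 424-byte packets."""
    by_size = {size: avg for size, dist, avg, _, _ in MEASUREMENTS if dist == distance}
    return by_size[194] - by_size[424]


# Latency budget left for the remaining path if the recommended limit is to be met.
headroom_ms = RECOMMENDED_MS - worst_max

print(worst_average, worst_max, headroom_ms)
print([average_diff(d) for d in (1, 4, 8, 12)])
```

Even the single worst measurement (75 ms) uses only half of the recommended 150 ms, and the difference in average latency between the two packet sizes never exceeds 7 ms.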

7.1.2 Possible to Implement IP-Telephony

Question: Is it possible to integrate IP-telephony support into a cellular phone based on the Sony Ericsson architecture?

Is it possible to use any pre-existing techniques from the Sony Ericsson mobile phone architecture in order to ease the implementation?

Answer: As has been described in this report, there is evolving support for implementing solutions like VoIP in the SEMC architecture. This is mainly because of the features of the SEMC IMS. However, it is at present not possible to implement a fully working VoIP-solution, due to the lack of support for full-duplex audio in the base architecture, i.e., the architecture on which the current phones are built. This is however just a temporary problem, and as soon as it has been remedied there will be little work needed to convert the current solution into a true full-duplex audio VoIP-prototype.

7.1.3 Support for New Communication Technologies

Question: Is it possible to integrate support for more communication technologies based on the selected communication protocols and the Sony Ericsson mobile phone architecture?

Answer: Regarding the support for new communication technologies, we have already seen that it is possible to support new technologies through the use of media gateways. Even if this support is not directly part of the SEMC architecture, it follows from the fact that the SEMC architecture supports the Session Initiation Protocol (SIP). The fact that SIP supports new communication technologies through the use of gateways means that the support is separated from the internal architecture, which leads to some useful properties like extended flexibility, load balancing, etc.

7.2 Suggestions for Further Research


Chapter 8

Discussion and Related Work

When conducting this master thesis we have naturally had a general interest in what is happening in different areas related to VoIP. What has been noticed is that the general interest in VoIP has more or less exploded during the last years (2004-2005). The fact that the general public becomes more and more interested in using this new technology means that there is also an increased focus on the strengths and weaknesses of VoIP. In this chapter there will be a short presentation of the issues that we have found the most interesting. The topics have been selected in order to address aspects that are important to both developers and the general public.

8.1 Network Address Translation

One major problem that one faces when dealing with SIP based VoIP is NAT. NAT is short for Network Address Translation, and is a method used for mapping IP addresses between different address scopes [7]. This often means translating between a public, globally recognized IP-address and an IP-address residing within a private network. One of the main reasons for employing NAT is the shortage of public global IP-addresses on the Internet. This has led to the creation of private networks with their own private IP-addresses, where NAT is used to allow computers on the private networks to access the Internet [8]. To make things worse, there exist different types of NATs [9], which means that the behavior can differ from one situation to another.

The general idea, although simplified, is that when an entity on the private network wishes to exchange information with an entity residing on another network, the outgoing request is passed through the NAT. The NAT then creates a session for the outgoing request. It also changes the address and port fields in the outgoing request from the internal IP address and source port of the initiating entity to the public address of the NAT entity and a port chosen by the NAT. This means that the recipient will perceive the incoming request as if it originated from the NAT entity. The answer to the request will therefore be sent to the address of the NAT entity, with the destination port that was previously set by the NAT in the outgoing request. When the NAT entity receives the answer on the specified port from the destination address, it remembers that it assigned that session (port) to a request belonging to a certain entity on the private network and can thus forward the answer to the intended recipient. For more extensive information on NAT see [10].
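The session bookkeeping described above can be sketched as two lookup tables. This is a deliberately simplified model of a port-translating NAT: the class name, the starting port and the sequential port allocation are assumptions for illustration, and real NATs differ in many details [9].

```python
class NatSketch:
    """Toy port-translating NAT keeping one session per private address and port."""

    def __init__(self, public_ip, first_port=50000):
        self.public_ip = public_ip
        self._next_port = first_port
        self._out = {}  # (private ip, private port) -> public port
        self._in = {}   # public port -> (private ip, private port)

    def outbound(self, src_ip, src_port):
        """Rewrite the source of an outgoing request, creating a session if needed."""
        key = (src_ip, src_port)
        if key not in self._out:
            self._out[key] = self._next_port
            self._in[self._next_port] = key
            self._next_port += 1
        return self.public_ip, self._out[key]

    def inbound(self, dst_port):
        """Return the private host an incoming answer belongs to, or None."""
        return self._in.get(dst_port)
```

The last case, where `inbound` finds no session, is the root of the VoIP problem: a packet arriving on a port for which the NAT has created no session cannot be forwarded to any host on the private network.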

8.1.1 VoIP in NAT Situations


8.1.2 Avoiding the NAT Problem

There exist different types of NATs. This is one of the reasons why the NAT issue, or rather issues, is hard to solve, as a solution which is fully functional in one situation might be inadequate in another. For this reason there exist different types of solutions, all with their own benefits and drawbacks. Some solutions can handle every type of NAT, but this comes at the expense of complexity.

Common solutions for handling the VoIP NAT problem are application layer aware firewalls and NATs, MIDCOM, TURN and STUN [8].

MIDCOM is an architecture for controlling and modifying firewalls and NATs from a trusted MIDCOM agent [8]. This of course means that it must be supported by the firewalls and NATs. In a MIDCOM architecture these entities are referred to as middleboxes. In short, a SIP client residing behind a NAT should also implement a MIDCOM agent. This agent is then allowed to modify the settings and port forwarding of the NAT (middlebox), provided that it is trusted by the middlebox [11]. This means that the actual address information written in the SIP and SDP messages is provided by the MIDCOM agent, and should thus be valid.

STUN and TURN approach the problem in a slightly different manner. Instead of actually trying to control the NAT, they try to use the properties of the NAT to avoid the problem. The general idea is that there is a STUN or TURN server residing on the public network. The SIP client then exchanges information with this server in order to find out which public IP address and port it should write in the SIP and SDP messages. The exact configuration and information exchange between the SIP client and the STUN or TURN server varies depending on the type of NAT being used.
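What the client does with the learned address can be shown with a small sketch. It assumes that the public IP address and port have already been obtained from a STUN or TURN server (that exchange is not shown) and rewrites the connection line and the audio media line of an SDP fragment accordingly. The function name and the sample addresses are hypothetical, and other SDP lines that may carry an address, such as the origin line, are left untouched.

```python
def rewrite_sdp(sdp: str, public_ip: str, public_port: int) -> str:
    """Replace the connection address and the audio port with the public mapping."""
    lines = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            # Connection line: the address the remote end should send media to.
            line = f"c=IN IP4 {public_ip}"
        elif line.startswith("m=audio "):
            # Media line: the second field is the port for the audio stream.
            fields = line.split(" ")
            fields[1] = str(public_port)
            line = " ".join(fields)
        lines.append(line)
    return "\r\n".join(lines) + "\r\n"


private_sdp = "v=0\r\nc=IN IP4 192.168.0.10\r\nm=audio 49170 RTP/AVP 0\r\n"
public_sdp = rewrite_sdp(private_sdp, "203.0.113.5", 50002)
```

Without this substitution the remote end would try to send media to the private address, which is not reachable from the public network.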

Which solution to use depends largely on the scale at which one is operating, and on which NAT situation one is trying to solve. For more information about the NAT issue and proposed solutions please see [12].

8.2 VoIP Security

As VoIP solutions are becoming more widely spread, the concern for security has also become more apparent. The main concern is eavesdropping, which means that someone might listen to your call [13]. In VoIP, this might be more than just listening to the actual voice data; it could also mean that someone is picking up the metadata sent in order to set up the call. This data can then be used for denial of service attacks, unwanted advertising, and hijacking of services, as it contains system specific information about ports and capabilities [13, 14].


8.3 Public Safety

An issue that has gained more attention as VoIP solutions have become more widely used is the public safety issue. This issue arises when using VoIP in emergency situations. The basis for being able to use VoIP at all in an emergency situation is that the VoIP service provider offers some form of emergency handling. This handling could be more or less advanced, i.e., the service provider could offer a special emergency solution or just put you through to the emergency service by using PSTN bridging. The problem with forwarding emergency calls using PSTN bridging is that it may cause confusion, as the number received by the emergency services is the phone number of the PSTN gateway. This is troublesome, as emergency services use the caller’s phone number to find out the geographical position of the caller. This works when a call originates from a real PSTN phone, but when the call originates from a VoIP phone, the emergency response may be directed to the address of the PSTN gateway and not to the actual caller’s address [16].

Another cause for concern, when it comes to using VoIP for placing critical calls, is the fact that one cannot expect the same quality of service from VoIP as from PSTN, as the Internet, over which the call is placed, is a best-effort network. Even though the quality of service of VoIP is getting better all the time, it is something that must be considered. As VoIP uses normal computer networks for placing calls, there is also an increased risk of not being able to place an emergency call in case of a power outage [17]. To understand the severity of this, one need only look at an ordinary power outage. In case of power failure in a normal family home, no computer communication will work, as the home network relies on power to function properly, i.e., all equipment, like cable modems, routers, and VoIP boxes, needs an external power supply in order to function.

As VoIP becomes more widely used, the public safety issue has also received more focus, and different solutions have been proposed in order to handle the safety issues. The proposed solutions vary in complexity; everything from manually entering your location when signing on the VoIP network [17], to solutions like direct truncating, where emergency calls are automatically routed to public safety answering points [18], have been proposed.

It is however our opinion that the solution presented in this report, where the actual VoIP client is implemented in a cellular phone, offers a good answer to these issues, as it enables the option to route all emergency calls through the cellular network instead of relying on the capabilities offered by the VoIP network. Although there are opinions that one should not rely on other services to provide emergency handling, as this will slow down the development of VoIP, we still believe that a solution like the one presented in this report will serve a purpose until VoIP emergency handling has matured and a standard for public safety using VoIP has been developed.

8.4

Chapter 9

Conclusions

In this report there has been a presentation of the investigative work on the possibilities for implementing VoIP in the Sony Ericsson cellular phones, using the SEMC architecture. The possibilities to support such a solution over Bluetooth have also been investigated.

The investigations in this report have shown that there is partial support for VoIP in the SEMC architecture. In order to have full VoIP-support, the issue of the base architecture only handling half-duplex audio must be addressed. It has also been concluded that the best option for implementing a VoIP-solution, in a Sony Ericsson cellular phone, is to use the Session Initiation Protocol (SIP) for call signalling and the Real-time Transport Protocol (RTP) for media streaming. The SIP and RTP protocols are supported through the use of the SEMC IMS architecture. It has also been concluded that a SIP and RTP based solution could support other communication technologies like PSTN, through the use of gateways.


Acknowledgements

First of all we would like to thank everyone at UMTS and GSM Services at Sony Ericsson Mobile Communications AB in Lund, Sweden. Everyone has been very understanding, helpful, and willing to spend time answering questions regarding the SEMC mobile phone architecture and the development environment.


Bibliography

[1] Gonzalo Camarillo. SIP Demystified. McGraw-Hill Inc., 2002.

[2] Gonzalo Camarillo and Miguel A. Garcia-Martin. The 3G IP Multimedia Subsystem. John Wiley & Sons Ltd., 2004.

[3] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler. SIP: Session Initiation Protocol. RFC 3261 (Proposed Standard), June 2002. Updated by RFCs 3265, 3853.

[4] M. Handley and V. Jacobson. SDP: Session Description Protocol. RFC 2327 (Proposed Standard), April 1998. Updated by RFC 3266.

[5] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A Transport Protocol for Real-Time Applications. RFC 3550 (Standard), July 2003.

[6] Recommendation ITU-T G.114 – One-Way Transmission Time. International Telecommunication Union, 1996.

[7] P. Srisuresh and M. Holdrege. IP Network Address Translator (NAT) Terminology and Considerations. RFC 2663 (Informational), August 1999.

[8] Michael Stukas and Douglas C. Sicker. An Evaluation of VoIP Traversal of Firewalls and NATs within an Enterprise Environment. Information Systems Frontiers, 6(3):219–228, 2004.

[9] V. Paulsamy and S. Chatterjee. Network convergence and the NAT/Firewall problems. Proceedings of the 36th Annual Hawaii International Conference on System Sciences, 2003.

[10] P. Srisuresh and K. Egevang. Traditional IP Network Address Translator (Traditional NAT). RFC 3022 (Informational), January 2001.

[11] Y. Itoh and Y. Fukuda. A study on the applicability of MIDCOM method and a solution to its topology discovery problem. The 9th Asia-Pacific Conference on Communications, 2003, 3:1133–1137, 2003.

[12] G. Camarillo and J. Rosenberg. NAT and Firewall Scenarios and Solutions for SIP. Internet Draft, Internet Engineering Task Force, 2003.

[13] Johna Till Johnson. VoIP security concerns cannot be ignored. Network World, 22(31):28, 2005.

[14] Stefano Salsano, Luca Veltri, and Donald Papalilo. SIP Security Issues: The SIP Authentication Procedure and its Processing Load. IEEE Network, 16(6):38–45, 2002.

[15] K. Ono and S. Tachimoto. SIP signaling security for end-to-end communication. The 9th Asia-Pacific Conference on Communications, 3:1042–1046, 2003.

[16] Colleen Boothby. Liability Issues In A VOIP Environment. Business Communications Review, 35(2):43–45, 2005.

[17] Anna Henry. VoIP AND E-911: IS HELP ON THE WAY? Rural Telecommunications, 24(1):14–19, 2005.


[19] Jenny Levine. Product Pipeline. Library Journal, 130:22–24, 2005.

[20] Wayne Rash. Two IP Phones Worth Picking Up. InfoWorld, 26(4):26, 2004.

[21] Vince Vittore. VoIP-enable CPE market fills with new product entries. Telephony, 245(24):17–18, 2004.

[22] John R. Quain and Marc Silver. Phones that love Wi-Fi. U.S. News and World Report, 137(9):75, 2004.

[23] Bob Brewin. Mobile Phones Move Toward Combined Calling Capabilities. Computerworld, 38(13):6, 2004.

[24] Alan B. Johnston. SIP: The Session Initiation Protocol. Artech House Inc., 2001.

[25] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext Transfer Protocol – HTTP/1.1. RFC 2616 (Draft Standard), June 1999. Updated by RFC 2817.


Appendix A

The Session Initiation Protocol

A.1 Introduction to SIP

The Session Initiation Protocol (SIP) is a signaling protocol that is used to control, i.e. to establish, modify and tear down, multimedia sessions over IP networks [2, 24]. SIP can be used to set up practically any type of session, e.g. audio calls, video conferences, games, etc [1].

The SIP protocol is thus used to send invitations to multimedia sessions, to modify these sessions and ultimately to tear them down. To be able to set up a multimedia session, a description of the session is needed. One of the advantages of SIP is that it has been designed to be independent of the protocol that is used to describe the session, and hence also of the actual multimedia session [1]. The most common protocol used to describe multimedia sessions is the Session Description Protocol (SDP) [1, 2, 24]. SDP is described in more detail in appendix B.

SIP is designed based on two of the most commonly used protocols, namely HTTP (Hypertext Transfer Protocol, see [25]) and SMTP (Simple Mail Transfer Protocol, see [26]) [2, 24]. The design taken from HTTP is the request/response, i.e., client-server, approach [24]. From SMTP the headers, e.g., To, From, and Subject, were re-used [24].
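The resemblance to both protocols is visible in the text form of a SIP message. The Python sketch below assembles a minimal REGISTER request in the style of RFC 3261 [3], with an HTTP-like request line followed by mail-style headers, and parses it back. The user, domain, branch, tag and Call-ID values are placeholders invented for the example, and a real client would generate unique values for them.

```python
def build_register(user: str, domain: str, contact_host: str, call_id: str) -> str:
    aor = f"sip:{user}@{domain}"  # address-of-record being registered
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",  # request line, as in HTTP
        f"Via: SIP/2.0/UDP {contact_host}:5060;branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"To: <{aor}>",                    # To and From headers, as in SMTP
        f"From: <{aor}>;tag=example-tag",
        f"Call-ID: {call_id}",
        "CSeq: 1 REGISTER",
        f"Contact: <sip:{user}@{contact_host}>",
        "Content-Length: 0",
    ]
    # Headers end with an empty line; this request carries no body.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_message(message: str):
    """Split a SIP message into its start line and a dictionary of headers."""
    head = message.split("\r\n\r\n", 1)[0]
    start_line, *header_lines = head.split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return start_line, headers
```

Because the format is plain text with one header per line, a message can be inspected and debugged with the same simple tools used for HTTP and mail.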

A.2 The Architecture of a SIP Network

There exist different entities in a SIP network, shown in figure A.1. It is important to understand that entities are mere roles, and one physical SIP server may play one or more of these roles, i.e. one SIP server may act as both a registrar and a proxy.

Figure A.1: The different entities in a SIP network
