Load balancing of IP telephony


Institutionen för datavetenskap

Department of Computer and Information Science

Final thesis

Load balancing of IP telephony

by

David Montag

LIU-IDA/LITH-EX-A--08/051--SE

2008-11-24

Linköpings universitet


Institutionen för datavetenskap

Department of Computer and Information Science

Master's Thesis

Load balancing of IP telephony

David Montag

Reg Nr: LIU-IDA/LITH-EX-A--08/051--SE Linköping 2008

Department of Computer and Information Science Linköpings universitet


Supervisor: Mikael Johansson

Ingate Systems AB

Examiner: Simin Nadjm-Tehrani

IDA, Linköpings universitet


Division, Department: The Real-Time Systems Laboratory, IDA, SE-581 83 Linköping, Sweden

Date: 2008-11-24

Language: English

Report category: Examensarbete (degree project)

URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-16066

ISRN: LIU-IDA/LITH-EX-A--08/051--SE

Title: Lastbalansering av IP-telefoni (Load balancing of IP telephony)

Author: David Montag


Abstract

In today's world, more and more phone calls are made over IP. This results in an increasing demand for scalable IP telephony equipment.

Ingate Systems AB produces firewalls specialized in handling IP telephony. They have an inherent limit in the number of concurrent phone calls that they can handle, which can be a bottleneck at high loads. There is a load balancing solution available in the platform, but it has a number of drawbacks, such as media latency and client capability requirements, limiting its usage.

Many companies provide load balancing solutions for SIP. However, it appears few handle all the problematic scenarios that the Ingate firewall does. This master's thesis aims to add load balancing functionality to the Ingate firewall, so that it can handle all types of clients.

By splitting the firewall into two completely separate layers, a SIP layer and a firewall layer, the concept of a virtual machine emerges. A machine is no longer restricted to its physical SIP and firewall layers. Instead, virtual machines are used to process calls. They still have SIP and firewall layers, but the layers can reside on different physical machines.

This thesis demonstrates the operation of an innovative load balancing implementation. The implementation was evaluated, and using four machines the test setup performed 50% better than the original Ingate platform, while still retaining all functionality, something that was not possible with the original platform. This surpassed both the company's and my own expectations.


Acknowledgements

I would like to start by thanking everyone on the Ingate Systems team. I really feel that I was welcomed into the team with warmth. I would also like to extend a special thank you to Robert Högberg and Per Cederqvist for your much appreciated technical and structural input. Without you, I could not have done this. I would also like to thank my supervisors at Ingate, Mikael Johansson and Janne Magnusson, for your excellent management and supervision. I am truly grateful.

I would like to thank my examiner at Linköpings Universitet, Simin Nadjm-Tehrani. She has been very supportive throughout the entire process.

I would also like to thank my opponent, Karl Andersson, for the constructive criticism he provided.


Chapter 1

Introduction

This report is in partial fulfillment of a master's degree in computer science and engineering (30 ECTS points) and was examined at the Department of Computer and Information Science at Linköpings universitet.

This chapter will describe the background, the problem formulation, and the requirements for the thesis.

With IP telephony on the rise, more and more corporations choose to invest in IP telephony infrastructure. This means that more calls are being made over IP. As the demand for IP telephony services increases, the workload for IP telephony devices increases.

One of the most commonly used protocols for IP telephony is the Session Initiation Protocol[22] (SIP). It is used for setting up arbitrary sessions between two or more peers. The audio/video media is not sent using SIP; instead, SIP is used to negotiate which types of media should be sent, and where the media should be sent to. This is done by carrying Session Description Protocol (SDP) payloads in the SIP messages. These negotiate parameters such as the addresses and ports to which the media should be sent, and the codecs for encoding the media data. SIP proxies relay messages in the network.

So-called SIP-capable firewalls handle all the corner cases where SIP does not work on its own. The most common case is when one or more calling parties are behind NAT. NAT stands for Network Address Translation, and is often used for translating a private network's IP addresses to a public one, and vice versa. It does not, however, translate the IP packet contents, so the SIP signaling will continue to use private addresses. Firewalls that are SIP-capable translate not only the IP addresses, but also the IP packet contents.

Ingate Systems AB is a Swedish company producing SIP-capable firewalls. Today, Ingate's high-end products handle 1,500 concurrent calls. This can be a bottleneck for customers with very high loads. For those customers, Ingate provides a load balanced solution consisting of several firewalls that share the load. The load balancing is, however, done on the client side. That is, the client chooses a firewall based on information that it obtains from DNS, the Domain Name System, which among other things translates domain names to IP addresses. The whole current solution relies on the clients being able to select a firewall by themselves. Unfortunately, some clients are not able to do this.

1.1 Problem formulation

The problem that this thesis aims to solve is to add support to the Ingate platform for a load balanced solution that also scales for clients that cannot select a firewall through DNS, but can only be configured statically to use one firewall for all calls. The new solution still has to be SIP-capable, that is, it has to handle all the corner cases of SIP that the current solution handles in a single machine setup. Most notably, NAT has to be supported.

1.2 More about SIP

SIP is a signaling protocol, and thus does not carry any audio or video media. Another protocol, the Real-time Transport Protocol (RTP), encapsulates the media.

SIP can signal different kinds of things. When initiating a call, the caller sends an INVITE request to the callee. The callee then responds with a 200 OK response. The caller then completes the call setup by sending an acknowledgement ACK request. The call is then set up, and the calling parties' phones can start to send RTP media to each other as negotiated in the setup process.

Now, this is the absolute minimum to set up a call. A call signaling protocol can naturally do more elaborate things. For example, it is possible to transfer a call so that a call between parties X and Y becomes a call between parties X and Z instead. The typical scenario for this is a secretary taking calls and forwarding (transferring) them to the sought callee.

In SIP, users are addressed using so-called SIP URIs. A SIP URI is similar to an e-mail address. When a call is initiated to Bob, an INVITE request for bob@example.com is sent to the IP address that example.com points to. It is up to a so-called registrar to know where Bob actually resides. It knows this because Bob has registered with the registrar by sending a REGISTER request to it, informing it of his whereabouts.
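As a rough illustration of this addressing, the sketch below splits a simple SIP URI into user, host and port, falling back to the standard SIP port 5060. It is a simplified sketch, not a full RFC 3261 URI parser: it assumes the plain `sip:` scheme, drops URI parameters and headers, and does not handle IPv6 references. The names `SipUri` and `parse_sip_uri` are hypothetical.

```python
from typing import NamedTuple, Optional


class SipUri(NamedTuple):
    user: Optional[str]
    host: str
    port: int


def parse_sip_uri(uri: str, default_port: int = 5060) -> SipUri:
    """Split 'sip:user@host[:port]' into its parts (simplified)."""
    if not uri.startswith("sip:"):
        raise ValueError(f"not a sip: URI: {uri!r}")
    rest = uri[len("sip:"):]
    # Drop ';param=...' and '?header=...' suffixes.
    for sep in (";", "?"):
        rest = rest.split(sep, 1)[0]
    # Everything after the last '@' is host[:port]; a URI may lack a user part.
    user, _, hostport = rest.rpartition("@")
    host, _, port = hostport.partition(":")
    return SipUri(user or None, host, int(port) if port else default_port)


# The INVITE for Bob is sent towards example.com on port 5060.
print(parse_sip_uri("sip:bob@example.com"))
# prints SipUri(user='bob', host='example.com', port=5060)
```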

Now remember, SIP is a signaling protocol that can be used to set up any kind of session. It can also be used for sending messages. This is done using MESSAGE requests. SIP can also re-negotiate call parameters by sending an INVITE again within an existing call. This is called a re-INVITE.

In today's security-aware world, SIP can also be sent over an encrypted connection using Transport Layer Security (TLS).


1.3 Comments on NAT

NAT is a technology that is often used for enabling a private network with many hosts to access the Internet while sharing one public IP address. NAT is performed by the gateway (NAT device) that routes traffic between the Internet and the private network. When a host on the private network sends an IP packet to the public Internet, the NAT device will translate the IP headers, replacing the private source address with the public one. As seen from a host on the Internet, all packets will then appear to have originated from the NAT device.

The trick is to get packets from the Internet back to the correct host on the private network, as all packets to the NAT device have the public address as their destination address. This is done by having the NAT device remember where connections are made from, so that it can forward the returning packets correctly. This information is stored in a so-called NAT binding.

The problem with SIP clients behind NAT (that is, clients on private networks that share a public IP address) is that when a client registers in SIP, a NAT binding will be created between the client and the registrar. However, if no traffic is sent through this binding for a certain period of time, it will expire. The result is that calls cannot be made to the client until it registers again.

1.4 Load balancing done with DNS

The current Ingate solution already supports load balancing with the help of DNS, the Domain Name System. The load balancing is done by using SRV records.

1.5 Requirements

The requirements of this thesis are of two kinds: research requirements and implementation requirements. The former involves the research required before designing the new system, and it has to be completed first. The latter lists the requirements that the new implementation must fulfill.

1.5.1 Research

The research should provide answers to the following questions:

1. What support for load balancing of SIP and SIP media exists in the Ingate platform today?

2. Study other vendors:

(a) What technology do other vendors have?

(b) What type of load balancing do they support?

3. Support for load balancing in SIP today:

(b) What other studies are there on SIP and load balancing?

4. Ways of load balancing SIP:

(a) What are the benefits and drawbacks of each way? Consider:

i. Performance and scaling

ii. Functionality provided

iii. Ease of administration

iv. Security

(b) What forms of redundancy can each way provide?

(c) What are the benefits and drawbacks of each way?

1.5.2 Implementation

The implementation shall:

1. Be based on Ingate firewalls, version 4.6.2, and run on Ingate firewall hardware.

2. Load balance phone calls. What this means is that it should be possible to stack a number of Ingate devices to achieve a higher total number of possible simultaneous calls.

3. Handle SIP over UDP.

4. Handle RTP.

5. Handle the following SIP scenarios:

(a) Regular phone call, including hangup and re-INVITEs for keep-alive.

(b) Transfer, both attended and unattended.

(c) REGISTER requests, both when the Ingate is registrar, and when there is an external registrar.

(d) Remote and local NAT traversal.

(e) Instant Messaging (MESSAGE).

(f) UPDATE requests.

6. Support TLS and TCP in the design.

7. Not rely on DNS.

However, the implementation need NOT:

• Support centralized updating.

• Support centralized configuration of the machines.

• Implement support for TLS and TCP.


1.6 Method

First, the technologies essential to this thesis will be described. After that, Ingate's own platform and other vendors' products will be investigated from a load balancing perspective. Other research studies and standards will also be analyzed. With enough information on the technologies involved, a number of designs will be proposed, and a final design will be selected from them. After verifying it conceptually, the design will be implemented and tested. Finally, some conclusions will be drawn.

1.7 Thesis Outline

The thesis is divided into six major chapters. This introductory chapter introduces the problem to be solved, describes the goals to be met, and outlines an overall plan for the whole thesis. The background chapter introduces the reader to the terms and technologies needed to understand the rest of the thesis. The study results chapter describes the results of the study and how I arrived at my design. The design chapter explains the design in detail. The implementation chapter describes some interesting details of the implementation. The results chapter describes the results of the implementation effort, including requirements verification and performance tests.

At the end, appendix A holds the glossary, appendix B includes performance graphs, and appendix C is a confidential appendix holding the code produced in this thesis work. It resides at Ingate Systems.


Chapter 2

Background

This chapter explains all the background information necessary to understand the rest of this thesis.

2.1 Example scenario

The basic setup for a SIP-enabled firewall scenario is shown in Fig. 2.1. The SIP-enabled firewall sits between the Internet and the Local Area Network (LAN). On the Internet, there are two different kinds of clients: NAT-connected and directly connected ones. The NAT-connected ones connect to the Internet through a NAT gateway, such as a commodity consumer router.

In the rest of this thesis, the term "the system" refers to anything replacing the role of the firewall in Fig. 2.1.

2.2 SIP

All SIP communication is request/response based. This means that the first message sent between two parties is a request. The second message is sent in the opposite direction, and is a response to the first message. For example, when a client registers with a registrar, the client sends a REGISTER request to the registrar. The registrar then replies with a 200 OK response. The only exception to this rule is the ACK request, which is not followed by any response.

There are two different kinds of responses: final and provisional ones. A request/response sequence is only completed when a final response is received. This means that multiple provisional responses can be received before the final one. For example, when a client wants to call another client it sends an INVITE message to the callee. If a proxy is between the caller and the callee, then it will respond to the caller with a provisional 100 Trying response before forwarding the request. That way, the caller knows that the proxy has received the message.
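The rule that only a final response completes a request/response sequence follows directly from the status-code classes in RFC 3261: 1xx codes are provisional, 2xx to 6xx codes are final. The sketch below encodes that rule; the function names are hypothetical and only the numeric code is inspected.

```python
def is_provisional(status_code: int) -> bool:
    """1xx responses (e.g. 100 Trying, 180 Ringing) are provisional."""
    return 100 <= status_code <= 199


def is_final(status_code: int) -> bool:
    """2xx-6xx responses (e.g. 200 OK, 486 Busy Here) are final."""
    return 200 <= status_code <= 699


def sequence_complete(received: list[int]) -> bool:
    """Complete once any final response has arrived; any number of
    provisional responses may precede it."""
    return any(is_final(code) for code in received)


# An INVITE through a proxy: 100 Trying from the proxy,
# 180 Ringing from the callee, then 200 OK.
exchange = [100, 180, 200]
for i in range(1, len(exchange) + 1):
    print(exchange[:i], sequence_complete(exchange[:i]))
# prints:
# [100] False
# [100, 180] False
# [100, 180, 200] True
```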

Figure 2.1. Basic firewall setup

So how does a client find another client? This is where the registrar comes into play. It keeps track of the registered clients' whereabouts. As mentioned above, when a client wants to be accessible through the system, it registers with the registrar using a REGISTER request. In the request, the client specifies where it can actually be contacted. The registrar then saves this information. When it receives an INVITE for the client, it can forward the request to the correct location. All registrations have an expiry time, a timeout. When this time has elapsed, the registration is removed. Therefore, the client has to re-register at periodic intervals. Related to this, the registrar can advertise a lower register timeout in the 200 OK response to a REGISTER request. The client should then use that timeout instead.
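The registrar behaviour described above, with expiring registrations and a registrar-imposed upper limit on the timeout, can be modelled as below. This is a minimal in-memory sketch under assumed values (a 600-second limit in the example, an injectable clock so that time can be simulated); the class and method names are hypothetical, and it ignores multiple contacts per user, authentication and the 423 Interval Too Brief case.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Registrar:
    """Hypothetical registrar mapping an address-of-record to a contact."""
    max_expires: int = 3600                      # assumed policy limit (s)
    clock: Callable[[], float] = time.monotonic
    bindings: dict = field(default_factory=dict)

    def register(self, aor: str, contact: str, requested_expires: int) -> int:
        """Store the binding and return the timeout actually granted.

        The granted value would be advertised in the 200 OK; the client
        should re-register before it runs out.
        """
        granted = min(requested_expires, self.max_expires)
        self.bindings[aor] = (contact, self.clock() + granted)
        return granted

    def lookup(self, aor: str) -> Optional[str]:
        """Return the current contact, removing the binding if it expired."""
        entry = self.bindings.get(aor)
        if entry is None:
            return None
        contact, expires_at = entry
        if self.clock() >= expires_at:
            del self.bindings[aor]               # registration timed out
            return None
        return contact


now = [0.0]                                      # simulated clock
reg = Registrar(max_expires=600, clock=lambda: now[0])
print(reg.register("sip:bob@example.com", "sip:bob@192.0.2.10:5060", 3600))  # 600
now[0] = 700.0
print(reg.lookup("sip:bob@example.com"))         # None: Bob did not re-register
```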

SIP can also do call transfers, as in the typical scenario with the secretary taking calls and forwarding (transferring) them to the sought callee. There are, however, two kinds of transfers: attended and unattended. Using the secretary scenario, an attended transfer of a caller would involve the secretary putting the caller on hold and calling the sought callee herself. After verbally verifying that the callee wants to speak with the caller, she connects the two in an attended transfer. She can then hang up the phone, and the two parties can talk to each other. In an unattended transfer, the secretary would not call the callee herself first, but connect the caller with the callee immediately, and then hang up. The callee's phone will then ring, and when answered, be connected with the caller.

The standard TCP/UDP port for SIP is 5060. SIP is however not dependent on any particular transport protocol. Generally, SIP registrars and proxies listen on port 5060, while SIP messages from clients can use any source port.

SIP messages can have a number of headers and an arbitrary payload (body). Often, the SIP message carries an SDP payload. SDP (Session Description Protocol) describes what kind of data a session should use, where to send it, and which codecs to encode the data with. In a phone call, the SDP, among other things, contains the IP address and UDP port to send the voice media to.
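To show where the media address and port actually sit in an SDP payload, the sketch below extracts the connection address (from the `c=` line) and the port (from each `m=` line). It is a simplified sketch assuming IPv4 unicast only, with no multicast TTLs or address ranges; a media-level `c=` line overrides the session-level one. The names are hypothetical. Note that the example SDP advertises a private address, which is exactly what causes trouble behind NAT.

```python
from typing import NamedTuple


class MediaTarget(NamedTuple):
    media: str      # e.g. "audio"
    address: str    # from the c= line
    port: int       # from the m= line


def media_targets(sdp: str) -> list[MediaTarget]:
    """Return where each media stream in the SDP should be sent."""
    session_addr = None
    streams = []            # [media, port, media-level address or None]
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c="):
            # c=<nettype> <addrtype> <address>, e.g. "c=IN IP4 192.0.2.10"
            addr = line[2:].split()[2]
            if streams:
                streams[-1][2] = addr        # media-level connection line
            else:
                session_addr = addr          # session-level connection line
        elif line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ..., e.g. "m=audio 49170 RTP/AVP 0"
            media, port = line[2:].split()[:2]
            streams.append([media, int(port), None])
    return [MediaTarget(m, a or session_addr, p) for m, p, a in streams]


EXAMPLE_SDP = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 192.168.1.20\r\n"
    "s=-\r\n"
    "c=IN IP4 192.168.1.20\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)
print(media_targets(EXAMPLE_SDP))
# prints [MediaTarget(media='audio', address='192.168.1.20', port=49170)]
```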

When establishing a SIP phone call, the parties exchange SDP payloads. The caller starts by sending an INVITE request to the callee. The callee then replies with a 200 OK response to the caller. The caller finishes the setup by sending an ACK request to the callee. SDP payloads need to be exchanged in both directions. This can be done in two ways: either in the INVITE/200 OK messages, or in the 200 OK/ACK messages.

Requests and responses in SIP are always part of a larger exchange, a dialog. In the case of REGISTER requests, the dialog only spans the REGISTER request and the 200 OK response. However, in the case of a call, the dialog spans the entire call, including all messages related to the call. A dialog is identified by the Call-ID header field (together with the tags in the From and To header fields). All requests and responses related to a certain call will have the same Call-ID value. Every dialog must have a globally and temporally unique Call-ID.
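Because every message of a call carries the same Call-ID, a SIP-aware load balancer can keep a whole call on one machine by hashing that value. The sketch below shows this common technique under simple assumptions: a fixed list of hypothetical backend names, text-format SIP messages with CRLF line endings, and no header line folding. It is an illustration of the general idea, not necessarily the design adopted in this thesis. It also accepts `i`, the compact form of the Call-ID header.

```python
import hashlib
from typing import Optional


def call_id(message: str) -> Optional[str]:
    """Return the Call-ID header value ('i' is its compact form)."""
    head = message.split("\r\n\r\n", 1)[0]       # headers end at the blank line
    for line in head.split("\r\n")[1:]:          # skip the request/status line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() in ("call-id", "i"):
            return value.strip()
    return None


def pick_backend(message: str, backends: list[str]) -> str:
    """Send every message of a dialog to the same backend.

    SHA-256 gives a hash that is stable across processes, unlike
    Python's built-in hash() for strings.
    """
    cid = call_id(message)
    if cid is None:
        raise ValueError("message has no Call-ID")
    digest = hashlib.sha256(cid.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


BACKENDS = ["fw1", "fw2", "fw3", "fw4"]          # hypothetical machine names
INVITE = ("INVITE sip:bob@example.com SIP/2.0\r\n"
          "Via: SIP/2.0/UDP pc33.example.com;branch=z9hG4bK776asdhds\r\n"
          "Call-ID: a84b4c76e66710@pc33.example.com\r\n"
          "CSeq: 1 INVITE\r\n\r\n")
BYE = ("BYE sip:bob@192.0.2.4 SIP/2.0\r\n"
       "i: a84b4c76e66710@pc33.example.com\r\n"
       "CSeq: 2 BYE\r\n\r\n")
print(pick_backend(INVITE, BACKENDS) == pick_backend(BYE, BACKENDS))  # True
```

A plain modulo over the backend list remaps most calls whenever a machine is added or removed; consistent hashing is one way to reduce that, at the cost of a little more code.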

SIP messages can be routed between SIP proxies, much like phone calls are routed between telephone switches. When a SIP proxy forwards a request, it adds a Via header field with its own address at the top of the request. When the response is sent back, all Via header fields are copied to the response. The response is then sent to the host in the topmost (most recently added) Via header field. When a proxy receives the response, it removes the Via header field it added and forwards the response to the host in the next one. This way, the response is sent back along the exact same path as the request.
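The Via mechanism behaves like a stack, which the sketch below models with a Python list whose first element is the topmost Via. Real Via values also carry the transport and a branch parameter (for example `SIP/2.0/UDP proxy-a.example.com;branch=...`); here they are reduced to hypothetical host names, and the function names are hypothetical too.

```python
def forward_request(vias: list[str], my_via: str) -> list[str]:
    """A proxy adds its own Via at the top before forwarding a request."""
    return [my_via] + vias


def forward_response(vias: list[str], my_via: str) -> tuple[list[str], str]:
    """A proxy removes its own (topmost) Via and returns the remaining
    list plus the next hop, i.e. the host in the new topmost Via."""
    if not vias or vias[0] != my_via:
        raise ValueError("topmost Via does not belong to this proxy")
    remaining = vias[1:]
    if not remaining:
        raise ValueError("no Via left to route the response to")
    return remaining, remaining[0]


# Caller -> proxy A -> proxy B -> callee.
vias = ["alice.example.com"]                         # added by the caller
vias = forward_request(vias, "proxy-a.example.com")
vias = forward_request(vias, "proxy-b.example.com")
# The callee copies all Via fields into the response and sends it to
# the topmost one, proxy B, which passes it on hop by hop.
vias, hop = forward_response(vias, "proxy-b.example.com")
print(hop)   # proxy-a.example.com
vias, hop = forward_response(vias, "proxy-a.example.com")
print(hop)   # alice.example.com
```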

Different SIP requests within the same dialog do not have to take the same path. A SIP proxy can opt to stay in the message path for future requests and responses. This is done by having the SIP proxy insert a Record-Route header field in the request. The receiver of the request will then copy these header fields to the response. They will then travel all the way back to the sender of the request. For the rest of the requests in the dialog, the sender will insert a Route header field for each Record-Route header field. The Route header fields then dictate the path to the receiver of the request.
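How the Record-Route values turn into Route header fields can be sketched as follows. Per RFC 3261 (sections 12.1.1 and 12.1.2), the callee keeps the Record-Route values in their original order while the caller reverses them, so each side lists the proxy nearest to itself first. The URIs and function names are hypothetical; `;lr` marks loose routing.

```python
def add_record_route(record_routes: list[str], proxy_uri: str) -> list[str]:
    """A proxy that wants to stay in the path adds itself at the top
    (index 0) of the request's Record-Route list."""
    return [proxy_uri] + record_routes


def route_set(record_routes: list[str], is_caller: bool) -> list[str]:
    """Route header fields for later requests in the dialog."""
    return list(reversed(record_routes)) if is_caller else list(record_routes)


# Caller -> proxy A -> proxy B -> callee: A adds itself first, then B.
rr = add_record_route([], "sip:proxy-a.example.com;lr")
rr = add_record_route(rr, "sip:proxy-b.example.com;lr")
print(route_set(rr, is_caller=True))   # proxy A first, then proxy B
print(route_set(rr, is_caller=False))  # proxy B first, then proxy A
```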

2.3 NAT

NAT is a networking technology used for translating addresses, hence the name Network Address Translation. Network devices capable of NAT (NAT devices) can map and translate addresses in a number of ways. An example is a 1:1 mapping, where 1.2.3.x is translated to 4.5.6.x; such a static mapping behaves like a full cone NAT. Commodity home routers typically do N:1 mapping, often with port-restricted cone behavior, where clients on an internal network share a single public address. The clients on the internal network are said to be behind NAT.

When a host on the internal network (an internal host) sends a TCP/UDP packet to an external host on the Internet, a NAT binding is created, mapping the TCP/UDP port and external host to the internal host. When traffic is sent from the external host to the internal host on a given port, the router performing NAT will find the binding, and forward the packet to the internal host listed in the binding. If there is no binding for a certain external host and port, then that packet is discarded.

If a web server is behind NAT, and it needs to be accessible from the outside, then that poses a problem. Clients making connections to the NAT device on port 80 (the standard web server port) will have their requests dropped, as no binding for those packets exists (the web server has not contacted the clients first). The NAT device can then be configured to forward all connections made to the NAT device on port 80 to the web server.

NAT bindings are transient. Once created, they will exist as long as they are used. If a binding stays unused for a certain amount of time, the binding is removed. If packets that would have matched the binding arrive after the binding has been removed, then they will be dropped.
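The binding behaviour described in this section can be modelled in a few lines. The sketch below is a simplified model matching the description above, with one binding per internal/external host pair, inbound packets accepted only from the external host the binding was created for, and bindings removed after an idle timeout. Real NAT devices differ in their mapping and filtering rules; the class, the starting public port and all addresses are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Binding:
    internal: tuple[str, int]    # private (address, port) of the internal host
    external: tuple[str, int]    # (address, port) of the external host
    last_used: float


class NatDevice:
    """Hypothetical N:1 NAT with per-binding idle timeouts."""

    def __init__(self, public_ip: str, timeout: float, first_port: int = 40000):
        self.public_ip = public_ip
        self.timeout = timeout
        self._ports = itertools.count(first_port)
        self.bindings: dict[int, Binding] = {}   # public port -> binding

    def _expire(self, now: float) -> None:
        """Remove bindings that have been unused for too long."""
        for port in [p for p, b in self.bindings.items()
                     if now - b.last_used >= self.timeout]:
            del self.bindings[port]

    def outbound(self, internal: tuple[str, int], external: tuple[str, int],
                 now: float) -> tuple[str, int]:
        """Translate an outgoing packet, creating or refreshing a binding."""
        self._expire(now)
        for port, b in self.bindings.items():
            if b.internal == internal and b.external == external:
                b.last_used = now
                return (self.public_ip, port)
        port = next(self._ports)
        self.bindings[port] = Binding(internal, external, now)
        return (self.public_ip, port)

    def inbound(self, public_port: int, source: tuple[str, int],
                now: float) -> Optional[tuple[str, int]]:
        """Return the internal destination, or None if the packet is dropped."""
        self._expire(now)
        b = self.bindings.get(public_port)
        if b is None or b.external != source:
            return None                          # no matching binding
        b.last_used = now
        return b.internal


nat = NatDevice("203.0.113.1", timeout=30.0)
print(nat.outbound(("192.168.1.20", 5060), ("198.51.100.7", 5060), now=0.0))
# prints ('203.0.113.1', 40000)
print(nat.inbound(40000, ("198.51.100.7", 5060), now=45.0))
# prints None: the binding expired after 30 idle seconds
```

Sending any packet through the binding before the timeout (for example a re-registration) refreshes `last_used`, which is the idea behind the keep-alive mechanisms for SIP behind NAT.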

2.4 NAT problems with SIP

The big problem with SIP is its lack of NAT compatibility. Clients behind NAT cannot use SIP in a proper way without help. Note that this is not really a load balancing related problem, but more a general SIP issue. It needs to be solved in order to make SIP work properly with all clients.

There are three main NAT-related problems, as identified in sipping-nat-scenarios[4]. The good news is that all these problems can be solved:

• Replies sent to wrong port

Problem: SIP responses are not always sent back to the UDP port that the request was sent from. Sometimes the responses are sent to the standard SIP port 5060 instead. If this port isn't forwarded to the SIP device, then the response will be dropped by the NAT device.

Solution: Clients and servers should implement RFC 3581[21]. Responses will then be sent back symmetrically, to the same address and port that the requests were sent from. There will then exist a binding for the response packets.

• Timeout closes binding

Problem: If no packets have been sent through a binding over a certain period of time, then the binding is closed/removed by the NAT device. Typically, a client registers with a registrar, and inbound messages to the client will be sent through the same binding. If no messages are sent for a while, then the binding closes, and then the client will not be able to receive messages, e.g. incoming calls.

Solution: The client, the server, or both, can employ so-called keep-alive mechanisms that keep the binding open. This is generally done by sending packets at regular intervals. Examples are for the server to send OPTIONS requests at certain intervals, or for the client to re-register often. The current best way is for the client and server to implement the outbound draft[12].

• Private addresses in SDP

Problem: Clients behind NAT will use their private network addresses in the SDP used when setting up calls. This will result in media not being routed properly between the calling parties.

Solution: One way to solve this is to have some device, either one of the clients or a proxy/server, on the public Internet. The public entity will then ignore the NAT'ed client's SDP info, and instead wait for media to arrive. When it does, it will send media destined for the NAT'ed client to the address and port that it first came from, thus using the NAT binding the initial media created.

Another option is to have the NAT gateway perform a deeper translation, wherein it translates the network headers, the SIP headers, and the SDP content. This is what the Ingate firewall does.
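The first and third solutions above can each be sketched in a few lines. The sketch below is a simplified illustration, not Ingate's implementation: `response_destination` shows why RFC 3581 symmetric responses get through a NAT device, and `rewrite_sdp` shows the kind of SDP rewriting a SIP-capable firewall performs, replacing a private media address and port with public ones. It assumes IPv4, a single media stream, and that the firewall has already allocated the public address and port; the `o=` line is left untouched, and all names are hypothetical.

```python
import re


def response_destination(via_port: int, has_rport: bool,
                         source: tuple[str, int]) -> tuple[str, int]:
    """Where a server sends a UDP response (simplified).

    'source' is the (address, port) the request actually arrived from,
    i.e. the public side of the client's NAT binding.
    """
    if has_rport:
        # RFC 3581: reply to the exact source address and port, which
        # the NAT binding created by the request will accept.
        return source
    # Without rport: source address but the port from the Via header
    # (often 5060), for which the NAT device usually has no binding.
    return (source[0], via_port)


def rewrite_sdp(sdp: str, private: tuple[str, int],
                public: tuple[str, int]) -> str:
    """Replace a private media address/port in SDP with the public ones."""
    priv_ip, priv_port = private
    pub_ip, pub_port = public
    # Connection line: the lookahead avoids matching a longer address.
    sdp = re.sub(r"^c=IN IP4 " + re.escape(priv_ip) + r"(?=\s|$)",
                 f"c=IN IP4 {pub_ip}", sdp, flags=re.MULTILINE)
    # Media line: keep the media type, swap only the port.
    sdp = re.sub(rf"^(m=\w+ ){priv_port}(?= )", rf"\g<1>{pub_port}",
                 sdp, flags=re.MULTILINE)
    return sdp


offer = "v=0\r\nc=IN IP4 192.168.1.20\r\nm=audio 49170 RTP/AVP 0\r\n"
print(rewrite_sdp(offer, ("192.168.1.20", 49170), ("203.0.113.1", 62000)))
```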

Another way to cope with NAT, identified in sipping-nat-scenarios[4], is to use ICE[20]. The purpose of ICE is to allow offer/answer type protocols to communicate when NAT devices are present in the network. In some scenarios, ICE could be a useful solution. However, there are two issues. The first is that not all clients implement ICE, so relying solely on ICE would not handle all clients. It can, however, be combined with other efforts, such as the ones mentioned above. The second issue is that ICE requires ports to be open in the firewall. In some cases that might not be acceptable.

Performance tests performed by Ingate indicate that the greatest bottleneck, when it comes to the number of simultaneous calls in the Ingate firewall, is the handling and forwarding of media. In other words, the signaling uses fewer resources than the media.

2.5 DNS

DNS is a hierarchically distributed database. The data elements stored in DNS are called resource records (RRs). To simplify things, RRs have a key, a type and a value. The most commonly used RR, the type A record, maps hostnames to IP addresses. An RR such as example.com. A 172.16.101.2 will map the hostname example.com to the IP address 172.16.101.2.

2.6 Load balancing using DNS

IP telephony load balancing based on DNS relies heavily on two resource record types: the SRV record and the NAPTR record. Practically all SIP clients produced nowadays are SRV and NAPTR capable. There are, however, older clients that only handle A records. These can actually be load balanced in a round-robin fashion using only A records, albeit without the distribution tweaking that SRV offers. Unfortunately, there are also clients that only support a single IP address, and those clients cannot be load balanced with DNS alone.

2.6.1 SRV records

SRV records map a certain service to a host. An example record is:

_sip._udp.ingate.se. SRV 100 0 5060 sip.ingate.se.

This means that sip.ingate.se is running a SIP service for the domain ingate.se using UDP. The service has a priority of 100, a weight of 0, and uses the UDP port 5060.

A client wanting to communicate using SIP over UDP with a host at the domain ingate.se will perform a lookup of _sip._udp.ingate.se, and will then contact the resulting host.

The advantage of SRV records is that multiple records can be used. For example, the following reply could be possible:

_sip._udp.ingate.se SRV 100 0 5060 sip1.ingate.se.
_sip._udp.ingate.se SRV 100 0 5060 sip2.ingate.se.
_sip._udp.ingate.se SRV 200 0 5060 sip3.ingate.se.

In this case, a client would use sip1.ingate.se and sip2.ingate.se in a round-robin fashion, as they have the lowest priority number (most important). sip3.ingate.se would only be used as a backup if the first two servers were inaccessible, because it has a higher priority number, making it less favorable.

Within a group of the same priority, the weight is used to balance the load between the servers. Consider the following reply:

_sip._udp.ingate.se SRV 100 50 5060 sip1.ingate.se.
_sip._udp.ingate.se SRV 100 25 5060 sip2.ingate.se.
_sip._udp.ingate.se SRV 100 25 5060 sip3.ingate.se.

Here, the client would access sip1.ingate.se 50% of the time, and sip2.ingate.se and sip3.ingate.se 25% of the time each.

Clients that can select a server to use based on SRV records are said to be SRV-capable.
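The selection an SRV-capable client performs can be sketched following the rules in RFC 2782: keep only reachable records, take the group with the lowest priority number, then pick within it by weight using a running sum compared against a random number. The records are taken from the examples above; the reachability check is an assumed callback, and the names `Srv`, `srv_query_name` and `pick_srv` are hypothetical. This is a sketch, not a complete resolver (no DNS lookups, no retries).

```python
import random
from typing import Callable, NamedTuple, Optional


class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def srv_query_name(service: str, proto: str, domain: str) -> str:
    """E.g. ('sip', 'udp', 'ingate.se') -> '_sip._udp.ingate.se'."""
    return f"_{service}._{proto}.{domain}"


def pick_srv(records: list[Srv],
             available: Callable[[Srv], bool] = lambda r: True,
             rng: Optional[random.Random] = None) -> Optional[Srv]:
    """Choose one target following the selection rules of RFC 2782."""
    if rng is None:
        rng = random.Random()
    candidates = [r for r in records if available(r)]
    if not candidates:
        return None
    # Only reachable records with the lowest priority number are used.
    best = min(r.priority for r in candidates)
    group = [r for r in candidates if r.priority == best]
    # Arbitrary order, except that weight-0 records come first.
    rng.shuffle(group)
    group.sort(key=lambda r: r.weight != 0)     # stable: keeps the shuffle
    # First record whose running weight sum reaches a random number in
    # [0, total] wins, so selection follows the weights.
    threshold = rng.randint(0, sum(r.weight for r in group))
    running = 0
    for r in group:
        running += r.weight
        if running >= threshold:
            return r
    return group[-1]  # not reached: the final running sum equals the total


WEIGHTED = [Srv(100, 50, 5060, "sip1.ingate.se."),
            Srv(100, 25, 5060, "sip2.ingate.se."),
            Srv(100, 25, 5060, "sip3.ingate.se.")]
print(srv_query_name("sip", "udp", "ingate.se"))  # _sip._udp.ingate.se
print(pick_srv(WEIGHTED).target)                   # sip1 about half of the time
```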

2.6.2 NAPTR records

NAPTR records are used for mapping regular phone numbers into custom URIs. The phone numbers are represented in E.164, the international telephone numbering plan format. For example, the number +46-13-210863 would be represented as 3.6.8.0.1.2.3.1.6.4.e164.arpa. A lookup of this could yield:

3.6.8.0.1.2.3.1.6.4.e164.arpa NAPTR 10 10 "u" "E2U+sip" "!^.*$!sip:user@ingate.com!" .

The lookup essentially yields a regular expression. This expression is used to rewrite the queried number into sip:user@ingate.com. The two numbers play a role similar to the priority and weight of SRV records: the first is the order and the second is the preference, and lower values are preferred for both. The "u" flag indicates that this is a terminal rule whose result is a URI, so no further lookups are needed. "E2U+sip" is an identifier for the SIP service.

It should be noted that NAPTR records are not only used for rewriting to SIP URIs, but also other URI types:

3.6.8.0.1.2.3.1.6.4.e164.arpa NAPTR 10 10 "u" "E2U+sip" "!^.*$!sip:user@ingate.com!" .
3.6.8.0.1.2.3.1.6.4.e164.arpa NAPTR 20 10 "u" "E2U+email" "!^.*$!mailto:user@ingate.com!" .

This will have the client rst try contacting the user via SIP, and if that is not possible, it will try e-mail.

Generally, after obtaining a SIP URI via NAPTR, the client will try to contact the callee by using the SRV method described above.
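The two steps described in this section, turning a phone number into an E.164 domain name and applying the returned rewrite rule, can be sketched as follows. This is an illustrative sketch, not a complete ENUM client: following ENUM (RFC 3761) it applies the rule to the dialled number in E.164 form (with the catch-all pattern `^.*$` the input does not matter), orders records by their order and preference fields, and assumes the client supports the services listed in `supported`. Python's `re` module stands in for the POSIX extended regular expressions that NAPTR rules actually use. All names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Naptr(NamedTuple):
    order: int
    preference: int
    flags: str
    service: str
    regexp: str


def enum_domain(e164: str, suffix: str = "e164.arpa") -> str:
    """'+46-13-210863' -> '3.6.8.0.1.2.3.1.6.4.e164.arpa'."""
    digits = [c for c in e164 if c.isdigit()]
    return ".".join(reversed(digits)) + "." + suffix


def apply_regexp(regexp: str, number: str) -> str:
    """Apply a '<delim>pattern<delim>replacement<delim>flags' rule."""
    delim = regexp[0]
    _, pattern, replacement, flags = regexp.split(delim)
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.sub(pattern, replacement, number, count=1, flags=re_flags)


def resolve(records: list[Naptr], number: str,
            supported: set[str]) -> Optional[str]:
    """Return the URI from the best terminal ('u') rule the client supports."""
    for r in sorted(records, key=lambda r: (r.order, r.preference)):
        if r.flags.lower() == "u" and r.service.lower() in supported:
            return apply_regexp(r.regexp, number)
    return None


RECORDS = [Naptr(10, 10, "u", "E2U+sip", "!^.*$!sip:user@ingate.com!"),
           Naptr(20, 10, "u", "E2U+email", "!^.*$!mailto:user@ingate.com!")]
print(enum_domain("+46-13-210863"))    # 3.6.8.0.1.2.3.1.6.4.e164.arpa
print(resolve(RECORDS, "+4613210863", {"e2u+sip", "e2u+email"}))
# prints sip:user@ingate.com
```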


Chapter 3

Study results

In this chapter, the results of a comparative study of other technologies that could possibly fulfill the requirements are described.

3.1 What technology do other vendors have?

I have studied a number of other vendors, and briefly documented their SIP capabilities. Most of the information was obtained from their web sites, but some information was gathered via e-mail correspondence.

3.1.1 Radware

Radware develops a product called SIP Director[19]. It is a front-end style load balancer. It can be configured in redundant pairs with a failover peer using Virtual Router Redundancy Protocol (VRRP), a technology used for making network components redundant. Radware also claims that it handles up to 10,000,000 Busy Hour Call Attempts (BHCA), a metric of the maximum number of call attempts a telephony component can handle per hour. It supports a range of protocols, including SIPS and TLS. It also supports clients behind NAT by using OPTIONS polling or the CRLF method described in the outbound draft[12].
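To put the BHCA figure in perspective, the claimed hourly rate can be converted into an average per-second rate. This assumes call attempts are spread evenly over the busy hour, which real traffic usually is not:

```latex
\frac{10\,000\,000\ \text{call attempts/hour}}{3600\ \text{s/hour}} \approx 2\,778\ \text{call attempts/s}
```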

3.1.2 F5

F5[8] develops a product line called BIG-IP. It consists of various network and application management components. For example, the Application Security Manager protects servers from malicious requests, and the Global Traffic Manager redirects clients to other locations when services fail.

Their Local Traffic Manager, LTM, they claim, is a layer 7 application delivery controller. It is specialized in securing the network from attacks, and load balancing the applications.


3.1.3 OpenSER

OpenSER[17] is an open-source fork of FhG FOKUS' SIP Express Router (SER). OpenSER aims to be a robust and scalable SIP server, focusing on performance. The downside is that OpenSER requires a lot of, at times complicated, configuration work, and not just anybody can configure it easily.

3.1.4 Asterisk

Asterisk[3] is a full-fledged open-source SIP server, capable of providing almost any kind of SIP service. The downside of Asterisk is that it is not nearly as fast as, for example, OpenSER. It has to be run with a load balancer in order to scale.

3.1.5 Foundry

Foundry Networks[9] has a platform called ServerIron. It is an application switch that can maintain sessions on SIP level, so that all packets for a certain SIP session are forwarded to a certain proxy. It is unclear whether it can fully handle remote clients behind NAT (for example, by keeping the NAT binding open).

3.1.6 PLANET Technology Corporation

PLANET Technology Corporation[18] markets a product called SIP-50. It is a SIP proxy server with automatic NAT detection and stateful RTP proxying. However, it supports only 50 calls per second and 50 concurrent calls.

3.1.7 AG Projects

AG Projects[2] is a company mainly providing SIP infrastructure services based on the OpenSER SIP server. The AG Projects platform, they claim, uses DNS for load balancing, as they believe that a load balancer would constitute a single point of failure. Their platform supports NAT'ed clients and scales well and redundantly, according to a representative.

3.1.8 A10 Networks

A10 Networks[1] develops a product line of traffic managers called the AX Series. The only information available about their SIP load balancing support is that they have it.

3.2 What support for load balancing exists in SIP today?

According to RFC3261[22], so-called redirect servers may point clients to proxy servers using 3xx class redirection responses (for example, 302 Moved Temporarily). By letting the redirect servers point clients to different proxy servers, a form of load balancing is achieved. A problem arises with registrations, though: calls and registrations are made to the same address, but only calls can be redirected, not registrations.
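A minimal sketch of the redirect approach, assuming a fixed list of proxies with hypothetical host names: the redirect server rotates through the proxies and answers each INVITE with a 302 whose Contact header points at the chosen proxy. Only the status line and the Contact header are built; a complete response would also copy Via, From, To, Call-ID and CSeq from the request. Non-INVITE requests such as REGISTER are left alone, mirroring the limitation described above.

```python
import itertools


class RedirectServer:
    """Round-robin SIP redirect server (sketch; proxy host names are hypothetical)."""

    def __init__(self, proxies):
        self._cycle = itertools.cycle(proxies)

    def handle(self, method, request_uri):
        """Return the start of a 302 response for an INVITE, or None otherwise."""
        if method != "INVITE":
            # Registrations are not redirected in this sketch; None means
            # "process the request normally".
            return None
        proxy = next(self._cycle)
        # Keep the user part of the Request-URI, e.g. "alice" in sip:alice@example.com.
        user = request_uri.split(":", 1)[1].split("@", 1)[0]
        return ("SIP/2.0 302 Moved Temporarily\r\n"
                f"Contact: <sip:{user}@{proxy}>\r\n")
```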

Another way to load balance SIP is to use SRV and NAPTR records. With this technique, the load balancing takes place on the client side, i.e. the client selects a server to use. See SRV and NAPTR in the glossary (Appendix A) for a more detailed explanation.
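The client-side selection that SRV records enable can be sketched as follows, following the ordering rule of RFC 2782: records with the lowest priority value are tried first, and among records of equal priority one is picked at random in proportion to its weight. The record values and host names are made up for illustration, and the DNS lookup itself is not shown.

```python
import random
from dataclasses import dataclass


@dataclass
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records, rng=random):
    """Return records in the order a client should try them (RFC 2782 style)."""
    result = []
    for prio in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == prio]
        while group:
            # Zero-weight records go first so they keep a small chance of selection.
            group.sort(key=lambda r: r.weight != 0)
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)  # uniform, inclusive of both ends
            running = 0
            for r in group:
                running += r.weight
                if running >= pick:
                    result.append(r)
                    group.remove(r)
                    break
    return result
```

With weights 60 and 40, the first server is tried first in roughly 60% of lookups, so load is spread by the clients themselves without any load balancer in the path.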

3.3 Other studies about SIP and load balancing

I have read three other papers about SIP and load balancing. None of them handled all the complications of load balancing SIP.

3.3.1 Failover, load sharing and server architecture in SIP telephony

This paper [23] by K. Singh and H. Schulzrinne details an approach to creating a so-called two-stage reliable and scalable architecture for SIP. It works by segmenting the servers into two layers. Servers in the first layer are located by clients through NAPTR and SRV DNS records. Based on a hash, the packets are routed to a cluster in the second layer. The individual machine inside the cluster is selected through another DNS lookup.
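The first-stage routing can be illustrated with a small sketch. Hashing the user part of the Request-URI is an assumption made here for illustration; the point is that every request for the same user is routed to the same second-stage cluster, where the member machine would then be found through another DNS lookup (not shown). Cluster names are hypothetical.

```python
import hashlib


def pick_cluster(request_uri, clusters):
    """First-stage routing: map a request to a second-stage cluster.

    Assumption for illustration: the hash key is the user part of the
    Request-URI, so all requests for one user reach the same cluster.
    """
    user = request_uri.split(":", 1)[1].split("@", 1)[0]
    # Use a stable hash; Python's built-in hash() of str varies between runs.
    digest = hashlib.sha1(user.encode("utf-8")).digest()
    return clusters[int.from_bytes(digest[:4], "big") % len(clusters)]
```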

There is a problem, though: no care seems to be taken for NAT. No real mention is made of how media should flow, other than that it does not necessarily use the same path as the signaling. It is also unclear whether the first-stage servers can do NAT binding keep-alive.

3.3.2 Towards effective SIP load balancing

This paper [13] by G. Kambourakis, D. Geneiatakis, T. Dagiuklas, C. Lambrinoudakis and S. Gritzalis describes another approach to a load balancing SIP architecture. They suggest a single, redundant load balancer front-end, which distributes requests to the SIP proxies behind it.

To begin with, no mention at all is made of NAT. Secondly, the paper suggests keeping the load balancer out of the signaling path by not adding Via headers for the load balancer. This will not work if the proxies implement RFC3261 correctly. If a proxy receives a request where the topmost Via header and the sender address do not match, it inserts a received= parameter containing the sender IP address in the Via header, and sends the response to that address.
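The rule that defeats this idea can be sketched as below. This is a simplified reading of RFC 3261 section 18.2.1, assuming an IPv4 or host-name sent-by and ignoring rport. When a load balancer forwards a client's request without adding its own Via header, the packet's source address is the load balancer's, so the proxy appends received=<load balancer address> and the response is sent there, pulling the load balancer back into the path.

```python
def add_received(top_via, source_ip):
    """Apply the 'received' rule of RFC 3261 section 18.2.1 to the topmost Via.

    Simplified: assumes a well-formed value such as
    "SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776" with an IPv4 or
    host-name sent-by, and ignores rport.
    """
    _, rest = top_via.split(" ", 1)
    sent_by = rest.split(";", 1)[0]
    host = sent_by.split(":", 1)[0]
    if host != source_ip:
        # The response will later be sent to this address (section 18.2.2).
        return f"{top_via};received={source_ip}"
    return top_via
```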

An alternative, proposed by the authors, is to make the proxies violate RFC3261 and ignore the topmost Via header if it belongs to the load balancer. This is not an elegant solution, and it is error-prone.

A third alternative, also proposed by the authors, is to spoof the source address of packets from the load balancer to the proxies. However, this will only work for UDP. Additionally, clients behind NAT will not be able to receive the replies, because the replies will come from a different address than the one the request was sent to. Since the paper ignores NAT completely, this is only one of many problems.


3.3.3 High-availability solutions for SIP enabled voice-over-IP networks

This is a white paper from Cisco Systems [5]. It describes different ways of achieving redundancy. An important concept in this paper is the notion of placing redundancy in the previous hop or in the next hop. Examples of previous hop redundancy are backup proxies, dial peers and SRV DNS records. An example of next hop redundancy is VRRP-redundant servers.

This paper does not mention NAT at all. On the other hand, it does not seem to aim at solving all the issues with load balancing. Instead, it mostly details the network part, and does not involve the end-user UAs in the discussion.

3.4 Current support in Ingate platform

Through discussions with Ingate employees I learned that Ingate has an existing solution for load balancing. It does not require any additional software, but instead makes use of SRV load balancing.

Figure 3.1. Ingate's setup

The setup features a number of proxy machines and a registrar machine. The proxy machines are regular Ingate SIParators (SIP proxies). The registrar machine handles all the registrations. When a client registers with proxy 1, the proxy forwards the request to the registrar. The registrar will then register the client's location as being at proxy 1. When another client calls the registered client, it might contact proxy 2. Proxy 2 will in turn forward the call to the registrar. The registrar will forward the call to proxy 1. Proxy 1 will remember where the user registered from, and forward the call to the user.
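The registration and call flow of this setup can be modelled in a few lines. This is a sketch with hypothetical class names, not Ingate's implementation: the registrar only records which proxy a user registered through, while that proxy records the client's actual (possibly NAT'ed) address, so a call entering through proxy 2 still reaches the user via proxy 1.

```python
class Registrar:
    """Internal registrar: remembers which proxy each user registered through."""

    def __init__(self):
        self.via_proxy = {}  # address-of-record -> proxy name

    def register(self, aor, proxy_name):
        self.via_proxy[aor] = proxy_name

    def route_call(self, aor):
        # Calls go to the proxy holding the user's NAT binding,
        # whichever proxy the caller happened to use.
        return self.via_proxy.get(aor)


class Proxy:
    """Proxy sketch: remembers the client's real, possibly NAT'ed, address."""

    def __init__(self, name, registrar):
        self.name = name
        self.registrar = registrar
        self.locations = {}  # address-of-record -> (ip, port) the client registered from

    def register(self, aor, client_addr):
        self.locations[aor] = client_addr
        self.registrar.register(aor, self.name)

    def deliver(self, aor):
        return self.locations[aor]
```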

Depending on the scenario, media can flow in different ways. In most cases, it will flow through the proxies. The media traffic will then flow from the clients to the proxies, and be exchanged between the white and black interfaces of the proxies. (See Fig. 3.1.)

It is possible to run this setup without registrar capabilities. In that case there will be an external registrar. If the clients register through the proxies, the registrar will think that the client can be reached at the proxy. This way, NAT bindings will be kept open. This is essentially the same principle as with a registrar, except that the registrar is on the other side of the firewalls.

This design has some problems, though:

• Not all clients support SRV records fully.

• Call media almost always passes through the system, so two people in Malaysia, talking to each other through a setup in Sweden, will experience high latency. This will reduce the quality of the phone call.

3.5 Alternative architectures

The current Ingate solution is not bad, despite the flaws listed in section 3.4. Regardless of whether they are better, I list here a couple of alternative designs that I have considered.

3.5.1 Setup 1

This setup (see Fig. 3.2) features a front-end machine along with a number of slave machines. It was the setup I initially considered. Unfortunately, the bottleneck is the amount of traffic that the front-end can forward. It would of course be possible to use a front-end machine with specialized hardware. However, using Ingate firewall hardware is a requirement, so this is not feasible.

3.5.2 Setup 1A

An alternative could be to place load balancing in the network. A powerful router could distribute packets based on layer 3 and layer 4 information. This means that all packets from a certain IP address and TCP/UDP port could be forwarded to a certain machine. Furthermore, each machine could have the same IP address.

Unfortunately, this solution does not work very well without a very integrated set of servers. Take this scenario as an example. Client A sends an INVITE for client B who is registered with the system, and gets served by server S. S forwards the INVITE to client B, which responds with a 200 Ok. This response is sent to the system, and happens to be served by server T. T will not have any record of an INVITE being sent, and discards the reply.
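The failure in this example can be reproduced with a sketch of a hypothetical layer 3/4 balancer that picks a server purely from the source IP address and port. The INVITE forwarded to B is tied to whichever server the caller's packets hashed to, while B's 200 Ok is hashed independently from B's own address, so nothing guarantees that it reaches the server holding the transaction state. Server names and addresses are illustrative.

```python
import hashlib

SERVERS = ["S", "T"]
pending = {name: set() for name in SERVERS}  # per-server INVITE transaction state


def l4_pick(src_ip, src_port):
    """Hypothetical layer 3/4 balancer: choose a server from source IP and port only."""
    key = f"{src_ip}:{src_port}".encode("ascii")
    digest = hashlib.sha256(key).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]


def on_invite_forwarded(call_id, src):
    """The server that receives the INVITE stores state and forwards it to the callee."""
    server = l4_pick(*src)
    pending[server].add(call_id)
    return server


def on_response(call_id, src):
    """A response is only usable by the server that holds the matching state."""
    server = l4_pick(*src)
    return server, call_id in pending[server]
```

If the hash spreads addresses evenly over N servers, a response lands on the right server only about 1/N of the time.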

Therefore, an application layer load balancer will be needed to distribute the packets, and then this setup is the same as setup 1.


Figure 3.2. Setup 1

Figure 3.3. Setup 1A

3.5.3 Setup 2

This approach, instead of using a front-end, uses a so-called director for distributing the calls among a number of slave machines. The director machine is connected to the same networks as the slave machines.


Figure 3.4. Setup 2

Connections to the domain served by the system are made to the director. It then forwards each connection to an appropriate slave. Any further communication between the client and the slave, both signaling and media, takes place directly, bypassing the director. SRV-capable clients can connect directly to the slaves. The director's purpose is to enable service for non-SRV-capable clients. Since the director does not handle media, it will not be the same bottleneck as the front-end machine in setup 1.

Call setup

Call setup works with this design. The initial INVITE will arrive at the director and be passed along to one of the slaves. The director will not add a Record-Route header field. The first reply from the slave will go back through the director. Since the director did not add a Record-Route header field, it will not be involved in any further communication with the client. The ACK from the client will be sent directly to the slave.
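Why the director drops out of the path can be sketched with a simplified version of the route set rules in RFC 3261 (sections 12.1.2 and 12.2.1.1), assuming loose routing: the UAC builds its route set from the Record-Route values of the 2xx in reverse order, and sends in-dialog requests such as the ACK to the first route, or straight to the remote target from the Contact header when the route set is empty. The URIs are hypothetical.

```python
def next_hop_for_in_dialog_request(record_route, contact):
    """Where a UAC sends in-dialog requests such as the ACK for a 2xx.

    record_route: Record-Route URIs from the 2xx response, in header order.
    contact: the remote target URI from the response's Contact header.
    Simplified (loose routing only): the route set is the Record-Route list
    reversed; if it is empty the request goes straight to the remote target.
    """
    route_set = list(reversed(record_route))
    return route_set[0] if route_set else contact
```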

Transfers

As long as the director handles INVITEs correctly, call transfers should work.

REGISTER requests

Since the director distributes calls more or less randomly between the slaves, and there is no dedicated registrar for the system, every slave needs to know of every user's current location(s). The best alternative here is to use a registrar on the internal network, just like in the current Ingate solution. Other alternatives could be to a) fork the REGISTER request to every slave, b) make the slaves use a shared registration database or c) use a distributed hash table to store the registrations among slaves. However, they all have flaws:

Option a) suffers from a synchronization problem. Every slave would need to be in the same state, so the director would have to wait for each reply from the slaves. Also, inconsistency problems can arise. This option makes the system brittle. Additionally, the user database would be limited to the capacity of the weakest slave. This is a bad option.

Option b) is a better option, but still requires extensive implementation work. Each slave would be connected to the database. Registrations could then be handled by any of the slaves. This also assumes that the database is made sufficiently redundant to meet the needs of the system.

Option c) is also a viable alternative, but it is complex to implement. Distributed solutions tend to be hard to implement, and error-prone. This option probably requires a lot of work, and could possibly constitute a master's thesis project of its own.

Remote and local NAT traversal

NAT-connected clients that are SRV-capable will use the system in the same way as with the Ingate setup. The rest of this section will discuss non-SRV-capable NAT-connected clients.

When a client registers with the system, it will contact the director. Generally, the director will then be the only host that can send packets back to the client. The NAT binding made when the client registers needs to be kept open in order for the client to be able to receive calls. There are now three options: a) let the director keep the NAT binding open by periodically sending requests to the client, b) advertise a low register timeout to the client, so that it re-registers shortly, or c) let a slave machine handle the keep-alive process.

Option a) is a possible solution. Every client behind NAT would need to be polled every ten or so seconds. Since the director machine does not handle any media streams, this may be feasible.
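Option a) amounts to a timer per NAT'ed client. The sketch below only decides which clients are due for a poll; the poll itself (for example an OPTIONS request) is not shown, and the 10-second interval simply mirrors the rough figure above. Client addresses are hypothetical.

```python
import heapq


class KeepAliveScheduler:
    """Decide which NAT-connected clients the director should poll next (sketch)."""

    def __init__(self, interval=10.0):
        self.interval = interval
        self._heap = []  # (next_due_time, client_address)

    def add(self, client, now):
        heapq.heappush(self._heap, (now + self.interval, client))

    def due(self, now):
        """Return every client whose poll is due at time `now`, and reschedule it."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            _, client = heapq.heappop(self._heap)
            ready.append(client)
        for client in ready:
            heapq.heappush(self._heap, (now + self.interval, client))
        return ready
```

The load grows linearly with the number of NAT'ed clients: with a 10-second interval, for example, 100,000 clients would mean 10,000 polls per second for the director.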

Option b) may also be a possible solution. By letting the clients handle the keep-alive polling, the director merely needs to handle the re-REGISTER requests as any other request. The problem here lies in the client. Not all clients pay attention to advertised register timeouts. A client might tell the system that it wants a register timeout of one hour, and the system would then tell the client that it should use a timeout of ten seconds instead. If the client ignores this, then the NAT binding might expire after twenty seconds, and the client will be unreachable until it registers again, which it might do every hour. Basically, this is a nice option, but it has interoperability problems.
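The timing problem in option b) can be made concrete with a small model. It assumes, for simplicity, that the client sends nothing but a REGISTER every re-registration interval starting at time 0, and that the NAT binding stays open for a fixed time after each outbound packet.

```python
def reachable(t, reregister_interval, nat_timeout):
    """True if an idle client's NAT binding is still open at time t (seconds).

    Simplifying assumption: the client's only outbound traffic is a REGISTER
    sent every reregister_interval seconds, starting at t = 0.
    """
    return (t % reregister_interval) < nat_timeout
```

A client that honors a 10-second timeout is always reachable behind a NAT with a 20-second binding timeout, while one that keeps re-registering every hour is reachable only 20 seconds out of every 3600.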

Option c) would be a good solution if clients supported it. The idea is to let the slaves handle the NAT binding instead. If a slave is supposed to handle the keep-alive process, then the client needs to contact the slave first in order to create the NAT binding with the slave. Since clients register with the director, the director somehow needs to redirect the client to an appropriate slave. This is an excerpt from RFC3261, section 10.3:

A registrar MAY redirect REGISTER requests as appropriate. One common usage would be for a registrar listening on a multicast interface to redirect multicast REGISTER requests to its own unicast interface with a 302 (Moved Temporarily) response.

A quick experiment I did showed that Snom 200 phones, firmware v3.56z, did not support this. As Snom is a brand with a reputation of supporting a lot of features and being compatible with many vendors, other vendors probably do not support this either. It could have been a nice option, but unfortunately it is not viable.

To sum up, remote NAT traversal is possible by letting the director keep the NAT binding open and advertising a low register timeout to the client.

Instant Messaging

Instant messaging is possible. Depending on whether the client is behind NAT or not, the director may have to handle all IM traffic (MESSAGE requests). Depending on the amount of traffic, this may or may not be viable.

UPDATE requests

UPDATE requests may only be sent after a reply has been received. Once this reply has been received, the client knows which route it should use for subsequent requests. Therefore, the UPDATE request will be sent directly to the slave, just as it should. However, if no reply has been received yet, the UPDATE will be sent via the director.

TCP/TLS support

The director and the slaves might need to maintain an active TCP connection with each NAT-connected client of the system.

Conclusion

• Performance and scaling

This setup has a clear bottleneck in the director, which has to handle all signaling to and from non-SRV-capable clients. Since the director is a single machine, there is a limit beyond which it cannot handle any more connections. This limit would depend on a number of factors, for example whether the clients are behind NAT or not. If more media forwarding performance is needed, additional slaves can simply be added.

Implementing this setup would most likely involve introducing additional client-to-system signaling overhead, for example redirect requests for handing the call over to a slave.

The director can keep track of the load of each slave, and distribute calls to the slaves with the lowest load. That way, an even distribution of work among the slaves is achieved.
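A minimal sketch of least-loaded selection. What "load" means (here, an arbitrary number such as active calls per slave) and how the director learns it are assumptions; ties are broken by name so that the result is deterministic.

```python
def pick_slave(loads):
    """Return the slave with the lowest reported load.

    loads maps a slave name to a load figure, e.g. its number of active calls.
    Iterating over sorted names makes min() break ties alphabetically.
    """
    return min(sorted(loads), key=lambda name: loads[name])
```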

• Functionality provided

Supports everything required, including regular calls, transfers, REGISTERs, remote NAT traversal, IM, TCP/TLS and UPDATEs.

• Ease of administration

Ideally, the slave machines should be configured more or less identically, except for their addresses. Most of the configuration work would be done either on the director, or in the common database.

• Security

The director is susceptible to denial of service attacks. If the director is made unavailable, then the entire system will be made unavailable to non-SRV-capable clients. This could be partly mitigated by placing an application layer firewall in front of the director, but nothing is 100% secure. Communication between the slaves and the director could be done over a physically separate private network, not shown in the figure. Without physical access, it should not be possible to gain access to this network.


Chapter 4

Design

The design that I chose basically incorporated the director concept from setup 2 into the current Ingate solution. With the addition of a director, the system would be able to cope with all kinds of clients - both the ones that cannot handle SRV records, and those that can. This split is easily done by pointing the DNS A record for the domain to the director, and the SRV records to the slaves. As it turned out, the design also solved some problems inherent in the current Ingate setup.

The idea is to use as much of the existing Ingate implementation as possible. Setup 2 would have required new code for SIP signaling, among other things. This would have been very time consuming. It would also be error prone, as the kind of functionality needed for setup 2 is non-trivial to implement, both at SIP and code levels.

This chapter contains the details of the design. Not every aspect of the system is documented here, though; only the vital concepts are explored. A number of call scenarios are described as well, which will hopefully clarify how the system works.

4.1 Overview

From the outside, the design appears as a merging of setup 2 (director) and the current Ingate solution. The internal workings of the design are more complicated, though, and quite unlike setup 2.

In order to keep media away from the director, the slave machines will handle the media, just as in setup 2. The director keeps connections to NAT-connected clients open, and it is therefore the only host that can send requests to them. However, this only implies that the director handles NAT-related signaling. Media to and from NAT'ed clients can be handled by any slave. This is done by having the NAT'ed clients send media to the slave. The slave then waits until it receives a media packet from the client before forwarding any media to the client. This way, the slave knows where to send the media packets back to, and a NAT binding will exist in the client's NAT device, because the client sent media to the slave first. Communication between the director and the slaves is needed, though, to coordinate this effort.

Figure 4.1. The chosen design
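The behavior described above, often called media latching, can be sketched for one client-facing leg of a slave's media relay. This is an illustration with hypothetical names, not the Ingate implementation: the leg learns the client's public address from the first media packet it receives and refuses to send anything before that.

```python
class MediaRelayLeg:
    """One client-facing leg of a media relay that latches onto the client (sketch)."""

    def __init__(self):
        self.peer = None  # (ip, port) learned from the first packet

    def on_packet_from_client(self, src_addr, payload):
        if self.peer is None:
            # The source address seen here is the client's public (post-NAT) address,
            # and the client's NAT device now holds a binding towards this slave.
            self.peer = src_addr
        return payload  # forwarded towards the other party (not shown)

    def send_to_client(self, payload):
        """Return (address, payload) to transmit, or None while not latched."""
        if self.peer is None:
            return None  # hold or drop media until the client has sent first
        return (self.peer, payload)
```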

One could naturally have additional dedicated machines only for director-related traffic, effectively separating director-related media traffic from slave-related media traffic. The advantage would be that the existing Ingate solution could be kept for SRV-capable clients. Then the non-SRV-capable clients would use the director and the dedicated media forwarding machines, effectively splitting the system in two parts. There are, however, two good reasons why the slaves should handle all media and everything should be kept as one system:

1. If the slaves share the entire media load, then the system will scale uniformly. If you run out of media capacity, you simply add another slave, and the SRV- and director-related capacity increases.

2. All the functionality needed is already implemented in the slave machines. If there was some other media proxying software that one could use, then that would also work. However, the Ingate firewall can already do this, and it does it better than most products out there. It is favorable to use the company's own products as much as possible.


4.2 Core design concept

Essentially, the tricky part here is how to move director-related traffic for NAT-connected clients away from the director, while still keeping the director as their entry point.

In this design, there is a concept of virtual machines. Not in the sense of virtualized computing environments, but machines that function as a single machine, but actually consist of different parts, not necessarily running on the same physical machine.

Each Ingate firewall can be split into two basic building blocks: a SIP layer and a firewall layer. Basically, the SIP layer handles signaling, and the firewall layer handles media.

A virtual machine consists of both a SIP layer and a firewall layer, but these layers do not have to reside on the same physical machine. Every physical machine provides one SIP layer and one firewall layer, but they are not necessarily both in use. For example, a director machine would not use its firewall layer, as it does not handle any media.

In this setup, the slave and director machines are composed as shown in Fig. 4.2. When the director needs to process a call with media, it forms a virtual machine consisting of its own SIP layer and a slave's firewall layer. This is illustrated in Fig. 4.3. As far as the director is concerned, it handles media. In reality, the media is handled by a slave, but the director is unaware of that.
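The split into layers can be sketched as two classes, with hypothetical names and an illustrative port allocation. The SIP layer only sees a firewall layer object, so it cannot tell whether that object is local or, as in the director's case, a stand-in for a slave's firewall layer reached through middleware. Here a plain in-process object plays the role of that remote stub.

```python
class FirewallLayer:
    """Media side of one physical machine (hypothetical interface)."""

    def __init__(self, machine):
        self.machine = machine
        self.open_ports = []

    def open_media_port(self):
        port = 20000 + 2 * len(self.open_ports)  # illustrative even-port allocation
        self.open_ports.append(port)
        return (self.machine, port)


class SipLayer:
    """Signaling side; it talks to firewall layers only through their interface."""

    def __init__(self, machine, firewall_layers):
        self.machine = machine
        self.firewall_layers = firewall_layers  # name -> FirewallLayer (or remote stub)

    def handle_call_with_media(self, firewall_name):
        # This SIP layer plus the chosen firewall layer act as one "virtual machine".
        return self.firewall_layers[firewall_name].open_media_port()
```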

Figure 4.2. Slave and director machines. Dashed part does not forward media.

Figure 4.3. Director machine using slave's firewall layer (both shaded) to form a virtual machine capable of forwarding media.

parts be fully separated. The two layers must then be joined using some form of middleware. I have chosen to use CORBA[6] (omniORB v4.1.2[16]) as the middleware component, since it has decent performance[15] and is an established technology. Another serious alternative I considered was XML-RPC[25], but I did not feel that it provided the performance and capabilities that CORBA brings with it. Other options that were considered very briefly were Thrift[24] and D-Bus[7]. Fig. 4.4 shows how the separation of the code base is made.

Figure 4.4. Virtual machine with SIP and rewall layers.

Since CORBA can only send call parameters that have been specified in IDL[10], an adapter layer is needed to adapt the data structures of the current call parameters to something that CORBA can handle. The alternative would be to rewrite everything to use CORBA's data structures, and that would certainly not be feasible. See section 5.2 for more information on this.

4.3 Registration and NAT

Here I explain why registrations and NAT do not pose a problem to this design.

4.3.1 Registrations

Registrations are handled in the same way as in the Ingate setup, with a registrar on the internal network. Connections to NAT-connected clients will be kept alive by the director's SIP layer.


4.3.2 Remote and local NAT traversal

Remote NAT connections are kept alive by the director. As described in section 4.1 (Overview), media is handled by a slave's firewall layer.

4.4 Call scenarios

There are four primary call scenarios, and they are presented here. In the scenarios there are two phones involved, named A and B. A is registered through the slave S, and also uses that slave for outbound calls. B is registered through D, and uses it for outbound calls. M is an arbitrary slave. It could be the same as S, or it could be a different one. C is a phone on the internal network. It is registered directly with the registrar.

Six alternative scenarios will also be presented. They are related to the primary scenarios, and also involve two phones. An example of the notation used is given below. The scenarios then follow.

• INVITE SDP(A)

This identifies an INVITE request containing an SDP payload pointing to A.

• 200 Ok SDP(int(S))

This identifies a 200 Ok response containing an SDP payload pointing to the internal address of S.

4.4.1 Scenario 1: A calls B using SRV, SDP in INVITE/200

In this scenario, A will call B using SRV. The firewall layer on S will handle all media for the call. The SIP layers on S and D will both use that firewall layer.

INVITE

A will first send INVITE;SDP(A) to S. S will choose its own firewall layer as the firewall layer to use for this call. It could just as well choose M's firewall layer, but that would mean more RPC communication over the network, which would degrade performance. The choice of firewall layer is saved in the SipTranslator object that is created. The SIP layer then interacts with the firewall layer, opening an internal port, and rewriting the SDP to point to the internal port. It now sends INVITE;SDP(int(S)) to the registrar, which sends it on to the director.

The director extracts the information about which firewall layer is being used from the SDP. It saves this information, S, in a SipTranslator object for later retrieval. It then acts as a virtual machine consisting of the SIP layer of the director, and the firewall layer of S. The firewall layer will notice that the internal address in the SDP is its own address, and rewrite the address to what it used to be. The director then forwards INVITE;SDP(A) to B.
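The SDP rewriting in this scenario can be sketched as follows, handling only the c= line and the first m=audio line of a simplified SDP body. The class and function names, addresses and port numbers are hypothetical. The firewall layer replaces a foreign address with an internal port of its own and remembers the original; when it later sees its own internal address, it restores what the SDP used to say. A second helper shows how a SIP layer could tell from the SDP which firewall layer is in use.

```python
import re


class SdpRewritingLayer:
    """Hypothetical sketch of a firewall layer's SDP rewriting (c= and m=audio only)."""

    def __init__(self, name, internal_ip, first_port=30000):
        self.name = name
        self.internal_ip = internal_ip
        self._next_port = first_port
        self._original = {}  # internal port -> original (ip, port)

    def rewrite(self, sdp):
        ip = re.search(r"^c=IN IP4 (\S+)$", sdp, re.M).group(1)
        port = int(re.search(r"^m=audio (\d+) ", sdp, re.M).group(1))
        if ip == self.internal_ip and port in self._original:
            # The SDP points at this layer: restore what it used to say.
            new_ip, new_port = self._original[port]
        else:
            # Allocate an internal port and remember the original address.
            new_ip, new_port = self.internal_ip, self._next_port
            self._original[new_port] = (ip, port)
            self._next_port += 2
        sdp = re.sub(r"^c=IN IP4 \S+$", f"c=IN IP4 {new_ip}", sdp, count=1, flags=re.M)
        return re.sub(r"^m=audio \d+ ", f"m=audio {new_port} ", sdp, count=1, flags=re.M)


def layer_from_sdp(sdp, layers_by_internal_ip):
    """Identify the firewall layer in use from the SDP connection address."""
    ip = re.search(r"^c=IN IP4 (\S+)$", sdp, re.M).group(1)
    return layers_by_internal_ip.get(ip)
```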


Figure 4.5. Scenario 1: A calls B using SRV, SDP in INVITE/200

200 Ok

B replies with a 200 Ok;SDP(B) to D. D fetches the firewall layer designated for this call from the SipTranslator object, and passes SDP(B) to it. The firewall layer rewrites it to point to an internal port on S. The director then moves the firewall layer identifier, S, to the dialog object, which is persistent for the entire session. The director finally sends 200 Ok;SDP(int(S)) to the registrar, which forwards it back to the slave.

The slave moves the firewall layer identifier over to its dialog object as well. It then passes SDP(int(S)) to the firewall layer, which recognizes it as its own address and rewrites the SDP to what it was originally, namely SDP(B). The reply, 200 Ok;SDP(B), is then sent to A.

ACK

In this scenario, the ACK does not involve any SDP. Therefore, the ACK is simply relayed to B the same way as the initial INVITE.

Media will now travel directly between A and B.

4.4.2 Scenario 2: A calls B using SRV, SDP in 200/ACK

This scenario is the same as scenario 1, except that the SDP negotiation is done in the 200/ACK instead of the INVITE/200.


Figure 4.6. Scenario 2: A calls B using SRV, SDP in 200/ACK

INVITE

A will rst send INVITE to S. Since the INVITE does not contain an SDP payload, it is simply sent on to the registrar, which forwards it to the director.

Since there is no SDP in the INVITE, the director forwards the request to B.

200 Ok

B replies with a 200 Ok;SDP(B) to D. Since the SipTranslator object does not contain a firewall layer identifier, D chooses a firewall layer M to use. It then passes SDP(B) to the firewall layer. It will rewrite it to point to an internal port on M. The director then moves the firewall layer identifier, M, to the dialog object, which is persistent for the entire session. The director finally sends 200 Ok;SDP(int(M)) to the registrar, which forwards it to the slave.

Since the SipTranslator object on S does not contain a firewall layer identifier, it extracts this information, M, from the SDP payload and stores it in the dialog object. It then passes SDP(int(M)) to the firewall layer, which recognizes the address as its own. It then rewrites the SDP to what it was originally, namely SDP(B). The reply, 200 Ok;SDP(B), is then sent to A.

ACK

A then replies with ACK;SDP(A). S fetches the firewall layer designated for this call from the dialog object, and passes SDP(A) to it. The firewall layer rewrites it to point to an internal port on M. The slave then sends ACK;SDP(int(M)) to the registrar, which forwards it to the director.

D passes SDP(int(M)) to the firewall layer saved in the dialog object. The firewall layer recognizes this as its own address, and rewrites the SDP to what it was originally, namely SDP(A). The request, ACK;SDP(A), is then sent to B.
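The way the SIP layers in scenarios 1 and 2 pick a firewall layer can be summarized as a three-step rule. This is a reading of the scenarios rather than code from the thesis, and the identifiers and addresses are hypothetical.

```python
def choose_firewall_layer(stored, sdp_ip, internal_ips, pick_new):
    """Pick the firewall layer a SIP layer should use for an SDP payload (sketch).

    stored:       identifier already saved for this call or dialog, or None
    sdp_ip:       connection address in the SDP being processed
    internal_ips: internal IP address -> firewall layer identifier
    pick_new:     callable choosing a layer when nothing is known yet
    """
    if stored is not None:
        return stored  # e.g. S reusing the layer saved in its dialog object
    if sdp_ip in internal_ips:
        return internal_ips[sdp_ip]  # SDP already rewritten by a peer: adopt its layer
    return pick_new()  # first SDP of the call: choose a layer
```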

4.4.3 Scenario 3: B calls A without SRV, SDP in INVITE/200

In this scenario, B will call A without using SRV. The firewall layer on M will handle all media for the call. The SIP layers on S and D will both use that firewall layer.

Figure 4.7. Scenario 3: B calls A without SRV, SDP in INVITE/200

INVITE

B sends INVITE;SDP(B) to D. Since the request has an SDP payload, D chooses a firewall layer M to use for the call. It then passes SDP(B) to the firewall layer, which rewrites it to point to an internal port on M. The director then saves the firewall layer identifier, M, in the SipTranslator object. The director finally sends INVITE;SDP(int(M)) to the registrar, which forwards it to S.

S will, upon receiving INVITE;SDP(int(M)), assume that M is the firewall layer associated with this call. It will save M in the SipTranslator object, and then pass SDP(int(M)) to the firewall layer, which recognizes the address as its own. It then rewrites the SDP to what it was originally, namely SDP(B). The request, INVITE;SDP(B), is then sent to A.

200 Ok

A replies with a 200 Ok;SDP(A) to S. S fetches the firewall layer designated for this call from the SipTranslator object, and passes it the SDP payload, which the firewall layer rewrites to point to an internal port on M. The slave then moves the firewall layer identifier, M, to the dialog object, which is persistent for the entire session. The slave finally sends 200 Ok;SDP(int(M)) to the registrar, which forwards it to the director.

The director moves the firewall layer identifier over to its dialog object as well. It then passes SDP(int(M)) to the firewall layer, which recognizes it as its own address and rewrites the SDP to what it was originally, namely SDP(A). The reply, 200 Ok;SDP(A), is then sent to B.

ACK

In this scenario, the ACK does not involve any SDP. Therefore, the ACK is simply relayed to A the same way as the initial INVITE.

Media will now travel directly between B and A.

4.4.4 Scenario 4: B calls A without SRV, SDP in 200/ACK

This scenario is the same as scenario 3, except that the SDP negotiation is done in the 200/ACK instead of the INVITE/200.
