A reverse proxy for VoIP: Or how to improve security in a ToIP network


DEGREE PROJECT IN ELECTRICAL ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2016

GUILLAUME DHAINAUT

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF ELECTRICAL ENGINEERING


KTH Royal Institute of Technology

Master’s Programme in Network Services and Systems - ANSSI

Guillaume Dhainaut 921006-5950

dhainaut@kth.se


Master’s Thesis

Stockholm, February 24, 2016

Supervisors Pierre Lorinquer ANSSI

Fabien Allard ANSSI

Examiner Panagiotis Papadimitratos (papadim@kth.se) KTH


Abstract

The need for security is crucial in Telephony over IP (ToIP). Secure protocols as well as dedicated devices have been designed to fulfill that need. This master’s thesis examines one such device, the Session Border Controller (SBC), which can be compared to a reverse proxy for ToIP. The idea is to apply message filters to increase security.

This thesis presents the reasons for the existence of SBCs, based on the security weaknesses a ToIP network can show. These reasons are then used to establish a list of features that can be expected from an SBC and to discuss its ideal placement in a ToIP network architecture. A test methodology for SBCs is established and applied to the free software Kamailio as an illustration. Following this test, improvements to this software regarding threat prevention and attack detection are presented and implemented.

Sammanfattning

Behovet av säkerhet är av avgörande betydelse i telefoni över IP (ToIP). Säkerhetsprotokoll har utformats samt särskilda enheter för att uppfylla detta behov. Detta examensarbete undersöker en av sådana enheter som kallas Session Border Controller (SBC), vilket kan jämföras med omvända proxyservrar för ToIP. Tanken är att tillämpa meddelandefilter för att öka säkerheten.

Denna avhandling presenterar orsakerna till SBC existens, baserat på de säkerhetssvagheter en ToIP nätverk kan visa. Dessa skäl används sedan för att upprätta en förteckning över egenskaper som kan förväntas av en SBC och diskutera dess ideal placering i en ToIP nätverksarkitektur. En testmetodik för SBC är etablerad och används på fri programvara Kamailio som en illustration. Efter detta test, förbättringar av denna programvara, om hot förebyggande och attacker upptäcka, presenteras och genomförs.

Acknowledgements

This research was supported by ANSSI (French Network and Information Security Agency) in Paris, which offered me help and materials. I owe my gratitude to Pierre Lorinquer and Fabien Allard, who were my ANSSI supervisors and helped me throughout this thesis by giving me advice and ideas.

I would also like to thank Colin Chaigneau and Valentin Houchouas, my colleagues at ANSSI, for their help in their areas of expertise.

I am thankful to Panagiotis Papadimitratos, my supervisor at KTH, for his helpful advice about the report.

Finally I thank Alice Tourtier for her help and support.


Contents

1 Introduction
1.1 Goal of the thesis
1.2 Contribution
1.3 Outline

2 VoIP Technologies
2.1 SIP
2.1.1 Messages
2.1.2 Elements of a SIP call establishment
2.1.3 Exchange examples
2.1.4 Security Mechanisms
2.2 Media session protocols
2.2.1 Codecs
2.2.2 RTP
2.2.3 SRTP
2.2.4 RTCP
2.2.5 SDP
2.3 Other protocols
2.3.1 VoIP protocols
2.3.2 Service protocols
2.4 Unified Communications

3 Security on a ToIP infrastructure
3.1 Security issues in ToIP
3.1.1 Risks
3.1.2 Common ToIP attacks
3.1.3 How to make secure ToIP
3.1.4 Limits
3.2 Session Border Controller
3.2.1 Principle
3.2.2 Differences with existing devices

4 SBC features
4.1 Internetworking
4.2 Media
4.3 QoS
4.4 Security
4.4.1 Upstream protection
4.4.2 Common attack protection
4.4.3 Downstream protection

5 SBC Architecture integration
5.1 At the border of ToIP architecture
5.1.1 Presentation
5.1.2 Limits
5.2 At the Center of ToIP architecture
5.2.1 Presentation
5.2.2 Limits
5.3 Combination

6 SBC Security Test
6.1 Methodology
6.1.1 Test environment
6.1.2 Features announced
6.1.3 Fuzzing
6.1.4 DoS/Flood
6.1.5 TLS quality
6.1.6 Common SIP attacks
6.2 Test of Kamailio
6.2.1 Test environment
6.2.2 Results
6.2.3 Conclusion

7 Improvements
7.1 Permissions module
7.1.1 Logic
7.1.2 Implementation
7.2 CDR analysis
7.2.1 Logic
7.2.2 Implementation
7.2.3 Tests

8 Conclusion and future work
8.1 Accomplished work
8.2 SBC conception
8.3 Future work

A Loop amplification Attack

B Kamailio configuration files

C Asterisk configuration files

List of Figures

1 ITU-T protocol set
2 VoIP IETF protocol set
3 Example of a SIP INVITE message
4 SIP Registration process
5 SIP Call process
6 SIP Digest authentication example
7 RTP header as defined in RFC 3550 [1]
8 Example of a SDP description
9 Example of DoS attack with INVITE messages
10 Security measures at the network level
11 Example of SBC placement
12 Use of a B2BUA to transcode audio flow
13 SBC action to allow high frequency registration
14 Internal architecture with a SBC at the border of the ToIP network
15 VoIP flows with a SBC at the center
16 Internal architecture with two SBCs at the border and at the center of the ToIP network
17 Test environment of Kamailio
18 CPU and RAM usages versus message intensity during DoS attacks
19 Learning period in time
20 Example of a user profile
21 Call hour distribution for a user
22 Receiver Operating Characteristic for the call classifier
23 REGISTER messages of a loop amplification attack
24 Example of the flow of a SIP amplification attack from RFC 5393 [2]

List of Tables

1 Main SIP request methods
2 Main SIP response codes
3 Main audio codecs
4 Codecs understood by some SIP systems
5 Overview of the security features
6 Security features implemented in Kamailio
7 Kamailio DoS results
8 FPR and TPR results

Abbreviations and Acronyms

ANSSI French Network and Information Security Agency
B2BUA Back-to-Back User Agent
CA Certificate Authority
CAC Call Admission Control
CDR Call Detail Record
CoS Class of Service
CPE Customer Premises Equipment
CPU Central Processing Unit
CVE Common Vulnerabilities and Exposures
DHCP Dynamic Host Configuration Protocol
DNS Domain Name System
DoS Denial of Service
DDoS Distributed DoS
DTLS Datagram Transport Layer Security
DTMF Dual-Tone Multi-Frequency
ERP Enterprise Resource Planning
FPR False Positive Ratio
FTP File Transfer Protocol
HTTP Hypertext Transfer Protocol
HTTPS HTTP Secure
IAX Inter-Asterisk eXchange
ICE Interactive Connectivity Establishment
IETF Internet Engineering Task Force
IP Internet Protocol
IPBX Internet PBX
IPsec Internet Protocol Security
ISDN Integrated Services Digital Network
ISUP Integrated Services Digital Network User Part
ITU International Telecommunication Union
ITU-T ITU Telecommunication Standardization Sector
LACK Lost Audio Packets Steganography
LAN Local Area Network
LSB Least Significant Bit
MAC Media Access Control
MITM Man in the middle
MIKEY Multimedia Internet KEYing
MTU Maximum Transmission Unit
NAT Network Address Translator
NTP Network Time Protocol
OS Operating System
PBX Private Branch Exchange
PC Personal Computer
PKI Public Key Infrastructure
PSTN Public Switched Telephone Network
QoS Quality of Service
RAM Random Access Memory
RFC Request for Comments
ROC Receiver Operating Characteristic
RTCP RTP Control Protocol
RTD Round-Trip Delay
RTP Real-time Transport Protocol
S/MIME Secure/Multipurpose Internet Mail Extensions
SBC Session Border Controller
SDES SDP Security Description for Media Streams
SDP Session Description Protocol
SIP Session Initiation Protocol
SIPREC SIP Recording
SIPS SIP Secure
SPIT Spam over Internet Telephony
SRC Session Recording Client
SRS Session Recording Server
SRTP Secure Real-time Transport Protocol
SS7 Signalling System No. 7
SSL Secure Sockets Layer
STUN Simple Traversal of UDP through NAT
TCP Transmission Control Protocol
TFTP Trivial FTP
TLS Transport Layer Security
ToIP Telephony over IP
TPR True Positive Ratio
TURN Traversal Using Relay NAT
UA User Agent
UAC User Agent Client
UAS User Agent Server
UDP User Datagram Protocol
URI Uniform Resource Identifier
VLAN Virtual LAN
VM Virtual Machine
VoIP Voice over IP

1 Introduction

The concept of Voice over IP (VoIP) appeared in the early 1990s. At that time, the Internet was emerging and the idea arose of carrying phone calls over it. The old telephone systems could not be reused for this purpose, so new protocols had to be developed to make calls on top of TCP/IP networks. It was not until the end of the 1990s that such protocols were produced by the IETF and the ITU-T: SIP and H.323, respectively. Nowadays, the general public is used to proprietary software such as Skype, which is the undisputed leader with its 300 million users [3].

Telephony over IP (ToIP) denotes the whole system that uses VoIP, including IPBXs and phones.

Moving from the old Public Switched Telephone Network (PSTN) to ToIP brings several benefits. First, it helps reduce the cost of each call, both international and local. Second, the convergence of the telephony network and the computer network brings other features such as video, data transfer or messaging, which can be used from any connected device such as a computer or a smartphone. The market has adopted this technology quite quickly. For example, penetration among US businesses rose from 12% in 2004 to 79% in 2013 [4]. Some experts predict that the VoIP market will continue to grow, reaching $82.5 billion in services revenue in 2018, against $69.6 billion in 2014 [5].

This evolution of telephony has also increased the need for security. ToIP inherits the security issues of IP-based systems in addition to its own vulnerabilities. Contrary to the old PSTN system, physical interaction is no longer needed to attack a system, and anyone connected to the Internet can exploit a vulnerability of the network for purposes ranging from eavesdropping to spoofing and toll fraud. In the early days of ToIP, most companies neglected security in their infrastructure because they were not aware of the risks involved. A report produced by the consultancy company Nettitude [6] shows that attacking a ToIP system is very attractive because of the money attackers can make (with toll fraud, for example). Nevertheless, with the increasing number of attacks, companies have started to secure their infrastructure, for example by using secure versions of the protocols.

1.1 Goal of the thesis

This thesis will look at the security aspects of a ToIP network architecture inside a company and specifically examine the security brought by the Session Border Controllers (SBCs). The main goals of this thesis are:

• To provide a state of the art of the security features offered by SBC implementations and to compare them to threats internal or external to an IP telephony infrastructure,

• To develop a method for SBC testing and apply it to a free solution,

• To propose changes in existing solutions and implement some, as time permits.

Although this thesis focuses on security, the other functions an SBC can provide will also be briefly presented to cover all the aspects of an SBC. The objective is to give readers an overview of the features that may be offered by an SBC. Indeed, knowing what an SBC can actually do on a ToIP network helps to understand how critical this new element is.

1.2 Contribution

This thesis explains why an SBC is needed inside a ToIP network. The security provided by the SBC is analyzed with regard to the threats, and new SBC features are presented. The thesis also gives a security test methodology for SBCs and applies it to Kamailio [7], which is one of the most widely used SIP servers. Finally, it presents improvements to Kamailio to prevent toll fraud.

1.3 Outline

The thesis is organized linearly. First, Section 2 describes how ToIP works and gives the background needed to understand the rest of the thesis. Then, Section 3 introduces the security issues a ToIP architecture might encounter by describing the associated risks along with the common attacks. This presentation leads to the introduction of the SBC inside the network. Section 4 then presents the features that can be expected from an SBC and how it improves the whole system, based on the problems shown before. The question of its integration in the infrastructure remains and is discussed in Section 5. In Section 6, an SBC test methodology is created and applied to the open-source SBC Kamailio. Finally, Section 7 presents implementations of new features for SBCs: a permissions module and a program to detect suspicious calls.

2 VoIP Technologies

Some background knowledge about VoIP and security is necessary. In this section, the protocols that are commonly used in VoIP will be presented. A VoIP ecosystem needs two parts: the signaling part to establish the call between two users, and the media transport part to carry the data from one user to the other once the session has been established.

There are two sets of protocols mainly used today: the ITU-T set shown in Figure 1 and the IETF set shown in Figure 2. Both use the same protocol for the media transport but they differ on the signaling protocol used.

The ITU-T set is called H.323 and is composed of several different protocols for the signaling part (H.225, H.245 and T.120). A large part of it comes from H.320, a protocol used in the Integrated Services Digital Network (ISDN), which comes from the traditional telephony world. This made the transition easier and explains why H.323 was the main protocol used in the early 2000s. H.323 is a binary protocol and is complex by nature.

[Figure 1 shows the protocol stack over IP: H.225 (Q.931) and H.245 (control) and T.120 (data) over TCP; RTP/SRTP and RTCP (audio and video media) and H.225 RAS (control) over UDP.]

Figure 1: ITU-T protocol set

The IETF set is inspired by the Internet protocols and is text-based. It is composed of the Session Initiation Protocol (SIP) as a signaling protocol and the Real-Time Transport Protocol (RTP) as a media protocol. SIP is media agnostic: it can be used to establish any type of session, whereas H.323 is restricted to voice and video and cannot support the new requirements of companies. SIP has become the main protocol in use and almost all new VoIP architectures are built on the IETF set, whereas the number of H.323 deployments is decreasing [8]. Consequently, this thesis focuses on the IETF set.

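[Figure 2 shows the protocol stack over IP: SIP and SDP (signaling), optionally over TLS, carried over UDP or TCP; RTP/SRTP and RTCP (audio and video media) carried over UDP.]

Figure 2: VoIP IETF protocol set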

SIP operates at the application layer and is agnostic of the transport layer used to carry it. It can be used with UDP, TCP or TLS with no change in the messages. VoIP communications need very low delays to be close to real time and, at the beginning, most implementations used UDP to carry SIP. However, improvements in Internet connection quality and the use of VoIP in mobile environments changed this habit, and many implementations now use TCP to transport SIP messages, whereas media is still carried over UDP. In the first messages, the Session Description Protocol (SDP) is carried inside SIP to exchange the media parameters that will be used for the RTP session. Once the session is established between the users with SIP, another set of protocols is used for the media part. The media flow is carried over RTP, a protocol on top of UDP for real-time transfer that supports jitter compensation and the detection of packet loss and out-of-sequence arrival. RTP is used in combination with RTCP, which provides statistics and control information for the associated RTP session.

2.1 SIP

SIP is a signaling, presence and instant messaging protocol developed to set up, modify, and tear down multimedia sessions between two or more users. It is an open standard that was defined in RFC 3261 [9] and has since been updated by several other RFCs. It is a text-based protocol (in contrast to binary protocols), meaning that all the messages exchanged are text, which makes them much easier to read and understand. It reuses many elements of the Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP). As in HTTP, the messages can be classified into request and response messages. Like SMTP, it uses Uniform Resource Identifiers (URIs) to identify resources; sip:guillaume@dhainaut.fr is an example of such a URI.

2.1.1 Messages

The SIP messages include request messages and response messages. The first line of a SIP request includes the name of the method that identifies the request and the first line of the response message includes a number called response code. The main request methods are shown in Table 1 and the classes of response code are shown in Table 2.

SIP message  Description
INVITE       Used to establish media sessions between user agents
REGISTER     Used to notify a SIP network of its current contact URI (IP address)
BYE          Used to terminate an established media session
ACK          Used to acknowledge final responses to INVITE requests
CANCEL       Used to terminate pending INVITE or call attempt
OPTIONS      Used to query a system about its capabilities

Table 1: Main SIP request methods

Code  Description     Action
1xx   Informational   Indicates the status of the call prior to completion
2xx   Success         The request has succeeded. The original sender must send an ACK if it was for an INVITE
3xx   Redirection     The client should retry the request to another server given in the response
4xx   Client error    Request has failed due to the client. It must retry with a correct message following indications in the response
5xx   Server failure  Request has failed due to the server. Client may try another server
6xx   Global failure  Request has failed. Client should not try again with this server or any other

Table 2: Main SIP response codes
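As an illustration of how the response classes of Table 2 can be used, the following minimal Python sketch maps the first line of a SIP response to its class. The function name classify_response and the dictionary RESPONSE_CLASSES are hypothetical names chosen for this example; they do not come from any SIP library.

```python
# Response classes as listed in Table 2, keyed by the first digit of the code.
RESPONSE_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server failure",
    6: "Global failure",
}


def classify_response(status_line: str) -> str:
    """Return the class of a SIP response given its first line.

    Hypothetical helper: expects a line such as "SIP/2.0 180 Ringing".
    """
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("SIP/"):
        raise ValueError("not a SIP status line")
    code = int(parts[1])
    if not 100 <= code <= 699:
        raise ValueError("status code out of range")
    return RESPONSE_CLASSES[code // 100]
```

For instance, classify_response("SIP/2.0 180 Ringing") falls in the Informational class, while a 486 response falls in the Client error class.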

A full SIP message is shown below in Figure 3. Its first line contains the method (or, for a response, the response code), the URI of the user and the version of SIP. This first line is followed by several header fields and a body.

INVITE sip:usera@voip.com SIP/2.0
Via: SIP/2.0/UDP 67.152.23.12:5060;branch=z9hG4bK8c4b
Max-Forwards: 70
From: "Bob" <sip:userb@test.com>;tag=7f795d7fe1
To: <sip:usera@voip.com;user=phone>
Call-ID: 50e333b9f136bf53
CSeq: 25349 INVITE
Contact: "Bob" <sip:userb@67.152.23.12:5060;transport=udp>
User-Agent: Ekiga/4.0.1
Content-Type: application/sdp
Content-Length: 618

(SDP not shown)

Figure 3: Example of a SIP INVITE message

Some header fields are mandatory (To, From, CSeq, Call-ID, Max-Forwards and Via) while the others are optional, even though they can be very useful. The main ones are:

• Via: Records the SIP route taken by a request and is used to route a response back to the originator

• To: Indicates the recipient of the request

• From: Indicates the originator of the request

• Max-Forwards: Indicates the maximum number of hops a SIP request may go through

• Call-ID: Must be unique and identifies a call between two users

• CSeq: Contains a decimal number that increases for each request

• Contact: Conveys a URI to identify the resource requested or the request originator

• User-Agent: Conveys information about the system originating the request

• Content-Type: Indicates the Internet media type in the message body

• Content-Length: Indicates the number of octets in the message body
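The message structure described above (a first line, header fields, then a body) can be sketched in a few lines of Python. This is a simplified illustration, not a complete SIP parser: it assumes CRLF line endings and ignores header folding, compact header names and repeated header fields. The names parse_sip_message, missing_mandatory_headers and EXAMPLE are hypothetical, and the example message is a shortened variant of Figure 3 with an empty body.

```python
# Header fields listed as mandatory in the text.
MANDATORY_HEADERS = ("To", "From", "CSeq", "Call-ID", "Max-Forwards", "Via")

# Shortened variant of the INVITE of Figure 3, without SDP body.
EXAMPLE = "\r\n".join([
    "INVITE sip:usera@voip.com SIP/2.0",
    "Via: SIP/2.0/UDP 67.152.23.12:5060;branch=z9hG4bK8c4b",
    "Max-Forwards: 70",
    'From: "Bob" <sip:userb@test.com>;tag=7f795d7fe1',
    "To: <sip:usera@voip.com;user=phone>",
    "Call-ID: 50e333b9f136bf53",
    "CSeq: 25349 INVITE",
    "Content-Length: 0",
]) + "\r\n\r\n"


def parse_sip_message(raw: str):
    """Split a SIP message into (start_line, headers, body).

    Header names are stored in lower case because they are
    case-insensitive; if a field is repeated, the last value wins.
    """
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


def missing_mandatory_headers(headers):
    """Return the mandatory header names absent from a parsed message."""
    return [h for h in MANDATORY_HEADERS if h.lower() not in headers]
```

A filtering element could use such a check to reject requests that lack one of the mandatory header fields before any further processing.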

There are more than 110 different SIP header fields defined in RFCs [10]. Anyone can define a new SIP header if it is useful for their system, and some headers have been created by the main vendors to provide new features. However, the main problem with custom headers is the lack of interoperability between systems from different vendors.

2.1.2 Elements of a SIP call establishment

In this section the behavior of standard SIP elements and their role in a ToIP network is examined.

User Agent

SIP-enabled end devices are called user agents (UAs). They can be separated into two categories:

• A User Agent Client (UAC) is a logical entity that sends SIP requests

• A User Agent Server (UAS) is a logical entity that receives the requests and returns SIP responses

Most SIP implementations behave as both client and server. They can send messages to establish a call and at the same time they listen for any incoming call.

SIP Phones

SIP Phones are the end points of calls. Their main tasks are to make and receive calls. They can be software running on a computer, such as Linphone, Ekiga or Skype (such software is called a softphone), or separate IP phones with dedicated hardware, such as Cisco, Avaya or Mitel phones.

SIP Gateways

A SIP gateway is an application that interfaces a SIP network to a network using another signaling protocol, such as the public switched telephone network (PSTN) or a network running H.323. A gateway terminates the signaling path and can also terminate the media path. It can sometimes be decomposed into a media gateway (MG) for the media part and a media gateway controller (MGC) for the signaling part.

Proxy Servers

A SIP Proxy Server receives SIP requests from a user agent or another proxy and forwards them. Just as a router forwards IP packets at the IP layer, a SIP proxy forwards SIP messages at the application layer. A proxy is only allowed to modify messages within the limits set by RFC 3261. A proxy server can also extend security by performing authentication (details in Section 2.1.4). Kamailio [7] is one of the best-known free SIP proxies.
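The forwarding step can be sketched as follows in Python, assuming the behavior described in RFC 3261: the proxy decrements Max-Forwards, refuses to forward when the hop limit is exhausted (a real proxy would answer 483 Too Many Hops), and adds its own Via header field above the existing ones so that the response can retrace the path. The function forward_request, the host name and the branch value are hypothetical and only serve the illustration; this is not how Kamailio is implemented.

```python
def forward_request(headers, proxy_host, branch):
    """Return the header list a proxy would forward, or None if the
    hop limit is spent.

    headers is a list of (name, value) tuples in message order.
    Simplified sketch: transport is assumed to be UDP and a missing
    Max-Forwards header field is left missing.
    """
    out = []
    for name, value in headers:
        if name.lower() == "max-forwards":
            hops = int(value)
            if hops <= 0:
                return None  # a real proxy answers 483 Too Many Hops
            value = str(hops - 1)
        out.append((name, value))
    via = ("Via", f"SIP/2.0/UDP {proxy_host};branch={branch}")
    # Insert the proxy's own Via before the first existing Via (or at
    # the top when there is none).
    first_via = next(
        (i for i, (n, _) in enumerate(out) if n.lower() == "via"), 0
    )
    out.insert(first_via, via)
    return out
```

Keeping the headers as an ordered list rather than a dictionary matters here, because a request accumulates one Via header field per hop and their order encodes the return path.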

Registrar Servers

A Registrar Server accepts REGISTER requests from clients. When it receives such a request, it updates the location database to make the contact information from the request available to the other SIP servers of the domain. A registrar server usually requires the authentication of the client sending the REGISTER.
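The location database can be pictured as a mapping from a user's public address to the contact URI learned from the last REGISTER, together with an expiry time. The Python sketch below is a minimal in-memory illustration under that assumption; the class LocationDatabase, its methods and the default lifetime of 3600 seconds are choices made for this example, and a real registrar may store several contacts per user.

```python
import time


class LocationDatabase:
    """Toy in-memory location database keeping one contact per user."""

    def __init__(self):
        self._bindings = {}  # address of record -> (contact URI, expiry time)

    def register(self, aor, contact, expires=3600, now=None):
        """Store or refresh the binding learned from a REGISTER request."""
        now = time.time() if now is None else now
        self._bindings[aor] = (contact, now + expires)

    def lookup(self, aor, now=None):
        """Return the current contact URI, or None if unknown or expired."""
        now = time.time() if now is None else now
        entry = self._bindings.get(aor)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]
```

A proxy receiving a call for sip:usera@voip.com would call lookup() with that address and forward the request to the returned contact, or fail the call when the result is None.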

Redirect Server

A Redirect Server responds to a request but does not forward it. It uses a database to look up users and sends a 3xx response directing the client to contact another URI to reach the user.

Back-to-Back User Agent (B2BUA)

A Back-to-Back User Agent receives SIP requests, rewrites them and sends them out as new requests. Some B2BUAs behave like proxies but do not follow the same rules, as they can rewrite the From, Via, Contact and Call-ID headers. They divide the call into two call legs so they can manage each side of the call. Since all flows pass through a B2BUA, it is aware of the state of every call and can perform call control to add new features such as voice messaging. The best-known free implementation of a B2BUA server is Asterisk [11], which is used by more than one million communication systems [11]. FreeSWITCH [12] is another well-known free implementation. In addition, both can act as an IPBX and a media gateway, and thus manage the entire ToIP of a company.

2.1.3 Exchange examples

Now that the foundations of SIP have been described, as well as the elements of the network and the messages exchanged, two important examples of SIP exchanges can be described: the registration and the establishment of a call.

Registration

Figure 4 shows the registration process. The UAC sends a REGISTER message to the registrar, which responds with a 200 OK message. Each UAC has to register itself with a registrar if it wants to receive calls. With this information, the SIP proxy can direct a call to the right IP address. If the UAC has not registered, the proxy will not be able to forward a call it receives for that UAC. The registrar might ask the UAC to authenticate itself.

Alice -> Registrar: REGISTER
Registrar -> Alice: 200 OK

Figure 4: SIP Registration process

Call Establishment

Figure 5 shows a call between Alice and Bob. Alice wants to call Bob, who is registered in another domain. First, she sends an INVITE message to her SIP proxy, which forwards it to the proxy of Bob’s domain. After forwarding it, the SIP proxy sends a 100 Trying message to Alice to let her know that her message has been processed.

When the second SIP Proxy receives the INVITE, it knows where to find Bob (because Bob registered himself before) and transmits the INVITE to Bob. As for the first proxy, the second proxy responds with a 100 Trying message. Bob’s phone receives the INVITE and starts to ring. It sends a 180 Ringing message that will be transmitted back to Alice.

When Bob picks up his phone, it sends a 200 OK response that will be again transmitted to Alice. She receives the 200 OK and sends in response an ACK to Bob through the proxies.

At this point the session is established and each user’s device knows the IP address of the other. They can therefore start to exchange audio or video using RTP. When one party ends the call by hanging up, their phone sends a BYE message to the other, which responds with a 200 OK message to let the BYE originator know that the message has been received (with UDP the message might be lost). The process can be much more complex in a production environment because of all the other elements on the path.

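(Alice calls)
Alice -> Alice’s SIP proxy: INVITE
Alice’s SIP proxy -> Bob’s SIP proxy: INVITE
Alice’s SIP proxy -> Alice: 100 Trying
Bob’s SIP proxy -> Bob: INVITE
Bob’s SIP proxy -> Alice’s SIP proxy: 100 Trying
Bob -> proxies -> Alice: 180 Ringing
(Bob answers)
Bob -> proxies -> Alice: 200 OK
Alice -> proxies -> Bob: ACK
Alice <-> Bob: Media Session (RTP)
BYE from the party hanging up, answered by 200 OK

Figure 5: SIP Call process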

2.1.4 Security Mechanisms

The telephony architecture in a company is a critical element that has to satisfy the fundamental concepts of security. SIP comes with some mechanisms to provide authentication, confidentiality and integrity that have been defined since the RFC 3261 [9].

Authentication

Three methods have been described to provide authentication for SIP:

• Digest authentication: It is based on HTTP digest mode [9]. A SIP server or UA can challenge another UA to resend a request proving knowledge of a shared secret, which is used to compute an MD5 hash of several parameters (username, realm, nonce, method, request URI) that will be sent in a new message. An example of such a transaction, for a registration, is shown in Figure 6. The UAC sends a message and the UAS responds with a 401 Unauthorized message or with a 407 Proxy Authentication Required message, which contains a nonce (to prevent replay attacks) and a realm (to identify the system being accessed) that will be used to compute the MD5 hash. The new message is the same as the first one, with an additional header field containing the parameters and the hash. If the hash is correct the request will be processed by the UAS.

This method only provides a way for the server to authenticate the client, not the opposite. HTTP digest authentication is considered weak and vulnerable [13] and some improvements have been proposed [14]. An authentication system should not be based entirely on it; one of the other mechanisms below should be used in addition.

• TLS client authentication: SIP can be used on top of TLS to employ the authentication process of TLS. If the X.509 server certificate is signed by a certificate authority recognized by the client, the certificate can be used to authenticate the public key of the server and later to authenticate the server within the TLS handshake. The server can also request a client certificate for mutual authentication.

However, clients rarely have a certificate of their own, so the server can use the digest authentication presented above once the TLS session has been established.

• Identity: Proposed in RFC 4474 [15]. A new SIP header field, Identity, is inserted by a proxy server when forwarding a request. The proxy first authenticates the request to make sure it is being sent by the identity in the From header field. If so, some parts of the request are signed and the signature is included in the Identity header field. An Identity-Info header field gives a link to the certificate used to sign the message. When a UA or a proxy receives a request, it can fetch and check the certificate by following the link in Identity-Info and then check the signature in Identity. If the signature is correct, the From URI is authenticated. As for TLS, Identity needs a PKI for certificate management.

Alice -> Registrar: REGISTER
Registrar -> Alice: 401 Unauthorized
Alice -> Registrar: REGISTER (with MD5 hash)
Registrar -> Alice: 200 OK

Figure 6: SIP Digest authentication example
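The hash sent in the second REGISTER of Figure 6 can be sketched in Python as follows. The sketch assumes the basic form of HTTP digest without the optional qop parameter, in which the response is MD5(HA1:nonce:HA2) with HA1 = MD5(username:realm:password) and HA2 = MD5(method:URI). The function names and the credentials in the usage note are hypothetical.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    """Hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Client side: compute the digest response (basic form, no qop)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def verify_response(received, username, realm, password, method, uri, nonce):
    """Server side: recompute the hash from the stored secret and compare."""
    expected = digest_response(username, realm, password, method, uri, nonce)
    return hmac.compare_digest(received, expected)
```

For example, a client holding the password "secret" for user "alice" in realm "voip.com" would answer a challenge with nonce "abc123" by sending digest_response("alice", "voip.com", "secret", "REGISTER", "sip:voip.com", "abc123"). The password itself never crosses the network, but anyone who captures the exchange can test candidate passwords offline, which is one reason the mechanism is considered weak.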

Confidentiality and integrity

SIP messages are rarely sent directly from user to user; they pass through intermediaries. These intermediaries may want to change the SIP messages, for example to sanitize them. With end-to-end encryption or a signature over the entire message, intermediaries can no longer alter messages. As a user is not aware of all the ToIP elements on the path, using such end-to-end mechanisms is strongly discouraged. There are three common ways to provide confidentiality and integrity for SIP:

• S/MIME: It makes it possible to encrypt or sign the message body (usually SDP) from user to user. It was first designed for email but has since been adapted for HTTP and SIP. It is based on certificates and thus requires a Public Key Infrastructure (PKI). The messages are sent with the body encrypted with the public key of the recipient and signed with the private key of the sender.

The key exchange mechanism is described in section 23.2 of RFC 3261 [9]. Even though S/MIME has been defined in an RFC, no implementation of it is widely used today, probably because it requires the same infrastructure as TLS while offering fewer advantages. In addition, it does not protect the SIP headers of the messages and does not protect against replay attacks or message reordering.

• IPsec: The Encapsulating Security Payload (ESP) protocol is the component of IPsec used for confidentiality, as the whole packet payload is encrypted. It operates at the IP layer and provides a secure connection between two hosts. In most cases, IPsec is handled directly by the OS at the kernel level: applications are unaware of whether they are using IPsec or not. Thus, IPsec is often used to establish a secure channel between distant sites. Contrary to S/MIME, the protection is applied at the network layer.

Thus, all the application data are protected (integrity and confidentiality) and not only the SIP payload.

• SIP over TLS: As for many other application protocols, it is possible to use TLS to authenticate the messages and protect them from a MITM. It operates between the application layer and the transport layer and can run over TCP or, in its datagram variant (DTLS), over UDP, but it is mainly used over TCP. The certificate validation process is described in RFC 5922 [16]. It provides confidentiality and integrity and works on a hop-by-hop basis. Most implementations use this solution.

2.2 Media session protocols

First, the audio or video flow encoding will be described. Then its transmission over the network will be detailed, as well as the parameters exchange and the transport protocols involved.

2.2.1 Codecs

Before sending the media flow through the network, a system first has to decide what it actually wants to send. The microphone or camera gives the system a digital signal that is quite large. Telephony works in real time, so the signal needs to cross the network as quickly as possible. To help speed up transmission, coder-decoders, called codecs, were built to encode a signal for transmission and then decode it for playback. The objective of these codecs is to reduce the size of the signal as much as possible while keeping the quality as good as possible and the computational complexity low. This is a major issue for service providers and cloud companies, as they need to serve as many users as possible with the fewest resources. They therefore fund a lot of research in this area to find better codecs.

With VoIP, the systems have to be sure that the endpoints of their media flow have the decoders to read it. Before sending the media flow, they have to agree on the codec to use. This negotiation is performed with SDP (described in Section 2.2.5) during the signaling part of the call. Today, VoIP phones support several codecs, both to have at least one in common with the other side and to be able to choose the best codec depending on the available bandwidth and the desired quality.

Table 3 shows the main audio codecs in use with their characteristics. The G.711 codec is very old and is the baseline codec for most implementations. Even if two phones are very different, a user can almost be sure that they will both support G.711. With the increase in network transmission capacity, data storage and video calls, several other interesting codecs have been developed. Some are free and can be used by anyone in their applications.

Codec    Year  BW (kHz)  BR (kbps)       Sampling (kHz)  Patents       Comment
G.711    1972  3.3       64              8               Not anymore   Reference codec
G.722    1988  7         48, 56, 64      16              Not anymore   High quality
G.722.1  1999  7         24, 32          16              Royalty-free  Lower complexity
G.722.2  2003  7         6.60 to 23.85   16              Yes           Lower bitrate
G.723.1  1995  3.3       5.3 and 6.3     8               Yes           Required in H.323
AMR      1999  3.2       4.75 to 12.2    8               Yes           Standard speech codec GSM/UMTS
G.726    1990  3.3       16, 24, 32, 40  8               Yes           Used in international trunks
G.729    1995  3.3       8               8               Yes           Low bitrate
G.719    2008  20        32 to 128       48              Yes           High quality, flexible rate selection
iLBC     2004  3.3       13.33, 15.2     8               Open format   Part of WebRTC project
Opus     2012  4 to 20   6 to 510        8 to 48         Open format   Good quality with low complexity

Table 3: Main audio codecs

2.2.2 RTP

The VoIP system encodes the signal using one of the codecs above to obtain the data it wants to send to the other party. However, sending it directly over UDP would be too unreliable, as UDP carries no information about the order or timing of the packets sent. If some packets are lost or delayed, the receiver cannot rebuild the flow. This is why VoIP needs an intermediary protocol to carry the media stream. The Real-time Transport Protocol (RTP), defined in RFC 3550 [1], is a network protocol for delivering audio and video over the network, with facilities for jitter compensation and detection of out-of-sequence arrival of data. RTP works on top of UDP (and not TCP) because real-time systems prefer to tolerate some packet loss rather than the additional delay caused by retransmissions.

An RTP packet is composed of a header and a payload that carries the data. The RTP header is shown in Figure 7. The header fields have the following meanings:

• V: The version of the protocol, currently 2

• P: Indicates if there are padding bytes at the end of the RTP packet

• X: Indicates the presence of an extension header

• CC: Contains the number of contributing sources (CSRC) identifiers

• M: Marker for the application level

• PT: Indicates the RTP payload type

• Sequence number: Number to determine packet sequence and detect packet loss

• Timestamp: Sampling instant of the first byte of the payload, used for playback timing and jitter calculation

• SSRC: Identifies the source of the stream, randomly chosen

• CSRC: List of contributing sources for a stream that has been generated from multiple sources

• Extension header: Possibility to add new header fields

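[Figure 7: layout of the RTP header. First 32-bit word: V (bits 0-1), P (bit 2), X (bit 3), CC (bits 4-7), M (bit 8), PT (bits 9-15), Sequence number (bits 16-31). It is followed by the Timestamp, the SSRC identifier, the CSRC identifiers and, optionally, the extension header.]

Figure 7: RTP header as defined in RFC 3550 [1]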
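The field layout listed above can be illustrated with a short decoding routine. This is only a minimal sketch in Python: the function name parse_rtp_header and the sample packet are assumptions made for the example, and the optional extension header is not decoded.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header and the CSRC list (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet is at least 12 bytes long")
    # Network byte order: two single bytes, a 16-bit and two 32-bit integers.
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = first & 0x0F                      # CC: number of CSRC identifiers
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet too short for its CSRC count")
    csrc = list(struct.unpack("!%dI" % cc, packet[12:end]))
    return {
        "version": first >> 6,             # V
        "padding": bool(first & 0x20),     # P
        "extension": bool(first & 0x10),   # X
        "csrc_count": cc,
        "marker": bool(second & 0x80),     # M
        "payload_type": second & 0x7F,     # PT
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrc,
    }

# Hypothetical packet: version 2, payload type 0, sequence number 1,
# timestamp 160, followed by 160 bytes of dummy payload.
sample = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678) + b"\x00" * 160
header = parse_rtp_header(sample)
```

Since none of these fields is authenticated or encrypted in plain RTP, any equipment on the path can read or rewrite them in the same way.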

The encoded media signal is placed into the payload of the RTP packets. The sequence number ensures that the packets can be reordered, and the timestamp ensures that a loss does not disturb the reconstruction of the media stream. The protocol is independent of the media it carries: a new multimedia format can be transported with RTP just by defining a new RTP payload type. There is no security at all in RTP, so anyone on the path can see the flow and modify it. This is why a new protocol, SRTP, has been built on top of RTP to provide the security services needed.

2.2.3 SRTP

The Secure Real-time Transport Protocol (SRTP) [17] is used to secure the exchanges made with RTP. The header is sent in clear but the payload is encrypted. With a key exchange protocol, the two parties agree on a key from which the keys needed for the RTP and RTCP sessions are derived. The RFC does not mandate a key exchange protocol, and several have been defined for this use:

• ZRTP (RFC 6189 [18]): A shared secret and the security parameters are exchanged using a Diffie–Hellman key exchange. Each user then obtains a Short Authentication String (SAS) that can be read aloud to the other party, to check that both strings are the same and that no MITM has altered the key establishment. The ZRTP exchange uses the same port as the RTP traffic that will follow.

• MIKEY (RFC 3830 [19]): This is a key management protocol designed to perform key exchange and negotiate cryptographic parameters on behalf of multimedia applications. The MIKEY messages are transported inside the SDP part of the signaling exchanges, encoded in base64. Thus, it relies on the security of SIP (with TLS or IPsec) to provide security for RTP.

• DTLS-SRTP (RFC 5764 [20]): It uses DTLS to exchange keys for the SRTP media transport. It is a very secure approach because it does not rely on the security of SIP and reuses the TLS protocol. However, it requires a PKI, contrary to ZRTP.

• SDES (RFC 4568 [21]): As for MIKEY, security parameters and keys are sent as an SDP attribute of the initial SIP messages.

Even though several mechanisms exist and have been implemented, SDES is the most common way of doing SRTP today because of its simplicity. However, the security it provides is based on the one provided for SIP. As the keys are sent in the SDP message, SIP needs to be encrypted and authenticated. This is the weakness of SDES, and the reason why one should prefer DTLS-SRTP for example.

2.2.4 RTCP

The RTP Control Protocol (RTCP) is an associated protocol of RTP. It has been defined in RFC 3550 [1] and is primarily used to provide feedback on the quality of the data distribution. Mainly, RTCP aims:

• To monitor the quality of services with information such as the packet counts, packet loss, delay variation or round-trip delay time.

• To carry a persistent transport-level identifier for an RTP source called the canonical name or CNAME. The SSRC identifiers might change and the CNAME helps to keep track of each participant. It can be used to associate multiple media streams (for example to synchronize audio and video).

• To keep track of the number of participants by looking at the RTCP packets received.

• To convey minimal session control information, for example participant identification to be displayed in the user interface.

RTP uses an even port number and RTCP uses the next odd port number. Like RTP, RTCP is independent of the underlying transport protocol. RTCP packets are transmitted periodically, at a recommended minimum interval of 5 seconds. Several RTCP packets can be combined in the same UDP datagram. RTCP has several types of messages:

• SR: Sender report for transmission and reception statistics from participants that are active senders (i.e. who send media to the others)

• RR: Receiver report, for reception statistics from participants that are not active senders

• SDES: Source description items, including CNAME

• BYE: Indicates end of participation

• APP: Application-specific functions

Like RTP, RTCP has a secure version known as Secure RTCP (SRTCP), which is very similar to SRTP. It provides RTCP with the same security features as those SRTP provides to RTP.

2.2.5 SDP

So that the recipients know what the phone is sending with RTP, VoIP needs a protocol to share the media parameters and make the users agree on them. The Session Description Protocol (SDP) has been created for that purpose. It is defined in RFC 4566 [22] and is a format for describing media sessions: transport addresses, transport protocols, port numbers and media details. It works in conjunction with SIP and is sent in the body of the initial SIP messages. As SDP is just a format for session description, it can also be transported via SAP, RTSP and HTTP. The SDP part of a SIP message is indicated by the value application/sdp in the Content-Type header field.

Each field has the form <type>=<value>. The RFC insists that there must be no whitespace on either side of the “=”. An SDP session description includes the following media information:

• The type of media (video, audio...)

• The transport protocol (RTP/UDP/IP, H.320...)

• The format of the media (H.264 video, G.722 audio...)

• The remote address for media

• The remote transport port for media

An SDP description is shown below in Figure 8. It can be split into three parts: the session description, the time description and the media description.

v=0
o=MxSIP 0 1 IN IP4 192.168.51.38
s=SIP Call
c=IN IP4 192.168.51.38
t=0 0
m=audio 3000 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=rtpmap:18 G729/8000
a=rtpmap:110 PCMU/16000
a=rtpmap:111 PCMA/16000

Figure 8: Example of a SDP description

This SDP description is part of a SIP INVITE message sent to establish a call between two softphones. The meanings of the different types shown in the SDP message are:

• v: Protocol version number, currently 0

• o: Originator and session identifier. The format is:

<username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address>

• s: Session name


• c: Connection information for the media. The format is:

<nettype> <addrtype> <connection-address>

In Figure 8, IN represents the Internet, IP4 means that the address used is an IPv4 address and 192.168.51.38 is the IP address to which the receiver has to send the media flow.

• t: Time the session is active. The format is <start-time> <stop-time>. A value of 0 for both means that the session is not bounded in time (it is regarded as permanent).

• m: Media name and transport address. The format is <media> <port> <proto> <fmt>.

• a: Media attributes line. The example is composed of the audio codecs proposed with their characteristics.
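The <type>=<value> structure described above makes an SDP body easy to split into fields. The following Python sketch parses the description of Figure 8; the helper names parse_sdp and audio_codecs are assumptions made for the example, and no validation beyond the basic line format is attempted.

```python
def parse_sdp(text: str) -> list:
    """Split an SDP description into (type, value) pairs, one per line."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        type_, sep, value = line.partition("=")
        # Every line must be a one-letter type followed directly by "=".
        if sep != "=" or len(type_) != 1:
            raise ValueError("malformed SDP line: %r" % line)
        fields.append((type_, value))
    return fields

def audio_codecs(fields: list) -> dict:
    """Map RTP payload type numbers to codecs using the a=rtpmap lines."""
    codecs = {}
    for type_, value in fields:
        if type_ == "a" and value.startswith("rtpmap:"):
            payload_type, encoding = value[len("rtpmap:"):].split(" ", 1)
            codecs[int(payload_type)] = encoding
    return codecs

# The description shown in Figure 8.
SDP_EXAMPLE = """v=0
o=MxSIP 0 1 IN IP4 192.168.51.38
s=SIP Call
c=IN IP4 192.168.51.38
t=0 0
m=audio 3000 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=rtpmap:18 G729/8000
a=rtpmap:110 PCMU/16000
a=rtpmap:111 PCMA/16000
"""

fields = parse_sdp(SDP_EXAMPLE)
# The m= line follows the format <media> <port> <proto> <fmt>.
media, port, proto, fmt = dict(fields)["m"].split(" ", 3)
codecs = audio_codecs(fields)
```

An intermediary that inspects or rewrites the media address (c=) and port (m=) works on exactly these fields, which is only possible when the SDP body is not encrypted end to end.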

SIP has two different ways of sending the SDP and making the two sides agree on the parameters the call will use:

• Early offer: In the first way, the call originator makes the first offer and the call recipient chooses the parameters. The first INVITE contains the SDP offer and the called party chooses and responds in the following 200 OK message.

• Late offer: In the second way, the call recipient makes the first offer and the call originator chooses the parameters. The first INVITE contains no SDP part and the called party will make the offer in the 200 OK message. Then the calling party will choose and respond in the following ACK.

2.3 Other protocols

The protocols presented so far are not the only ones crossing a ToIP network. First, there are other VoIP protocols that systems can use. Second, IP phones often offer services, such as directories, using non-VoIP protocols.

2.3.1 VoIP protocols

Other protocols have been created for VoIP. Some are proprietary and popular among the general public (like Skype and TeamSpeak), whereas others are free and implemented in open source projects (like IAX and Jingle).

Skype and TeamSpeak

Skype [23] is the most popular VoIP software in the world. A lot of people use it to make video calls or to call telephone numbers from the Internet. However, Skype uses its own protocol called Skype protocol which has not been made publicly available.

TeamSpeak [24] is another proprietary VoIP software used to speak on a chat channel, as in a telephone conference call. It is mostly used by gamers for online multiplayer games. As for Skype, the protocol has not been made public.

Jingle and IAX2

Jingle [25] is an extension for Jabber to add VoIP features. Jabber is an open protocol based on the XML language. It is decentralized and works in the same way as email services. It is very close to the protocol used by Google Talk and can also easily be converted to SIP with specific gateways. There are a few clients and IPBX (like Asterisk or FreeSWITCH) that can use Jingle.

The Inter-Asterisk eXchange 2 (IAX2) is a protocol that can be used for any type of streaming but is mainly designed to support audio calls. It has been designed by the creators of the IPBX Asterisk and has been published in RFC 5456 [26]. It is supported by Asterisk of course, but also by other IPBX and softphones. Compared to SIP, it minimizes the bandwidth used and provides a native way to traverse NAT. However, it is less flexible than SIP and new features have to be added in the protocol specification.

2.3.2 Service protocols

IP phones, like the ones used in companies, are complex devices which offer many features other than just VoIP. First, the whole boot process is quite complex and requires several steps. When the phones boot, they use DHCP to acquire an IP address, then they might use NTP to acquire the time used on the network. Most commercial phones then download their configuration files using the Trivial File Transfer Protocol (TFTP), FTP, HTTP or HTTPS. Next, they might contact the upgrade server to check for a software upgrade, again with TFTP, FTP or HTTP(S). All these operations might require the use of the DNS protocol. At this point they can proceed to the SIP registration and are ready to make and receive calls.

Secondly, the phones can provide services and applications working remotely. For example, many phones can use XML applications. These applications run on a server that receives HTTP(S) requests from the phones and responds with files in XML format. They can be used for directories, calendars, speed dials, notes...

2.4 Unified Communications

Unified Communication is the industry term used to describe the set of communication services intended for companies. It includes several products, with the aim of integrating them into one single software and interface that manages everything, to increase efficiency and ergonomics. This interface should be accessible from every device a company may have (smartphones, computers, tablets...). The main components are:

• Calendar: To manage meetings and deadlines

• Messaging: To send emails, SMS, fax, voicemail

• Telephony: To make audio and video calls directly with a physical phone or from the computer

• Conferencing: To schedule meetings, check the availability of the guests, make the conference online with audio and video flows from everyone

• Presence: To know the presence state of a colleague (idle, in call, in meeting...)


• Collaboration: To work on a file from everywhere with several people at the same time

• Data: To store the files online and access them from everywhere

This is a fast-growing business and several companies are working on solutions. VoIP occupies a key place in Unified Communication. It is still one of the major means of communication used in companies, and the flexibility of SIP allows the implementation of new features that will be integrated into the whole Unified Communication system.

3 Security on a ToIP infrastructure

Since telephony moved to IP networks, its attack surface has increased because of its reachability. Before VoIP, when people were using analog systems, attacks were mainly physical. Someone could gain access to the wired line and install copper straps to intercept the signal and listen to a conversation. This interception could slightly degrade the quality of the call by adding poor contacts, leading to crackling and even interference. However, this procedure required physical access, and the security measures to prevent it were pretty clear. There was also no authentication, and users could not be sure of the identity of the person calling them.

Today, since ISDN and now SIP, all the devices are reachable from anywhere and they have evolved: they are computers, with all the common security issues involved. The traffic passes through the same cables as other services, and users do not control all the devices on the path. As with email, the telephony infrastructure is a very critical element for companies. If someone gained access to a device on the network, he could make premium-rate calls, resulting in a huge bill for the company. No one is immune and every company should be careful about it. One conceivable security measure is to add a specific device at the border of the ToIP network that plays the same role as a firewall or reverse proxy for traditional Internet services. Such devices are called Session Border Controllers (SBC). They will be presented in Section 3.2, after a look at security in ToIP in Section 3.1.

3.1 Security issues in ToIP

In the first part of this section, the security risks of ToIP will be shown along with the related common attacks. Then, how to make ToIP as secure as possible will be discussed. Finally, the limits of these measures will be explained.

3.1.1 Risks

Unavailability

An attacker could try to make the whole system unavailable. It can be done by finding bugs in devices or, more simply, by launching a DDoS attack on an infrastructure to flood the servers with packets. The consequences will first be economic for the company. If the employees cannot make calls anymore, it will lower their productivity and even prevent them from working. It can also be very harmful for emergency calls, such as those to police officers and firefighters.

Corruption

If an attacker has a MITM position and has access to the media flow, he could insert or delete audio parts in the call. It can be used to give false information to someone. Indeed, the target will think this information is true because it is part of what he believes to be a trusted conversation between him and the other party. For example, a program written by an attacker could change the answering machine message of a bank to ask for the account password of the user after the initial identification process. The user is sure he is talking with the answering machine of the bank, so he will not suspect an attack.

Non-confidentiality

This is one of the major risks. If someone can listen to all of the conversations of a company because they are not using SRTP, it could compromise the whole company. It can also be used to steal user names, passwords and phone numbers. In the early days of VoIP systems, no one was using SIPS and SRTP to secure the communication and it made ToIP an attractive target for hackers.

Impersonation

An attacker could try to make calls pretending to be someone else. Indeed, if the recipient does not know the voice of the person, he will look at the name or number written on the phone to identify the person calling. The main goal here is to ask for confidential information or to make the recipient do a task for him.

Toll fraud

This is the major issue for small companies which do not hold secret information an attacker might want to steal. The CFCA 2015 report about fraud loss [27] shows that the surveyed companies have lost more than 15 billion dollars to toll fraud. It can be done with international calls or calls to premium-rate numbers. A classic way to do it is to gain access to a phone or the IPBX and use it to make automatic calls.

Harassing calls and SPIT

It is easier to generate automatic calls with VoIP than with traditional telephony. Anyone with a computer and a SIP trunk account can write or use a program to generate calls to random numbers. These calls can play an advertisement, in which case they are called Spam over Internet Telephony (SPIT), or harass someone. This problem is not yet comparable with email spam; however, the growth of VoIP may lead to an increase in such behaviors. Researchers have been working on spam detection based on call frequency, low call completion and low average call duration. Their work has been published in RFC 5039 [28].
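The three detection criteria mentioned above (call frequency, call completion and average call duration) can be combined in a simple per-caller check. The Python sketch below is only an illustration: the CallerStats structure, the function name and all threshold values are hypothetical and are not taken from RFC 5039.

```python
from dataclasses import dataclass

@dataclass
class CallerStats:
    calls_attempted: int       # calls placed during the observation window
    calls_answered: int        # calls that were actually picked up
    total_talk_seconds: float  # cumulated duration of the answered calls

# Hypothetical thresholds, chosen only for the example.
MAX_CALLS_PER_HOUR = 60
MIN_COMPLETION_RATIO = 0.3
MIN_AVERAGE_DURATION = 15.0

def spit_indicators(stats: CallerStats, window_hours: float = 1.0) -> list:
    """Return the list of SPIT indicators raised by a caller."""
    indicators = []
    if stats.calls_attempted / window_hours > MAX_CALLS_PER_HOUR:
        indicators.append("high call frequency")
    if (stats.calls_attempted
            and stats.calls_answered / stats.calls_attempted < MIN_COMPLETION_RATIO):
        indicators.append("low call completion")
    if (stats.calls_answered
            and stats.total_talk_seconds / stats.calls_answered < MIN_AVERAGE_DURATION):
        indicators.append("low average call duration")
    return indicators

suspect = spit_indicators(CallerStats(500, 50, 250.0))
regular = spit_indicators(CallerStats(10, 8, 960.0))
```

In this example the first caller raises all three indicators while the second raises none. A real deployment would have to tune the thresholds to its own traffic, since a call center, for instance, legitimately places many short calls.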

Propagation

If an attacker gains access to any device of a ToIP infrastructure, it could be the entry door to the whole IT infrastructure. From the server or the phone he controls, he could try to reach computers or other servers on the network. Even if a company has perfect security on the computer side of the infrastructure, it can be totally ruined if the same security level is not applied to the ToIP infrastructure.

3.1.2 Common ToIP attacks

When an attacker wants to attack a ToIP infrastructure, he has several known techniques at his disposal [29]. I will assume that the basic protection with SIP digest authentication is in place. Otherwise, anyone could pretend to be someone else, intercepting every call and making all the calls he wants.

Banner grabbing

At the beginning of the attack, the attacker may want to gather information about the targeted system, such as the software running on the servers and phones. Matching this information against the Common Vulnerabilities and Exposures (CVE) list can reveal exploitable security issues. SIP messages follow the model of HTTP messages, so there are a User-Agent header and a Server header which carry the software used and its version. In addition, the SDP part also has a User-Agent field. Their usefulness is debatable: SIP is an open standard so, in theory, there is no need to know the software of the recipient to be able to communicate with him. An attacker can simply learn the software by sending an INVITE message and looking at the fields in the response.

User enumeration

Then the attacker may want to list the users of the service, to impersonate them later or to call them for SPIT. To find them, he can send REGISTER, INVITE or OPTIONS messages and look at the responses. The response differs depending on whether he asks for an existing user or an unknown user, which tells him whether a user exists or not.

Password cracking

When the attacker has found some usernames, he can try to find their associated passwords. He can do it by performing a brute-force attack (trying all possible passwords) or, better, a dictionary attack (trying only the most likely ones). He will send REGISTER messages, wait for the authentication challenge, and try a password in the response. If the password is wrong, he will receive another authentication challenge. Otherwise, if the password is correct, he will receive a 200 OK message. Once he has found a valid password, he can use it to impersonate the user or to commit toll fraud.

DoS

The best-known attack to make a system unavailable is a DoS attack. It consists of flooding the services in order to make them unavailable. This attack can saturate the network links (volumetric attack) or drive the server CPU to 100%. The system becomes unavailable for the legitimate users, whose requests are lost in the massive amount of flooding requests, as shown below in Figure 9. There are several flooding techniques at every layer.

The protection (except for volumetric attacks) is to filter the received packets so that the server does not process every packet and keeps its resources available. However, finding a way to do it efficiently is a very difficult problem, and there is active research on this subject because most hosting providers want to offer such protection. For VoIP services, the most common attack is to send a lot of INVITE messages. The server processes each message, which consumes resources and affects its performance [30].

[Diagram: an attacker floods the server with millions of INVITE messages.]

Figure 9: Example of DoS attack with INVITE messages
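One simple way to select which packets are processed is to limit the number of INVITE messages accepted from each source over a sliding time window. The Python sketch below illustrates the idea; the class name InviteRateLimiter and the limits used are assumptions made for the example, not the mechanism of any particular product.

```python
from collections import defaultdict, deque

class InviteRateLimiter:
    """Accept at most max_invites INVITE messages per source in a time window."""

    def __init__(self, max_invites: int, window_seconds: float):
        self.max_invites = max_invites
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)   # source IP -> accepted timestamps

    def allow(self, source_ip: str, now: float) -> bool:
        history = self._history[source_ip]
        # Forget the INVITE messages that have left the sliding window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_invites:
            return False                     # drop before any SIP processing
        history.append(now)
        return True

# Hypothetical policy: 3 INVITE messages per source every 10 seconds.
limiter = InviteRateLimiter(max_invites=3, window_seconds=10.0)
decisions = [limiter.allow("203.0.113.5", t) for t in (0.0, 1.0, 2.0, 3.0)]
later = limiter.allow("203.0.113.5", 10.5)
other = limiter.allow("198.51.100.7", 3.0)
```

Such a filter keeps the cost of a rejected message low, but it does not help against volumetric attacks, and an attacker who spoofs source addresses over UDP can both evade it and get legitimate sources blocked.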

Illegitimate BYE

Another way to block VoIP for a domain is to send illegitimate BYE messages to close the calls. Indeed, if no security protocol has been used for SIP, there is no authentication between the UACs, even if the messages pass through a proxy which authenticates the sender with a digest. A UAC will accept any message that carries the right identifiers for the dialog: the two tags and the Call-ID. Thus, a MITM who wants to stop the conversation can send a BYE with the corresponding fields to both sides of the call. It can be done to prevent a person or a company from making calls during a period of time.
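The acceptance rule described above can be written in a few lines, which shows how little an endpoint checks. This is a minimal Python sketch; the DialogId structure, the function name and the identifier values are hypothetical.

```python
from typing import NamedTuple

class DialogId(NamedTuple):
    call_id: str      # Call-ID shared by both sides of the dialog
    local_tag: str    # tag chosen by this endpoint
    remote_tag: str   # tag chosen by the other endpoint

def bye_matches_dialog(dialog: DialogId, call_id: str,
                       from_tag: str, to_tag: str) -> bool:
    """Return True when a received BYE carries the identifiers of the dialog.

    For a request received inside a dialog, the From tag is the remote tag
    and the To tag is the local tag. Nothing else is verified here: the
    check says nothing about who actually sent the message.
    """
    return (call_id == dialog.call_id
            and from_tag == dialog.remote_tag
            and to_tag == dialog.local_tag)

# Hypothetical identifiers of an established call.
dialog = DialogId("a84b4c76e66710", "1928301774", "314159")
accepted = bye_matches_dialog(dialog, "a84b4c76e66710", "314159", "1928301774")
rejected = bye_matches_dialog(dialog, "a84b4c76e66710", "000000", "1928301774")
```

The three values are sent in clear when SIP is not protected, so anyone who can observe the signaling knows everything needed to forge a BYE that passes this check.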

Fuzzing

If an attacker finds a message that can crash the SIP stack of the devices, it will instantly stop all the communications. The literature [31] proposes five categories of malformed packets:

• Incorrect Grammar

• Oversized Field Values

• Invalid Message or Field Name

• Redundant or Repetitive Header Field


• Invalid Semantic

The main goal of this attack is to crash the system or to exploit a software flaw, such as a buffer overflow, to execute arbitrary code.
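Some of the categories listed above can be detected with simple checks before a message reaches the SIP stack. The Python sketch below covers three of them (invalid field name, oversized field value and repeated header field). The size limit and the set of headers treated as single-instance are assumptions made for the example, and compact header forms are ignored for simplicity.

```python
import re

MAX_HEADER_VALUE_LENGTH = 1024   # hypothetical limit
SINGLE_INSTANCE_HEADERS = {"call-id", "cseq", "from", "to", "max-forwards"}
# Characters allowed in a SIP header field name (a "token").
HEADER_NAME = re.compile(r"^[A-Za-z0-9.!%*_+`'~-]+$")

def check_headers(header_lines: list) -> list:
    """Return a list of problems found in the header lines of a SIP message."""
    problems = []
    seen = set()
    for line in header_lines:
        name, sep, value = line.partition(":")
        name = name.strip()
        if not sep or not HEADER_NAME.match(name):
            problems.append("invalid field name: %r" % line[:40])
            continue
        if len(value) > MAX_HEADER_VALUE_LENGTH:
            problems.append("oversized field value: " + name)
        key = name.lower()
        if key in SINGLE_INSTANCE_HEADERS:
            if key in seen:
                problems.append("repeated header field: " + name)
            seen.add(key)
    return problems

clean = check_headers([
    "Call-ID: abc@host",
    "CSeq: 1 INVITE",
    "From: <sip:a@example.com>;tag=1",
    "To: <sip:b@example.com>",
])
malformed = check_headers([
    "Call-ID: abc",
    "Call-ID: def",
    "Bad Header: x",
    "To: " + "A" * 5000,
])
```

Incorrect grammar and invalid semantics are much harder to catch with checks of this kind, as they require a full parser and some knowledge of the state of the call.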

Amplification

The SIP protocol has a vulnerability named “loop amplification attack”, described in RFC 5393 [2]. It can be leveraged to cause a small number of SIP requests to generate an extremely large number (up to 2^71) of messages. To achieve it, an attacker needs two registrar services and two addresses in each one, which limits the possible applications of this attack. More details are given in Appendix A.

Steganography

Once an attacker has taken control of a device, he might want to transfer data outside, such as call logs, voice samples or documents from another computer of the company. To do it without being detected, he could hide the data inside the RTP flow of a call. VoIP calls use a fairly high and stable bandwidth that could allow an attacker to transmit data. Several techniques have been described by researchers:

• The Least Significant Bit (LSB) technique is a common way to do steganography. It consists of hiding data in the bits with the least importance so that the difference is almost unnoticeable for a listener.

• The redundancy bits technique is similar to the previous one. Most codecs include redundancy bits that can be used to check on the data received. Hiding data in these bits is possible. If the transmission medium is good, the flow will fail the redundancy check at the recipient side but the audio will still be the same after the decoding phase.

• The RTP extension header uses the possibility of adding RTP headers to hide the data inside it.

• The LACK (Lost Audio Packets Steganography) technique hides the information in delayed packets. If the packets are delayed enough, they will not be used for audio playback but are still received by the recipient. These packets still belong to the same media flow and to the conversation, so they can pass some surveillance mechanisms. At the end, the recipient can reconstruct the original data from all the delayed packets.

These methods and others have been previously described in the literature [32]. For example, a one-hour call using the G.711 audio codec consumes more than 200 Mb of data. Using the LSB technique, an attacker can pass dozens of megabits of hidden data.
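These orders of magnitude can be checked with a short calculation. The Python sketch below assumes that one bit is hidden in each 8-bit G.711 sample and ignores the RTP, UDP and IP headers; both are assumptions made for the example.

```python
SAMPLE_RATE_HZ = 8000    # G.711 sampling frequency
BITS_PER_SAMPLE = 8      # G.711 encodes each sample on 8 bits (64 kbps)
CALL_SECONDS = 3600      # one-hour call

# Audio payload carried in one direction during the call.
payload_bits = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CALL_SECONDS

# Assumption: the least significant bit of every sample carries hidden data.
hidden_bits = SAMPLE_RATE_HZ * 1 * CALL_SECONDS

payload_megabits = payload_bits / 1_000_000   # 230.4 Mb
hidden_megabits = hidden_bits / 1_000_000     # 28.8 Mb
```

With these assumptions, the call carries 230.4 Mb of audio in each direction and leaves room for 28.8 Mb of hidden data, that is, one eighth of the payload.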

3.1.3 How to make secure ToIP

The risks for a ToIP architecture have been described in the previous part. A few of them were already present in the analog telephony network but, again, the attack surface has increased. To prevent these attacks from happening, several measures have to be taken at both the network level and the system level.

Network Level

VoIP runs on top of the traditional IP network, so known protocols can be applied to make layers 1 to 4 secure. First, at the physical level, the equipment has to be protected to prevent someone from getting access to it. That means that a company should not put phones in public areas of its buildings. It must have good building security to prevent a stranger from accessing offices with VoIP systems. If someone can access a phone, he could modify the firmware to turn the phone into a spying tool. He could also make calls to premium-rate numbers to earn money, or make calls from the phone of someone else to spoof his identity. Then, the ToIP network and the computer network have to be separated to reduce the risks of propagation. It can be done physically or logically with the use of separate VLANs. Using VLANs is easier and most companies use this solution to separate ToIP. However, it is possible to jump from the computer VLAN to the ToIP VLAN [33], making such protection not as secure as a physical separation. With a dedicated network, the computers of the company are not reachable from the phones and vice versa. However, such an approach might be quite difficult to apply with the Unified Communication systems used by some companies. Another difficulty with this approach is of course its cost: some companies will not pay the infrastructure cost to increase security.

Then, at layer 2, 802.1X is necessary with mutual authentication such as EAP-TLS. It prevents someone who does not have a correctly signed certificate from plugging his own device into the network to infect other systems. Anti-ARP-spoofing techniques must also be applied to prevent a user on the network from sending spoofed ARP packets in order to receive the RTP flow intended for someone else. This can simply be done with static ARP tables, but this does not scale to a large network. On the end systems, the OS can simply ignore unsolicited ARP replies that might come from an attacker. The switches could also implement protection based on the knowledge of the network given by the DHCP requests and responses.

At layer 3, IPsec can be used in some particular cases. For example, if a company has several offices in different cities, it could place a gateway in front of each network and use IPsec to create secure tunnels between the sites.

Then, at the application level, SIP over TLS (SIPS) and SRTP/SRTCP are necessary. SIPS ensures confidentiality, integrity and authentication for the SIP messages. If possible, the use of mutual authentication in TLS also adds authentication of the client. SRTP prevents eavesdropping, modification and replay of packets for the media flows. These two protocols are mandatory for a minimum security policy in ToIP systems. The use of SRTCP is less critical than SRTP but it provides confidentiality, integrity and authentication for the RTCP packets. In addition, if the phones use online XML services, they must do so over HTTPS with mutual authentication (with TLS, or just with digest authentication over TLS). Figure 10 summarizes the protocols and techniques presented.


Layer 1: Dedicated network, building security
Layer 2: 802.1X, Anti DHCP-spoofing, Anti ARP-spoofing
Layer 3: IPsec
Layer 5: TLS
Layer 7: HTTPS, SIPS, SRTP

Figure 10: Security measures at the network level

System Level

Once the secure versions of the VoIP protocols are in place, the system has to be protected against attacks that rely on the correct behavior of the secure protocols. System administration best practices have to be followed. First, strong passwords (minimal number of characters, use of uppercase letters and symbols) must be used to prevent someone from guessing them. Then, the systems must be upgraded to their latest version to prevent someone from exploiting known vulnerabilities that have since been fixed. In the infrastructure, a backup system must be in place to avoid single points of failure, for example through redundancy of the IPBX and SIP servers. System hardening must then be done to reduce the attack surface. Here, it consists of disabling the services that are not used and revoking the credentials that are not needed. The servers must be configured very carefully because a bad configuration can introduce vulnerabilities.

Finally, the infrastructure has to be monitored to detect intrusions or unusual behavior. The network throughput can be watched for the whole system and per host, as well as the state of the devices, the monthly bills or the number of calls made. Detecting an attack as early as possible reduces its impact.
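The monitoring of the number of calls per host can be sketched as follows. This Python example flags the hosts whose call count for the current day is far above their own history. The function name, the data layout and the tolerance of three standard deviations are assumptions chosen for the illustration, not values taken from a deployed system.

```python
from statistics import mean, pstdev
from typing import Dict, List


def hosts_with_unusual_call_volume(
    history: Dict[str, List[int]],
    today: Dict[str, int],
    tolerance: float = 3.0,
) -> List[str]:
    """Return the hosts whose call count today exceeds their usual range.

    history maps a host to its past daily call counts; today maps a host
    to the number of calls observed so far today.
    """
    suspicious = []
    for host, past_counts in history.items():
        if not past_counts:
            continue  # no baseline available for this host
        baseline = mean(past_counts)
        spread = pstdev(past_counts)
        # Use a spread of at least one call so that a perfectly flat
        # history does not flag a single additional call.
        threshold = baseline + tolerance * max(spread, 1.0)
        if today.get(host, 0) > threshold:
            suspicious.append(host)
    return sorted(suspicious)
```

For example, a phone that usually places about five calls a day and suddenly places sixty would be reported, which is the kind of early signal expected in a toll fraud scenario.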

3.1.4 Limits

However, even if most of the suggestions above have been applied to the ToIP infrastructure, some attackers could still get through these protections. This can happen because of human factors or because of vulnerabilities in the protocols or in the running systems. Several scenarios are presented below to examine what could be done to prevent them from happening.

IP Phones vulnerabilities

In the worst cases, the IP phones may be the target of attacks that aim to gain root access on them or to change their firmware. Once this has been done, the attacker can do anything he wants with the phone. The system administration best practices described above make such attacks more difficult, but there are always risks. For example, in 2013, it was discovered [34] that some phones could be turned into spying microphones. This attack


Sammanfattning

Denna avhandling presenterar orsakerna till SBC existens, baserat på de säkerhets svagheter en ToIP nätverk kan visa. Dessa skäl används sedan för att upprätta en förteckning över egenskaper som kan förväntas av en SBC och diskutera dess ideal placering i en ToIP nätverksarkitektur. En testmetodik för SBC är etablerad och används på fri programvara Kamailio som en illustration. Efter detta test, förbättringar av denna programvara, om hot förebyggande och attacker upptäcka, presenteras och genomförs.

Acknowledgements

This research was supported by ANSSI (French Network and Information Security Agency) in Paris, which offered me help and materials. I owe my gratitude to Pierre Lorinquer and Fabien Allard, who were my ANSSI supervisors and helped me throughout this thesis by giving me advice and ideas.

I would also like to thank Colin Chaigneau and Valentin Houchouas, my colleagues at ANSSI, for their help in their areas of expertise.

I am thankful to Panagiotis Papadimitratos, my supervisor at KTH, for his helpful advice about the report.

Finally I thank Alice Tourtier for her help and support.

Contents

1 Introduction
1.1 Goal of the thesis
1.2 Contribution
1.3 Outline

2 VoIP Technologies
2.1 SIP
2.1.1 Messages
2.1.2 Elements of a SIP call establishment
2.1.3 Exchange examples
2.1.4 Security Mechanisms
2.2 Media session protocols
2.2.1 Codecs
2.2.2 RTP
2.2.3 SRTP
2.2.4 RTCP
2.2.5 SDP
2.3 Other protocols
2.3.1 VoIP protocols
2.3.2 Service protocols
2.4 Unified Communications

3 Security on a ToIP infrastructure
3.1 Security issues in ToIP
3.1.1 Risks
3.1.2 Common ToIP attacks
3.1.3 How to make secure ToIP
3.1.4 Limits
3.2 Session Border Controller
3.2.1 Principle
3.2.2 Differences with existing devices

4 SBC features
4.1 Internetworking
4.2 Media
4.3 QoS
4.4 Security
4.4.1 Upstream protection
4.4.2 Common attack protection
4.4.3 Downstream protection

5 SBC Architecture integration
5.1 At the border of ToIP architecture
5.1.1 Presentation
5.1.2 Limits
5.2 At the Center of ToIP architecture
5.2.1 Presentation
5.2.2 Limits
5.3 Combination

6 SBC Security Test
6.1 Methodology
6.1.1 Test environment
6.1.2 Features announced
6.1.3 Fuzzing
6.1.4 Dos/Flood
6.1.5 TLS quality