End to end VoIP security


User communication applications are in high demand in the Internet user community. Two classes of such applications are of particular importance and attract the interest of many Internet users: collaboration systems and VoIP communication systems. The first category includes systems such as ICQ, MSN Messenger and Yahoo! Messenger, while in the second, systems such as Skype and VoipBuster dominate among public VoIP clients. Architecturally, collaboration systems form a distributed network in which the participants communicate with each other and exchange information. The data are either routed from the source through a central server to the recipient, or the two clients communicate directly. The participants in such networks are both content providers and content requestors. In VoIP systems, on the other hand, the data path runs directly between the peers, without any involvement of the service network, with some exceptions such as Skype's “supernode” communications. Data are carried over public Internet infrastructure such as Ethernet networks, WiFi hotspots or wireless ad hoc networks. Security in these networks is a critical issue that has been addressed from several different perspectives in the past.

In this assignment I focus on the implementation of cryptographic security in VoIP. Security is implemented dynamically and cooperatively by the two (or more) peers, with no prior arrangements or requirements such as keys exchanged out of band or shared secrets. Ease of use (simplicity), user friendliness (no special knowledge required from the user) and effectiveness (ensuring the confidentiality and integrity of the applications), combined with minimal requirements on end-user devices, are the goals of this approach. We improve the security of user communications, meeting all of the above requirements, by enhancing the application architecture with VoIPSec security elements.

Over the past few years, Voice over IP (VoIP) has become an attractive alternative to more traditional forms of telephony. Naturally, with its increasing popularity in daily communications, researchers are continually exploring ways to improve both the efficiency and security of this new communication technology. Unfortunately, while it is well understood that VoIP packets must be encrypted to ensure confidentiality, it has been shown that simply encrypting packets may not be sufficient from a privacy standpoint. For instance, we recently showed that when VoIP packets are first compressed with variable bit rate (VBR) encoding schemes to save bandwidth, and then encrypted with a length-preserving stream cipher to ensure confidentiality, it is possible to determine the language spoken in the encrypted conversation.
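The leak described above comes from the fact that a length-preserving cipher hides the content of a packet but not its size. The following is a minimal sketch of that effect only, not of the language-identification attack itself. The keystream construction (SHA-256 in counter mode) is an illustrative stand-in for a real stream cipher, and the frame sizes are hypothetical.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode); an illustrative stand-in, not a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Length-preserving encryption: XOR the plaintext with the keystream."""
    ks = keystream(key, nonce, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

# Hypothetical sizes (in bytes) of consecutive VBR-encoded voice frames.
frame_sizes = [12, 31, 47, 20, 31, 12]
key = b"k" * 16

# Each frame gets its own nonce, so the keystream never repeats.
ciphertexts = [
    encrypt(key, i.to_bytes(8, "big"), bytes(size))
    for i, size in enumerate(frame_sizes)
]

# An eavesdropper cannot read the frames but still sees their sizes.
observed_sizes = [len(c) for c in ciphertexts]
```

Here `observed_sizes` is identical to `frame_sizes`: the sequence of packet lengths, which depends on what is being said, passes through the encryption unchanged.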

As surprising as these findings may be, one might argue that learning the language of the speaker (e.g., Arabic) only affects privacy in a marginal way. If both endpoints of a VoIP call are known (for example, Mexico City and Madrid), then one might correctly conclude that the language of the conversation is Spanish, without performing any analysis of the traffic. In this work we show that the information leaked from the combination of using VBR and length preserving encryption is indeed far worse than previously thought.


This assignment is about security, more specifically, about protecting one of your most precious assets, your privacy. We guard nothing more closely than our words. One of the most important decisions we make every day is what we will say and what we won't. But even then it's not only what we say, but also what someone else hears, and who that person is.

Voice over IP, the transmission of voice over traditional packet-switched IP networks, is one of the hottest trends in telecommunications. Although most computers can provide VoIP and many offer VoIP applications, the term “voice over IP” is typically associated with equipment that lets users dial telephone numbers and communicate with parties on the other end who have a VoIP system or a traditional analog telephone. (The sidebar, “Current voice-over-IP products,” describes some of the products on the market today.)

As with any new technology, VoIP introduces both opportunities and problems. It offers lower cost and greater flexibility for an enterprise but presents significant security challenges. Security administrators might assume that because digitized voice travels in packets, they can simply plug VoIP components into their already secured networks and get a stable and secure voice network. Unfortunately, many of the tools used to safeguard today's computer networks (firewalls, network address translation (NAT), and encryption) don't work “as is” in a VoIP network. Although most VoIP components have counterparts in data networks, VoIP's performance demands mean you must supplement ordinary network software and hardware with special VoIP components.

Integrating a VoIP system into an already congested or overburdened network can be disastrous for a company's technology infrastructure. Anyone attempting to construct a VoIP network should therefore first study the procedure in great detail. To this end, we've outlined some of the challenges of introducing appropriate security measures for VoIP in an enterprise.

End-to-End Security

In this assignment I describe end-to-end security and its “design principle” that one should not place mechanisms in the network if they can be placed in end nodes; thus, networks should provide general services rather than services that are designed to support specific applications. The design and implementation of the Internet followed this design principle well. The Internet was designed to be an application-agnostic datagram delivery service. The Internet of today isn't as pure an implementation of the end-to-end design principle as it once was, but it's enough of one that the collateral effects of the network not knowing what's running over it are becoming major problems, at least in the minds of some observers. Before I get to those perceived problems, I'd like to talk about what the end-to-end design principle has meant to the Internet, technical evolution, and society. The Internet doesn't care what you do; its job is just to “deliver the bits, stupid” (in the words of David Isenberg in his 1997 paper, “Rise of the Stupid Network” [2]). The “bits” could be part of an email message, a data file, a photograph, or a video, or they could be part of a denial-of-service attack, a malicious worm, a break-in attempt, or an illegally shared song. The Net doesn't care, and that is both its power and its threat.

The Internet (and by this, I mean the Arpanet, the NSFNet, and the networks of their successor commercial ISPs) wasn't designed to run the World Wide Web. The Internet wasn't designed to run Google Earth. It was designed to support them even though they did not exist at the time the foundations of the Net were designed. It was designed to support them by being designed to transport data without caring what it was that data represented.

At the very first, the design of TCP/IP wasn't so flexible. The initial design had TCP and IP within a single protocol, one that would only deliver data reliably to a destination. But it was realized that not all applications were best served by a protocol that could only deliver reliable data streams. In particular, timely delivery of information is more important than reliable delivery when trying to support interactive voice over a network if adding reliability would, as it does, increase delay. TCP was split from IP so that the application running in an end node could determine for itself the level of reliability it needed. This split created the flexibility that is currently being used to deliver Skype's interactive voice service over the same network that CNN uses to deliver up-to-the-minute news headlines and the US Patent and Trademark Office uses to deliver copies of US patents.

Thus the Internet design, based as it was on the end-to-end principle, became a generative facility. Unlike the traditional phone system, in which most new applications must be installed in the phone switches deep in the phone network, anyone could create new applications and run them over the Internet without getting permission from the organizations that run the parts of the Net. This ability was exploited with “irrational exuberance” [4] during the late 1990s Internet boom. But, in spite of the hundreds of billions of dollars lost by investors when the boom busted, the number of Internet users and Web sites, the amount of Internet traffic, and the value of Internet commerce have continued to rise, and the rate of new ideas for Internet-based services hasn't noticeably diminished.

Security and privacy in an end-to-end world

The end-to-end arguments paper used “secure transmission of data” as one reason that an end-to-end design was required. The paper points out that network-level or per-link encryption doesn't actually provide assurance that a file that arrives at a destination is the same as the file that was sent or that the data went unobserved along the path from the source to the destination. The only way to ensure end-to-end data integrity and confidentiality is to use end-to-end encryption.

Thus, security and privacy are the responsibilities of the end nodes. If you want to ensure that a file will be transferred without any corruption, your data-transfer application had better include an integrity check, and if you don't want to allow anyone along the way to see the data itself, your application had better encrypt it before transmitting it.
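As a rough illustration of an integrity check performed by the end nodes, the sketch below appends an HMAC-SHA256 tag to the data before sending and verifies it on receipt. It assumes, purely for illustration, that the two applications already share a secret key; the function names `protect` and `verify` are hypothetical. It covers integrity only, and encryption for confidentiality would be a separate step.

```python
import hashlib
import hmac

TAG_LEN = 32  # length in bytes of an HMAC-SHA256 tag

def protect(key: bytes, data: bytes) -> bytes:
    """Sender side: append an integrity tag computed over the data."""
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return data + tag

def verify(key: bytes, message: bytes) -> bytes:
    """Receiver side: recompute the tag and reject the message if it differs."""
    data, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return data
```

Because the check runs in the two end applications, it detects corruption introduced anywhere along the path, regardless of what the intermediate links do or do not protect.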

There are more aspects to security on a network than just data encryption. For example, to ensure that communication over the network is reliable, the network itself needs to be secure against attempts, purposeful or accidental, to disrupt its operation or redirect traffic away from its intended path. But the original Internet design didn't include protections against such attacks. Even if the network is working perfectly, you need to actually be talking to the server or person you think you are. But the Internet doesn't provide a way, at the network level, to assure the identities of its users or nodes. You also need to be sure that the message your computer receives isn't designed to exploit weaknesses in its software (such as worms or viruses) or in the ways that you use the Net. Protection against such things is the end system's responsibility.

Note that there is little that can be done “in the Net” or in your end system to protect your privacy from threats such as the government demanding the records of your use of Net-based services such as Google, which collect information about your network usage.

Many of today's observers assume that the lack of built-in protections against attacks and the lack of a secure way to identify users or nodes was a result of an environment of trust that prevailed when the original Internet design and protocols were developed. If you trusted the people on the Net, there was no need for special defensive functions. But a few people who were “at the scene” have told me that such protections were actively discouraged by the primary sponsor of the early Internet; that is to say, the US military wasn't all that interested in having good nonmilitary security, maybe because it might make its job harder in the future. Whatever the reason, the Internet wasn't designed to provide a secure environment that included protection against the malicious actions of those who would disrupt it or attack nodes or services provided over it.

End-to-end security is not dead yet, but it is seriously threatened, at least at the network layer. NATs and firewalls interfere with some types of end-to-end encryption technology. ISPs could soon be required by regulations to, by default, filter the Web sites and perhaps the protocols that their customers can access. Other ISPs want to be able to limit the protocols that their customers can access so that the ISP can give service providers an “incentive” to pay for the customer's use of their lines; they don't see a way to pay for the network without this ability. The FBI has asked that it be able to review all new Internet services for tapability before they're deployed, and the FCC has hinted that it will support the request.

If this were to happen, applications such as Skype that use end-to-end encryption could be outlawed as inconsistent with law enforcement needs.

Today, it's still easy to use end-to-end encryption as long as it's HTTPS, but that might be short-lived. It could soon reach the point that the use of end-to-end encryption, without which end-to-end security can't exist, will be seen as “an antisocial act” (as a US Justice Department official once told me). If that comes to be the case, end-to-end security will be truly dead, and we will all have to trust functions in the network that we have no way of knowing are on our side.

What is VoIP end to end security?

Achieving end-to-end security in a voice-over-IP (VoIP) session is a challenging task. VoIP session establishment involves a jumble of different protocols, all of which must inter-operate correctly and securely. Our objective in this paper is to present a structured analysis of protocol inter-operation in the VoIP stack, and to demonstrate how even a subtle mismatch between the assumptions made by a protocol at one layer about the protocol at another layer can lead to catastrophic security breaches, including complete removal of transport-layer encryption.

The VoIP protocol stack is shown in figure 1. For the purposes of our analysis, we will divide it into four layers: signaling, session description, key exchange and secure media (data) transport. This division is quite natural, since each layer is typically implemented by a separate protocol. Signaling is an application-layer (from the viewpoint of the underlying communication network) control mechanism used for creating, modifying and terminating VoIP sessions with one or more participants. Signaling protocols include Session Initiation Protocol (SIP) [27], H.323 and MGCP. Session description protocols such as SDP [20] are used for initiating multimedia and other sessions, and often include key exchange as a sub-protocol.

Key exchange protocols are intended to provide a cryptographically secure way of establishing secret session keys between two or more participants in an untrusted environment. This is the fundamental building block in secure session establishment. Security of the media transport layer, the layer in which the actual voice datagrams are transmitted, depends on the secrecy of session keys and authentication of session participants. Since the established key is typically used in a symmetric encryption scheme, key secrecy requires that nobody other than the legitimate session participants be able to distinguish it from a random bit-string. Authentication requires that, after the key exchange protocol successfully completes, the participants' respective views of sent and received messages must match (e.g., see the notion of “matching conversations” in [8]). Key exchange protocols for VoIP sessions include SDP's Security DEscriptions for Media Streams (SDES), Multimedia Internet KEYing (MIKEY) and ZRTP [31]. We will analyze all three in this paper.

Secure media transport aims to provide confidentiality, message authentication and integrity, and replay protection to the media (data) stream. In the case of VoIP, this stream typically carries voice datagrams. Confidentiality means that the data under encryption is indistinguishable from random for anyone who does not have the key. Message authentication implies that if Alice receives a datagram apparently sent by Bob, then it was indeed sent by Bob. Data integrity implies that any modification of the data in transit will be detected by the recipient.

We show how to cause the transport-layer SRTP protocol to repeat the keystream used for datagram encryption. This enables the attacker to obtain the xor of plaintext datagrams or even to completely decrypt them. The SRTP keystream is generated by using AES in a stream cipher-like mode. The AES key is generated by applying a pseudo-random function (PRF) to the session key. SRTP, however, does not add any session-specific randomness to the PRF seed. Instead, SRTP assumes that the key exchange protocol, executed as part of RTP session establishment, will ensure that session keys never repeat. Unfortunately, S/MIME-protected SDES, which is one of the key exchange protocols that may be executed prior to SRTP, does not provide any replay protection. As we show, a network-based attacker can replay an old SDES key establishment message, which will cause SRTP to repeat the keystream that it used before, with devastating consequences. This attack is confirmed by our analysis of the libsrtp implementation.
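The consequence of a repeated keystream can be sketched in a few lines. The code below is not SRTP: `toy_keystream` is a hypothetical stand-in (SHA-256 in counter mode) whose only relevant property is that, like the derivation described above, it depends on nothing but the session key. The plaintexts are made up for illustration.

```python
import hashlib

def toy_keystream(session_key: bytes, length: int) -> bytes:
    """Stand-in keystream that depends only on the session key (no per-session randomness)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(session_key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# The same session key is installed twice, e.g. after a replayed key message.
session_key = b"replayed-session-key"
p1 = b"voice frame from call one"
p2 = b"voice frame from call two"

c1 = xor(p1, toy_keystream(session_key, len(p1)))
c2 = xor(p2, toy_keystream(session_key, len(p2)))

# The keystream cancels out: the attacker learns p1 XOR p2 from ciphertexts alone.
leak = xor(c1, c2)
```

If the attacker knows or can guess one of the plaintexts, the other follows directly, since `xor(leak, p1)` equals `p2`.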

We show an attack on the ZRTP key exchange protocol that allows the attacker to convince ZRTP session participants that they have lost their shared secret. ZID values, which are used by ZRTP participants to retrieve previously established shared secrets, are not authenticated as part of ZRTP. Therefore, an attacker can initiate a session with some party A under the guise of another party B, with whom A previously established a shared secret. As part of session establishment, A is supposed to verify that B knows their shared secret. If the attacker deliberately chooses values that cause verification to fail, A will decide—following ZRTP specification—that B has “forgotten” the shared secret.

The ZRTP specification explicitly says that the protocol may proceed even if the set of shared secrets is empty, in which case the attacker ends up sharing a key with A who thinks she shares this key with B. Even if the participants stop the protocol after losing their shared secrets, but are using VoIP devices without displays, they cannot confirm the computed key by voice and must stop communicating. In this case, the attack becomes a simple and effective denial of service. Our analysis of ZRTP is supported by the AVISPA formal analysis tool.

We show several minor weaknesses and potential vulnerabilities to denial of service in other protocols. We also observe that the key derived as the result of MIKEY key exchange cannot be used in a standard cryptographic proof of key exchange security. Key secrecy requires that the key be indistinguishable from a random bitstring. In MIKEY, however, the joint Diffie-Hellman value derived as the result of the protocol is used directly as the key. Membership in many Diffie-Hellman groups is easily checkable, thus this value can be distinguished from a random bitstring. Moreover, even hashing the Diffie-Hellman value does not allow the formal proof of security to go through in this case, since the hash function does not take any random inputs apart from the Diffie-Hellman value and cannot be viewed as a randomness extractor in the proof. (This observation does not immediately lead to any attacks.)
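The distinguishing argument can be made concrete with a toy group. The parameters below (P = 23, Q = 11, G = 2) are deliberately tiny and chosen only for illustration; real Diffie-Hellman groups are vastly larger, and the sketch says nothing about MIKEY's actual parameters. In this group, membership in the subgroup generated by G is checked with a single modular exponentiation.

```python
P = 23  # toy safe prime, P = 2*Q + 1 (illustrative only)
Q = 11  # prime order of the subgroup generated by G
G = 2   # generator of the order-Q subgroup: pow(2, 11, 23) == 1

def in_subgroup(x: int) -> bool:
    """Membership test for the subgroup generated by G."""
    return 1 <= x < P and pow(x, Q, P) == 1

def dh_shared(a: int, b: int) -> int:
    """Joint Diffie-Hellman value G^(a*b) mod P for private exponents a and b."""
    return pow(pow(G, a, P), b, P)

# Every possible shared value passes the membership test ...
all_shared_pass = all(
    in_subgroup(dh_shared(a, b)) for a in range(1, Q) for b in range(1, Q)
)

# ... but only half of the candidate values in [1, P-1] do.
members = [x for x in range(1, P) if in_subgroup(x)]
```

A shared value used directly as a key always passes `in_subgroup`, whereas a uniformly random value in the same range passes only about half the time, which is exactly the kind of test that separates the key from a random bitstring.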

While we demonstrate several real, exploitable vulnerabilities in VoIP security protocols, our main contribution is to highlight the importance of analyzing protocols in context rather than in isolation. Specifications of VoIP protocols tend to be a mixture of informal prose and pseudocode, with some assumptions, especially those about the protocols operating at the other layers of the VoIP stack, left implicit and vague. Therefore, our study has important lessons for the design and analysis of security protocols in general.

The rest of the paper is organized as follows. In section 2, we describe the protocols, focusing on SIP (signaling), SDES, ZRTP and MIKEY (key exchange), and SRTP (transport). In section 3, we describe the attacks and vulnerabilities that we discovered. Related work is in section 4, conclusions are in section 5.

VoIP security different from normal data network security

To understand why security for VoIP differs from data network security, we need to look at the unique constraints of transmitting voice over a packet network, as well as the characteristics shared by VoIP and data networks.

Packet networks depend on many configurable parameters: IP and MAC (physical) addresses of voice terminals and addresses of routers and firewalls. VoIP networks add specialized software, such as call managers, to place and route calls. Many network parameters are established dynamically each time a network component is restarted or when a VoIP telephone is restarted or added to the network. Because so many nodes in a VoIP network have dynamically configurable parameters, intruders have as wide an array of potentially vulnerable points to attack as they have with data networks. But VoIP systems have much stricter performance constraints than data networks, with significant implications for security.

Threats for VoIP

VoIP security threats include eavesdropping, denial of service, session hijacking, and VoIP spam, among others. Several standard VoIP protocols exist to counter these threats; we discuss them in Section 3.


Eavesdropping

A VoIP service built on Internet technology faces the threat of eavesdropping, in which call setup information and the audio content of the communication are gathered illegally. Eavesdropping can be broadly categorized into eavesdropping in a LAN (Local Area Network) environment, eavesdropping in a WAN (Wide Area Network) environment, eavesdropping through hacking of a PC (Personal Computer), etc.

Denial of Service

Denial of service is an attack that makes it difficult for legitimate users to use a telecommunication service normally. It is also one of the hardest threats to counter. Because VoIP service is based on Internet technology, it is exposed to denial of service as well. Denial-of-service attacks on VoIP services can be broadly divided into system resource exhaustion, circuit resource exhaustion, VoIP communication interruption/blocking, etc.

This work was supported by the IT R&D program of MIC/IITA.

Session Hijacking

Session hijacking is an attack in which the attacker takes over control of the communication session between users by spoofing a legitimate user and then interferes in their communication; it is a kind of man-in-the-middle attack. Session hijacking in VoIP communication can be broadly categorized into INVITE session hijacking, SIP registration hijacking, etc.

VoIP Spam

VoIP spam is an attack that interrupts users and violates their privacy by sending voice advertisement messages, and it can also render a VMS (Voice Mailing System) unusable. It can be categorized into call spam, IM (Instant Messaging) spam, presence spam, etc.

Security trade-offs

Trade-offs between convenience and security are routine in software, and VoIP is no exception. Most, if not all, VoIP components use integrated Web servers for configuration. Web interfaces can be attractive, easy to use, and inexpensive to produce because of the wide availability of good development tools. Unfortunately, most Web development tools focus on features and ease of use, with less attention paid to the security of the applications they help produce. Some VoIP device Web applications have weak or no access control, script vulnerabilities, and inadequate parameter validation, resulting in privacy and DoS vulnerabilities. Some VoIP phone Web servers use only HTTP basic authentication, meaning servers send authentication information without encryption, letting anyone with network access obtain valid user IDs and passwords. As VoIP gains popularity, we'll inevitably see more administrative Web applications with exploitable errors.
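To see why HTTP basic authentication offers no protection on an unencrypted connection, note that the credentials are only Base64-encoded, not encrypted. The sketch below builds an `Authorization` header value and then recovers the credentials from it, as anyone observing the traffic could. The user name and password in the usage note are made up.

```python
import base64
from typing import Tuple

def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP 'Authorization' header for basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def recover_credentials(header_value: str) -> Tuple[str, str]:
    """What an eavesdropper does: undo the Base64 encoding, no key required."""
    scheme, token = header_value.split(" ", 1)
    if scheme != "Basic":
        raise ValueError("not a basic authentication header")
    user, password = base64.b64decode(token).decode("utf-8").split(":", 1)
    return user, password
```

For example, `recover_credentials(basic_auth_header("admin", "s3cret"))` returns the original pair `("admin", "s3cret")`, which is why basic authentication needs to be carried over an encrypted channel.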

The encryption process can be unfavorable to QoS

Unfortunately, several factors, including packet size expansion, ciphering latency, and a lack of QoS urgency in the cryptographic engine can cause an excessive amount of latency in VoIP packet delivery, leading to degraded voice quality.
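The effect of packet size expansion on small voice packets can be estimated with simple arithmetic. In the sketch below, the 40 bytes of headers correspond to an IPv4 header without options (20), a UDP header (8) and a fixed RTP header (12). The 20-byte voice payload, the 1,400-byte data payload and the 50 bytes of added encryption overhead are hypothetical values picked only to illustrate the proportions.

```python
def expansion(payload: int, headers: int, crypto_overhead: int) -> float:
    """Ratio of packet size after encryption overhead to packet size before it."""
    before = payload + headers
    return (before + crypto_overhead) / before

HEADERS = 20 + 8 + 12   # IPv4 + UDP + RTP headers, in bytes
CRYPTO_OVERHEAD = 50    # hypothetical per-packet overhead added by encryption

voice_ratio = expansion(20, HEADERS, CRYPTO_OVERHEAD)    # small voice packet
data_ratio = expansion(1400, HEADERS, CRYPTO_OVERHEAD)   # large data packet
```

Under these assumptions the voice packet grows by more than 80 percent while the large packet grows by under 4 percent, which is why a fixed per-packet cost weighs so much more heavily on VoIP traffic.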

The encryption process can be detrimental to QoS, making cryptodevices severe bottlenecks in a VoIP network. Encryption latency is introduced at two points. First, encryption and decryption take a nontrivial amount of time. VoIP's multitude of small packets exacerbates the encryption slowdown because most of the time consumed comes as overhead for each packet. One way to avoid this slowdown is to apply computationally simple encryption algorithms to the voice data before packetization. Although this improves throughput, the proprietary encryption algorithms used (fast Fourier-based encryption, chaos-bit encryption, and so on) aren't considered as secure as the Advanced Encryption Standard [16], which is included in many IPsec implementations. AES's combination of speed and security should handle the demanding needs of VoIP at both ends.

Recent studies indicate that the greatest contributor to the encryption bottleneck occurs at the cryptoengine scheduler, which often delays VoIP packets as it processes larger data packets [17]. This problem stems from the fact that cryptoschedulers are usually first-in first-out (FIFO) queues, inadequate for supporting QoS requirements. If VoIP packets arrive at the encryption point when the queue already contains data packets, there's no way they can usurp the less time-urgent traffic. Some hardware manufacturers have proposed (and at least one has implemented) solutions for this, including QoS reordering of traffic just before it reaches the cryptoengine [18]. But this solution assumes that the cryptoengine's output is fast enough to avoid saturating the queue. Ideally, you'd want the cryptoengine to dynamically sort incoming traffic and force data traffic to wait for it to finish processing the VoIP packets, even if these packets arrive later. However, this solution adds considerable overhead to a process most implementers like to keep as light as possible. Another option is to use hardware-implemented AES encryption, which can improve throughput significantly. Past the cryptoengine stage, the system can perform further QoS scheduling on the encrypted packets, provided they were encrypted using ToS preservation, which copies the original ToS bits into the new IPsec header. Virtual private network (VPN) tunneling of VoIP has also become popular recently, but the congestion and bottlenecks associated with encryption suggest that it might not always be scalable. Although researchers are making great strides in this area, the hardware and software necessary to ensure call quality for encrypted voice traffic might not be economically or architecturally viable for all enterprises considering the move to VoIP.
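The difference between a FIFO cryptoscheduler and QoS reordering ahead of the cryptoengine can be sketched with two small queue models. This is an illustration of the scheduling idea only, not of any vendor's implementation; the packet names and the two traffic classes are hypothetical.

```python
import heapq
from collections import deque

VOICE, DATA = 0, 1  # lower number = more time-urgent

def fifo_order(packets):
    """FIFO scheduler: packets reach the cryptoengine in arrival order."""
    queue = deque(packets)
    ordered = []
    while queue:
        ordered.append(queue.popleft())
    return ordered

def priority_order(packets):
    """QoS reordering: voice first, arrival order preserved within each class."""
    heap = [(kind, seq, name) for seq, (kind, name) in enumerate(packets)]
    heapq.heapify(heap)
    ordered = []
    while heap:
        kind, _, name = heapq.heappop(heap)
        ordered.append((kind, name))
    return ordered

# Two voice packets arrive while data packets are already waiting.
arrivals = [(DATA, "d1"), (DATA, "d2"), (VOICE, "v1"), (DATA, "d3"), (VOICE, "v2")]
```

With these arrivals, `fifo_order` makes `v1` wait behind `d1` and `d2`, while `priority_order` lets both voice packets through first. The sequence number in each heap entry keeps packets of the same class in their original order.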

Thus far, we've painted a fairly bleak picture of VoIP security. We have no easy “one size fits all” solution to the issues we've discussed in this article. Decisions to use VPNs instead of ALG-like solutions or SIP instead of H.323 must depend on the specific nature of both the current network and the VoIP network to be. The technical problems are solvable, however, and establishing a secure VoIP implementation is well worth the difficulty.

To implement VoIP securely today, start with the following general guidelines, recognizing that practical considerations might require adjusting them:

• Put voice and data on logically separate networks. You should use different subnets with separate RFC 1918 address blocks for voice and data traffic and separate DHCP servers to ease the incorporation of intrusion-detection and VoIP firewall protection.

• At the voice gateway, which interfaces with the PSTN, disallow H.323, SIP, or Media Gateway Control Protocol (MGCP) connections from the data network. As with any other critical network management component, use strong authentication and access control on the voice gateway system.

• Choose a mechanism to allow VoIP traffic through firewalls. Various protocol dependent and independent solutions exist, including ALGs for VoIP protocols and session border controllers. Stateful packet filters can track a connection's state, denying packets that aren't part of a properly originated call.

  • Use IPsec or Secure Shell (SSH) for all remote management and auditing access. If practical, avoid using remote management at all and do IP PBX access from a physically secure system.
  • Use IPsec tunneling when available instead of IPsec transport because tunneling masks the source and destination IP addresses, securing communications against rudimentary traffic analysis (that is, determining who's making the calls).

  • If performance is a problem, use encryption at the router or other gateway to allow IPsec tunneling. Because some VoIP end points aren't computationally powerful enough to perform encryption, placing this burden at a central point ensures the encryption of all VoIP traffic emanating from the enterprise network. Newer IP phones provide AES encryption at reasonable cost.

  • Look for IP phones that can load digitally (cryptographically) signed images to guarantee the integrity of the software loaded onto the IP phone.
  • Avoid softphone systems (see the sidebar) when security or privacy is a concern. In addition to violating the separation of voice and data, PC-based VoIP applications are vulnerable to the worms and viruses that are all too common on PCs.
  • Consider methods to harden VoIP platforms based on common operating systems such as Windows or Linux. Try, for example, disabling unnecessary services or using host-based intrusion detection methods.
  • Be especially diligent about maintaining patches and current versions of VoIP software.
  • Evaluate costs for additional power backup systems that might be required to ensure continued operation during power outages.
  • Give special consideration to E-911 emergency services communications, because E-911 automatic location service is not always available with VoIP.
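The first guideline in the list above, separate RFC 1918 address blocks for voice and data, lends itself to a simple automated check. The sketch below uses Python's standard `ipaddress` module; the function names and the example subnets are hypothetical, and a real deployment would read its address plan from its own configuration.

```python
import ipaddress

# The three private address blocks defined by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def is_rfc1918(net: ipaddress.IPv4Network) -> bool:
    """True if the network lies entirely inside one of the RFC 1918 blocks."""
    return any(net.subnet_of(block) for block in RFC1918_BLOCKS)

def check_separation(voice_cidr: str, data_cidr: str) -> bool:
    """True if both subnets are private and do not overlap each other."""
    voice = ipaddress.ip_network(voice_cidr)
    data = ipaddress.ip_network(data_cidr)
    return is_rfc1918(voice) and is_rfc1918(data) and not voice.overlaps(data)
```

For instance, `check_separation("10.10.0.0/16", "192.168.1.0/24")` passes, while `check_separation("10.10.0.0/16", "10.10.5.0/24")` fails because the data subnet sits inside the voice subnet.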

VoIP can be done securely, but the path isn't smooth. It will likely be several years before standards issues are settled and VoIP systems become mainstream. Until then, organizations must proceed cautiously and not assume that VoIP components are just more peripherals for the local network. Above all, it's important to keep in mind VoIP's unique requirements, acquiring the right hardware and software to meet the challenges of VoIP security.

Methods for VoIP end to end security

Voice over IP (VoIP) security is an area where security design patterns may prove exceedingly useful. Internet telephony, or VoIP, has grown in importance and has now passed the tipping point: in 2005, U.S. companies bought more VoIP phones than they ordered new POTS lines. However, the convergence that lets software-based VoIP store, copy, combine with other data, and distribute voice over the Internet also brings security problems that need to be solved in standard ways in order to ensure interoperability. This is further complicated by the fact that VoIP security is currently driven by various vendors competing for market share.

Despite the importance of VoIP security, we are aware of only two other efforts for VoIP security design patterns, a chapter within and an unpublished M.S. thesis supervised by Eduardo Fernandez of Florida Atlantic University.

Figure 1. VoIP Infrastructure Vulnerabilities

NIST released a report on VoIP security in January 2005. This report elaborates on various aspects of securing VoIP and the impact of such measures on call performance. The report argues that VoIP performance and security are not seamlessly compatible; in certain areas they are orthogonal. We briefly review this report and group VoIP infrastructure threats into three categories as depicted in Figure 1:

(1) Protocol

(2) Implementation and

(3) Management

Quality of Service (QoS) Issues

A VoIP call is susceptible to latency, jitter, and packet loss. ITU-T Recommendation G.114 has established 150 ms as the upper limit on one-way latency for domestic calls. If Goode's latency budget is considered, very little time (< 29 ms) is left for encryption/decryption of voice traffic. QoS-unaware network elements such as routers, firewalls, and Network Address Translators (NAT) all contribute to jitter (non-uniform packet delays). Use of IPsec both contributes to jitter and reduces the effective bandwidth. VoIP is sensitive to packet loss, with tolerable loss rates of 1-3%; however, forward error correction schemes can reduce the effective loss rate.
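The budget argument above is simple arithmetic, and a minimal sketch makes it concrete. The 150 ms limit comes from the text; the component names and delay values below are placeholders chosen only for illustration and are not Goode's figures.

```python
# One-way latency limit from ITU-T Recommendation G.114, as cited in the text.
G114_ONE_WAY_LIMIT_MS = 150.0


def remaining_budget_ms(component_delays_ms, limit_ms=G114_ONE_WAY_LIMIT_MS):
    """Return the time left (in ms) after the listed delays are subtracted from the limit."""
    return limit_ms - sum(component_delays_ms.values())


# Hypothetical delay components; these are assumptions, NOT Goode's latency budget.
example_delays_ms = {
    "codec": 20.0,
    "packetization": 20.0,
    "network_transit": 60.0,
    "jitter_buffer": 30.0,
}

left = remaining_budget_ms(example_delays_ms)
print(f"Remaining for encryption/decryption: {left:.1f} ms")
```

Whatever the real component values are, the point is the same: every millisecond spent on cryptographic processing is taken from the small remainder.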

Signaling and Media Protocol Security

SIP (Session Initiation Protocol) (RFC 3261) and H.323 are the two competing protocols for VoIP signaling. H.323 is an ITU-T umbrella of protocols that supports secure RTP (SRTP) (RFC 3711) for securing media traffic, and Multimedia Internet Keying (MIKEY) (RFC 3830) for key exchange. SIP supports TLS and S/MIME for signaling message confidentiality and SRTP for media confidentiality.

Firewalls and NATs

RTP is assigned a dynamic port number, which presents a problem for firewall port management. A firewall has to be made aware of the ports on which the media will flow. Thus a stateful and application-aware firewall is necessary. However, if a client is behind a NAT, call establishment signaling messages carry an IP address and RTP port number that are not globally reachable. NAT traversal protocols like STUN (RFC 3489), TURN (RFC 5766), and ICE (14) are necessary to establish a globally routable address for media traffic. For protocols that send call setup messages via UDP, the intermediate signaling entity must send its response to the same address and port from which the request arrived.
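The last rule can be sketched in a few lines. This is a simplified illustration, not taken from any SIP stack: the `Endpoint` type and both helper functions are hypothetical names, and the addresses are documentation examples.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    host: str
    port: int


def behind_nat(advertised: Endpoint, observed: Endpoint) -> bool:
    """Guess that a NAT is present when the address inside the message
    differs from the address the packet actually arrived from."""
    return advertised != observed


def response_target(advertised: Endpoint, observed: Endpoint) -> Endpoint:
    """Choose where to send a UDP response: always the observed source,
    because the advertised address may be private and unreachable."""
    return observed


# Hypothetical request: the client advertises a private address, but the
# packet arrives from the NAT's public mapping.
advertised = Endpoint("192.168.1.10", 5060)
observed = Endpoint("203.0.113.7", 40312)

print(behind_nat(advertised, observed))
print(response_target(advertised, observed))
```

Replying to the observed source works because the NAT has already created a mapping for that address and port when the request went out.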

Encryption and IPsec

IPsec is preferred for VoIP tunneling across the Internet; however, it is not without substantial overhead. When IPsec is used in tunnel mode, the VoIP payload to packet size ratio for a payload of 40 bytes and RTP/UDP headers drops to ~30%. The NIST solution to avoid queuing bottlenecks at routers due to encryption is to perform encryption/decryption solely at endpoints. SRTP and MIKEY are specified for encrypting media traffic and establishing session keys respectively.
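The ~30% figure can be roughly reproduced with a back-of-the-envelope calculation. The sketch below assumes IPv4, ESP in tunnel mode, a cipher with an 8-byte block and 8-byte IV (as with 3DES-CBC), and a 12-byte integrity check value; these parameters are assumptions for illustration, since the text does not state which configuration the figure was derived from.

```python
import math

# Header sizes in bytes (IPv4 without options, standard UDP and RTP headers).
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12
ESP_HEADER = 8        # SPI (4 bytes) + sequence number (4 bytes)
ESP_TRAILER_MIN = 2   # pad-length byte + next-header byte


def esp_tunnel_packet_size(payload, block_size=8, iv_size=8, icv_size=12):
    """Estimate the on-the-wire size of one RTP packet inside an ESP tunnel.

    Defaults are assumed values (8-byte block and IV, 12-byte ICV)."""
    inner = IPV4_HEADER + UDP_HEADER + RTP_HEADER + payload
    # The inner packet plus the ESP trailer is padded to a whole number of cipher blocks.
    padded = math.ceil((inner + ESP_TRAILER_MIN) / block_size) * block_size
    return IPV4_HEADER + ESP_HEADER + iv_size + padded + icv_size


def payload_ratio(payload, **esp_params):
    """Fraction of the transmitted packet that is actual voice payload."""
    return payload / esp_tunnel_packet_size(payload, **esp_params)


print(esp_tunnel_packet_size(40))      # 136
print(f"{payload_ratio(40):.1%}")      # 29.4%
```

Under these assumptions a 40-byte voice payload travels in a 136-byte packet. A cipher with 16-byte blocks and IV (for example AES-CBC) pushes the ratio slightly lower still.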

Categorizing VoIP Threats

The threats faced by a VoIP system are similar to those faced by other applications, including unwanted communication (spam), privacy violations (unlawful intercept), impersonation (masquerading), theft-of-service, and denial-of-service. Table 1 groups these threats into protocol, implementation, and management categories.





  • End-to-end protection as well as hop-by-hop protection (proxies might be malicious)
  • Most VoIP devices are managed remotely
  • Identity Assertion: users are concerned about whether they are talking to the real entity as opposed to a 'phished' entity
  • Reputation Management
  • Buffer Overflow, Insecure Bootstrapping
  • Access Control: protection against unauthorized access to VoIP servers and gateways
  • Power Failures

Table 1. Categorizing VoIP Threats

Secure VoIP call

The Secure VoIP call pattern hides the meaning of messages by performing encryption of calls in a VoIP environment.


Two or more subscribers are participating in a voice call over a VoIP channel. In public IP networks such as the Internet, it is easy to capture the packets meant for another user.


When making or receiving a call, the voice packets transported between the VoIP network nodes are exposed to interception. How can we prevent attackers from listening to a voice conversation when voice packets are intercepted on public IP networks?

The solution will be affected by the following forces:

  • Packets sent in a public network are easy to intercept and read or change. We need a way to hide their contents.
  • The protection method must be transparent to the users and easy to apply.
  • The protection method should not significantly affect the quality of the call.


To achieve confidentiality we use encryption and decryption of VoIP calls.


In cases where performance is an important issue, symmetric algorithms are preferred. Such algorithms require the same cryptographic key (a shared secret key) on both sides of the channel.

If the IPsec standard is used, the participants in a call (i.e. Caller and Callee) must agree beforehand on a data encryption algorithm (e.g. DES, 3DES, AES) and on a shared secret key. The Internet Key Exchange (IKE) protocol is used for setting up the IPsec connections between terminal devices. The caller encrypts the voice call with the secret key and sends it to the remote user. The callee decrypts the voice call with the same key and recovers the original voice packets.

Additionally, the Secure Real-time Transport Protocol (SRTP) can be used for encrypting media traffic and Multimedia Internet KEYing (MIKEY) for exchanging keying material in VoIP.

If public key cryptography is used, the caller must obtain the callee's public key before establishing a connection. The caller encrypts the voice call with the callee's public key and sends it to her. The callee decrypts the voice call with her private key and recovers the original voice packets.

The class diagram of Figure 4 shows secure-channel communication in VoIP (adapted from the Cryptographic Metapattern in). This model uses the Strategy pattern to indicate the choice of encryption algorithms. Both the Caller and Callee roles use the same set of algorithms, although they are shown only on the caller side.
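Since Figure 4 is not reproduced here, the following minimal sketch shows how the Strategy pattern could let both roles swap encryption algorithms behind one interface. All class and method names are hypothetical and not taken from the figure. The `ToyStreamCipher` is an illustrative stand-in built from SHA-256 and is not a secure cipher; a real system would use a vetted algorithm such as AES, for example through SRTP.

```python
import hashlib
from abc import ABC, abstractmethod


class EncryptionStrategy(ABC):
    """Common interface for interchangeable encryption algorithms."""

    @abstractmethod
    def encrypt(self, key: bytes, nonce: bytes, data: bytes) -> bytes: ...

    @abstractmethod
    def decrypt(self, key: bytes, nonce: bytes, data: bytes) -> bytes: ...


class NullCipher(EncryptionStrategy):
    """No protection; useful only as a baseline strategy."""

    def encrypt(self, key, nonce, data):
        return data

    def decrypt(self, key, nonce, data):
        return data


class ToyStreamCipher(EncryptionStrategy):
    """Illustrative symmetric stream cipher (NOT secure, for demonstration only)."""

    def _keystream(self, key, nonce, length):
        out = bytearray()
        counter = 0
        while len(out) < length:
            out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
            counter += 1
        return bytes(out[:length])

    def encrypt(self, key, nonce, data):
        stream = self._keystream(key, nonce, len(data))
        return bytes(a ^ b for a, b in zip(data, stream))

    def decrypt(self, key, nonce, data):
        # XOR with the same keystream reverses the encryption.
        return self.encrypt(key, nonce, data)


class SecureChannel:
    """One end of a call; Caller and Callee each hold the same key and strategy."""

    def __init__(self, strategy: EncryptionStrategy, key: bytes):
        self.strategy = strategy
        self.key = key

    def send(self, seq: int, voice: bytes) -> bytes:
        return self.strategy.encrypt(self.key, seq.to_bytes(8, "big"), voice)

    def receive(self, seq: int, packet: bytes) -> bytes:
        return self.strategy.decrypt(self.key, seq.to_bytes(8, "big"), packet)


shared_key = b"hypothetical-shared-secret"
caller = SecureChannel(ToyStreamCipher(), shared_key)
callee = SecureChannel(ToyStreamCipher(), shared_key)

packet = caller.send(1, b"voice-frame-0001")
print(callee.receive(1, packet))
```

Swapping `ToyStreamCipher()` for another `EncryptionStrategy` subclass changes the algorithm without touching `SecureChannel`, which is the choice the Strategy pattern is meant to isolate.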


The advantages of this pattern include:

  • Symmetric encryption approaches provide good confidentiality.
  • Encryption is performed transparently to the user's activities.
  • The need to provide separate VLANs for VoIP security could possibly be removed.
  • It may no longer be necessary to use the IPsec tunneling that was previously required in the MAN/WAN.

Figure 4 Class Diagram for a VoIP Secure Channel

Possible disadvantages include:

  • The quality of the call can be affected if encryption is not performed very carefully [Wal05].
  • It is hard to scale because of the need for shared keys.