Impact of Packet Impairments on VoIP Quality


Voice over Internet Protocol (VoIP) is a collection of communication technologies used to send voice signals over IP infrastructures such as the internet. It allows users to make phone calls from any IP-based device, whether a computer, a mobile phone, or a phone built specifically for voice communication over IP networks. IP telephony refers to the suite of telephony services available over an IP infrastructure, including phone calls, voicemail, internet, fax and email. VoIP technology converts analogue voice signals to a digital format that can then be transmitted over IP networks.

VoIP technology has its benefits and its challenges. One benefit is a reduction in total cost of ownership (TCO) and in the cost of migration, especially for mobile users.

This project aims to analyse how voice quality is affected by network impairments such as delay, jitter, packet corruption, packet duplication and packet loss. It will also attempt to construct a matrix of the best voice quality score against possible occurrences of these packet impairments.

Measures need to be developed and carried out to condition, manage or avoid congestion of VoIP traffic, whether through prioritisation, low latency queuing, policing or shaping, or header or full packet compression. This project aims to help in understanding what is required to design such measures.


Voice over Internet Protocol (VoIP) is a telephony transmission technology used commercially today as an alternative to traditional circuit-switched telephony. It works by sending voice signals over IP platforms such as the internet or a local area network. Previously, voice communication was mostly carried over analogue infrastructures that provisioned services such as Plain Old Telephone Service (POTS), Custom Local Area Signalling Service (CLASS), Advanced Intelligent Networking Service (AINS) and the Public Switched Telephone Network (PSTN).

1.1 Research Motivation and Objectives

An introduction to voice communication is incomplete without some of its history. It started in the 19th century, when Alexander Graham Bell invented the telephone on 10th March 1876. He was working with Thomas Watson at the time, and the very first voice transmission was “Watson, come here; I want you”. His telephony system was presented to the world at the Centennial Exposition held in Philadelphia, Pennsylvania. In 1877, this led to the founding of the Bell Telephone Company. In the early 20th century, many inventors researched the possibility of transmitting voice over a wireless medium by means of amplitude modulation. They succeeded, radio broadcasting developed rapidly, and society changed for good.

The history of VoIP itself dates back to the late 20th century, when VocalTec, a small company, marketed internet phone software designed to run on personal computers. Named Internet Phone, the software used the H.323 protocol suite rather than SIP, which is more common today. The product was like the Skype of its day, but it lacked one critical feature: broadband capability. Because the software ran over traditional internet modems, the voice quality perceived by the listener at the other end of the conversation was poor compared with a normal PSTN phone call. Nevertheless, being able to send voice communications over the internet was a major breakthrough in the telephony world.

VoIP has attractive features, and transmitting digital data over the internet brings a great deal of flexibility. Because Voice over IP converts analogue voice signals to a digital format, the service provider can exploit the availability of the digital signals. For example, the provider can forward incoming calls to voicemail if needed, or forward voicemails to you as email attachments so that you can listen to them from any capable device. Conferencing is also available, in which three or more people communicate on a single call. Voice over IP also provides extra services such as caller ID, call blocking and call waiting, mostly for free, which would not have been free with traditional phone line connections.

What about the challenges of VoIP service? The main problem is that VoIP competes for bandwidth with other mission-critical application services on various IP-based links. To put this in perspective, several network issues or impairments cause poor VoIP call quality. Even excellent applications such as Vonage, Skype and VoIPBuster still rely heavily on the state of the network link between the calling party and the called party. Network issues such as delay, jitter, buffer overrun, packet duplication errors and dropped packets all affect the quality of voice communication. This project aims to analyse the impact of these impairments on VoIP quality. Using state-of-the-art commercial technology in combination with open source software and MATLAB, various graphs will be plotted and a matrix will be developed to show clearly how these impairments affect VoIP quality.

1.2 Contribution

The contributions of the work reported in this project include:

* A simulated test bed of packet impairments such as corruption, duplication, reordering as a result of loss, jitter, end-to-end delay.

* An analytical understanding of how these packet impairments affect Voice quality within the network infrastructure.

* A documented set of possible precautions and solutions to provide quality of service for voice communication during times of network congestion or delay.

* A QoE matrix that includes information such as queuing techniques, MOS scores, packet impairment variations like delay, jitter and packet loss.

1.3 Outline and Organisation

This project is organised as follows:

Chapter 2 gives background information about the technologies behind Voice over Internet Protocol (VoIP), with a spotlight on the H.323 protocol suite used in this MSc project's test bed design and experimentation.

Chapter 3 provides insight into the metrics and techniques used in measuring call quality within packet-based systems.

Chapter 4 presents the design prototype for this MSc project.

Chapter 5 shows the shortcomings of the test bed design and then evaluates the results of this project research effort.

Chapter 6 concludes this research work with its contributions and then suggestions for future development and improvement of similar research efforts.

Appendix A shows how objective voice quality monitoring statistics were obtained from Cisco 7960 IP Phones.

Appendix B includes description of steps carried out to complete the task of setting up the Cisco Unified Communications Manager 7.0 services used for this project.

2.1 VoIP History

It is not a surprise that the recent evolution of telephone technologies transcends the technical world of voice over internet protocol. Over the years, telephone systems have proved to be an important aspect of human communication. Voice over internet protocol relies strongly on the existence of both telephone technology and internet technology. The benefits we derive from the telephone today can be attributed to the hard work done by Alexander Graham Bell and Elisha Gray in the 1870s. To this day, Alexander Graham Bell is known as the father and legal inventor of telephone technology. At first, people who had lines had to get them installed in pairs in order to connect with one another. Then a switch was invented that could connect a line to hundreds of other lines by pressing a button a repeated number of times. This was known as the Strowger Switch, named after its inventor Almon B. Strowger, in 1889. The switch used mechanical devices such as relays and sliders to place the call on the line, and was used for over 100 years. This system was replaced in 1896, but the major overhaul of the technology took place when the transistor was invented. Several people attempted to move the technology forward, but the breakthrough came when Claude Shannon published his mathematical theory of communication, which promoted the idea of using binary codes, with the aid of transistors, to pass communication signals from one end to another. His paper laid the foundation of digital communications, from mobile devices to the internet.

The telephony world saw intense research over most of the 20th century, but it was not until late in that century that internet protocol telephony came into being. As a service based on voice over internet protocol, IP telephony has been gaining ground against the legacy public switched telephone network infrastructure. The history of VoIP can be traced back to 1974, when the Institute of Electrical and Electronics Engineers (IEEE) in the United States published a paper titled “A Protocol for Packet Network Intercommunication” [1].

This paper set out the need for, and a proposal of, a communication protocol to aid in transmitting packets over network infrastructures, and from it came the Internet Protocol as specified in RFC 791. By the mid-1990s the internet was already proving popular, and a company called VocalTec released the first commercially available internet phone software [2].

There was still a need to standardise how voice signals and packets should be transmitted over the already established Internet Protocol. In 1996, the Telecommunication Standardization Sector (ITU-T), which coordinates standards for telecommunications on behalf of the International Telecommunication Union (ITU), began the development of the standard commonly known today as H.323 [3]. In 1997, a communications and information services company began development of its first softswitch [4].

In 1999, RFC 2543 was released. It defined a signalling protocol, not as “heavy” as H.323, which can be widely used to control voice and video communication sessions over IP. The main feature of this newer standard is interoperability between equipment from different vendors. In the same year, Mark Spencer of Digium developed the first open source private branch exchange software, popularly known as Asterisk [5].

Telephone communication has advanced over the last 10 years. This advancement has spanned both analogue and digital technologies and ways of transmitting voice, and it has been observed in countries around the globe. People are also seriously considering cheaper alternatives to traditional ways of communicating; more and more people are using the internet, and interest in social networking sites such as Facebook and MySpace has skyrocketed. Voice communication has experienced a paradigm shift too, in that people are beginning to explore cheaper means of talking. Voice transmission over the internet is now known to be much cheaper than over traditional telephone networks [6].

Standards such as SIP have also opened up a wide range of opportunities for telephone service providers and users alike. We now have telephone features and services that were almost impracticable to implement on traditional public switched telephone network infrastructures. However, like any other technology, this one has both strengths and shortcomings. One shortcoming is its limited access to emergency services: for example, emergency calls such as 911 have to be rerouted through a local VoIP gateway onto a public switched network infrastructure. Extra design measures have to be adopted and implemented to cater for emergency calls across different sites in any voice over internet protocol or IP telephony infrastructure.

2.2 VoIP Technology
2.2.1 Overview

For voice transmission over internet protocol to happen, call control signalling first has to take place. This is accomplished by a protocol that initiates the setup and makes sure enough resources exist before the actual voice call is placed. After this initiation, the parties to the call can exchange voice. Figure 1 below shows the process that voice from a caller goes through to reach the called party. The voice goes through sampling, quantization, encoding and, optionally, compression before reaching the called party through a communication channel, which can be either a wired connection or a wireless network. When the voice reaches the called party, the reverse process occurs: the voice goes through decompression, decoding, de-quantization and de-sampling, which enables the called party to hear the voice of the caller. All these processes have to be seamless in order for both ends to communicate intelligibly.
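The sender and receiver chain described above can be illustrated with a minimal sketch. It assumes an 8 kHz sampling rate, 8-bit samples and the continuous mu-law companding curve (mu = 255); these values and all function names are illustrative choices, not taken from the test bed, and the code is not a bit-exact G.711 implementation.

```python
import math

SAMPLE_RATE_HZ = 8000  # assumed narrowband telephony sampling rate
MU = 255               # assumed mu-law companding constant


def sample_tone(freq_hz, duration_s, rate_hz=SAMPLE_RATE_HZ):
    """Sampling: evaluate a sine wave at discrete instants, values in [-1, 1]."""
    count = int(round(duration_s * rate_hz))
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(count)]


def compress(x, mu=MU):
    """Compand one sample in [-1, 1] with the continuous mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def expand(y, mu=MU):
    """Inverse of compress()."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def encode(samples):
    """Quantization and encoding: map each companded sample to one byte."""
    return bytes(round((compress(x) + 1) / 2 * 255) for x in samples)


def decode(payload):
    """Receiver side: turn each byte back into an approximate sample."""
    return [expand(b / 255 * 2 - 1) for b in payload]


if __name__ == "__main__":
    voice = sample_tone(440, 0.02)   # 20 ms of a 440 Hz test tone
    payload = encode(voice)          # what would travel in one packet
    restored = decode(payload)
    worst = max(abs(a - b) for a, b in zip(voice, restored))
    print(len(payload), "bytes, worst-case error", round(worst, 4))
```

The round trip is lossy: the restored samples differ slightly from the originals because of quantization, which is one reason codec choice matters for perceived quality even before any network impairment is applied.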

In practical terms, this basic communication process can be implemented over a transmission channel made up of local area networks and wide area networks. The channel can consist of wireless networks and wired links, or even the internet, with the caller and the called party on opposite sides of it, reached through a gateway router. The gateway router is a vital element of VoIP communication. In popular vendors' gateway routers it provides signalling interfaces such as the foreign exchange office (FXO) interface and the foreign exchange station (FXS) interface; it also provides a point of entry into the public switched telephone network (PSTN). The FXO interface connects to the central office infrastructure and receives battery power and ring signals, while the FXS interface connects to end stations, usually analogue phones, and supplies battery power and generates ring signals. The gateway router is itself an IP device that operates at the network layer of the OSI model, so in essence it interfaces between the IP network and the PSTN infrastructure. It can also interface to digital circuits such as E1, T1 and ISDN. In general, the gateway handles signal conversion between two interfaces, such as PSTN and VoIP traffic.

The transmission channel can be a wired, wireless or satellite communication link. Wired links, such as the combination of LANs and WANs shown in Figure 2, are the most common, but wireless channels, as shown in Figure 3, are also common, and the channel can also be implemented using a combination of wireless and satellite media, as shown in Figure 4. This last option is most common in ad hoc setups: for example, charity agencies on rescue missions to areas affected by natural disasters need a means of communication, especially if wired connections are hard to come by.

A VoIP infrastructure that can only place calls to devices on the IP platform is under-utilised. Users on VoIP/IPT platforms still need to communicate with users on the PSTN. To accomplish this, the VoIP network has to be connected to the PSTN through what is commonly known as a voice gateway, which provides the physical interface and logical conversion between two or more different infrastructure technologies. For example, if a VoIP user in the UK wants to call a PSTN user in Nigeria, then because of the sporadic nature of internet access in Nigeria, the call is carried over the internet and the VoIP digital signals are converted to analogue signals so that the call can be terminated on the PSTN in Nigeria. Figure 5 shows what this interconnection looks like.

2.2.2 Protocol

VoIP is a technology largely built around the TCP/IP protocol suite, and the VoIP protocol suites can be mapped onto the layers of the open systems interconnection (OSI) model. As Figure 6 below shows, the encapsulation process in VoIP communication closely resembles that of most TCP/IP-based systems: it shares the common physical and data link layers, and uses the Internet Protocol (IP) rather than IPX at the network layer. UDP is used at the transport layer because it is faster than TCP.
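The cost of this encapsulation can be estimated with a short calculation. The sketch below assumes IPv4 without options (20 bytes), UDP (8 bytes) and a fixed RTP header (12 bytes) on top of UDP, and uses a 64 kbit/s codec with 20 ms of audio per packet as an illustrative case; the packetisation interval and function names are assumptions, and data link layer overhead is deliberately left out.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12   # fixed RTP header, no CSRC entries


def payload_bytes(codec_kbps, packet_ms):
    """Audio bytes carried per packet: kbit/s times ms gives bits, /8 gives bytes."""
    return codec_kbps * packet_ms / 8


def ip_bandwidth_kbps(codec_kbps, packet_ms):
    """Network layer bandwidth of one voice stream in one direction, in kbit/s."""
    headers = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES
    packet_size = payload_bytes(codec_kbps, packet_ms) + headers
    packets_per_second = 1000 / packet_ms
    return packet_size * 8 * packets_per_second / 1000


if __name__ == "__main__":
    # Assumed example: 64 kbit/s codec, 20 ms of audio per packet.
    print(payload_bytes(64, 20))      # 160.0 bytes of audio per packet
    print(ip_bandwidth_kbps(64, 20))  # 80.0 kbit/s including IP/UDP/RTP headers
```

Under these assumptions the 40 bytes of headers add a quarter to the codec bit rate, which is why header compression appears among the congestion management measures mentioned earlier.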

TCP introduces delay because of its connection-oriented nature: each end of the communication channel has to confirm reception of packets by sending back acknowledgements, and lost packets are retransmitted. Voice applications rely on seamless transfer of packets, which is why UDP is adopted. The User Datagram Protocol is connectionless and much faster than TCP. There is no requirement to set up a connection beforehand to negotiate link availability, nor any need to retransmit lost packets. However, UDP transmission can suffer from packet impairments on the network, hence the need to find out how VoIP/UDP packets are impaired by network conditions such as congestion, which can cause delay, jitter, packet loss and so on.

RTP

The Real-time Transport Protocol (RTP) was first defined in RFC 1889 in 1996, and this document was later obsoleted by RFC 3550, titled “RTP: A Transport Protocol for Real-Time Applications” [7], published in 2003. RFC 3550 states that RTP delivers end-to-end transmission services for data with real-time characteristics, such as interactive audio and video. It also states that “Those services include payload type identification, sequence numbering, timestamping and delivery monitoring”. Developed by the Audio/Video Transport (AVT) working group of the Internet Engineering Task Force (IETF), RTP is used mostly in applications that involve some form of media streaming, for example video conferencing and IP telephony applications.

RTP uses the User Datagram Protocol at the transport layer of the OSI model to carry media streams, which are mostly signalled by H.323, SIP, MGCP, SCCP and similar protocols. RTP is therefore an integral part of the voice over IP (VoIP) world. RFC 3550 defines the RTP header as having a minimum of 12 bytes; it can be longer because optional fields may be present. The RFC states that every RTP packet contains the first 12 octets, and that a mixer can insert a list of CSRC identifiers. The header contains the following fields:

Ver.: This field describes the version of RTP in use. RFC 3550 defines version 2. The value could also be 1, as used by the first draft of RTP, or 0, as used by the legacy “vat” audio tool protocol.

P (Padding): This is a 1-bit field that shows whether the RTP packet has extra padding at the end. Padding is needed by, for example, encryption algorithms to fill up empty block space at the end of packets.

X (Extension): This is a 1-bit field that shows whether an extension header is present between the standard header and the payload.

CC (CSRC Count): This is a 4-bit field that gives the number of CSRC identifiers present after the fixed header.

M (Marker): This is a 1-bit field whose meaning is defined at application level; it is used to flag RTP payloads of particular relevance to the application.

PT (Payload Type): A 7-bit field that describes the format of the RTP packet payload and tells applications how to interpret it.

Sequence Number: This is a 16-bit field that is incremented by one for each RTP packet sent. The receiver uses it to determine the number of packets sent and to detect packet loss. This is one of the most important fields of the RTP packet header.

Timestamp: One of the largest and most important fields in the RTP header, the 32-bit timestamp field enables the receiver to replay samples at regular intervals.

SSRC: This is also a 32-bit field, known as the Synchronisation Source Identifier, which uniquely identifies the source of a stream of RTP packets.

CSRC: Each entry is a 32-bit field, known as a Contributing Source ID, which identifies one of the sources contributing to a media stream that was mixed from multiple sources.
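The field layout above can be made concrete with a small parser. This is a minimal sketch based on the fixed header layout in RFC 3550; the names `RtpHeader`, `parse_rtp_header` and `packets_lost_between` are illustrative, and header extensions and padding removal are not handled.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed header plus any CSRC identifiers."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrc = struct.unpack("!%dI" % csrc_count, packet[12:end]) if csrc_count else ()
    return RtpHeader(
        version=first >> 6,
        padding=bool(first & 0x20),
        extension=bool(first & 0x10),
        csrc_count=csrc_count,
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
        csrc=csrc,
    )


def packets_lost_between(previous_seq: int, current_seq: int) -> int:
    """Packets missing between two received sequence numbers (16-bit wraparound)."""
    return (current_seq - previous_seq - 1) % 65536


if __name__ == "__main__":
    # Hand-built example packet: version 2, payload type 0, sequence number 7.
    example = struct.pack("!BBHII", 0x80, 0x00, 7, 160, 0xDEADBEEF) + bytes(160)
    print(parse_rtp_header(example))
```

The loss helper assumes packets arrive in order; a receiver that must also cope with reordering or duplication needs more state than a single previous sequence number.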

2.2.3 Signalling

Like traditional voice signalling protocols, VoIP signalling protocols are used to set up, maintain and tear down calls. A handful of protocols are in use today, both commercially and in private networks; some are standards-based while others are proprietary, each with its pros and cons. When a caller attempts to initiate a call, the signalling protocol comes into play and tries to negotiate some pre-conversation parameters with the called party. It checks for availability by ensuring layer-3 reachability to the called party and codec compatibility, and some signalling protocols also check for bandwidth availability between the caller and the called party. If all these checks succeed, the users can exchange voice in the form of an RTP stream. While the call is in progress, the signalling protocol helps to monitor call quality, and it comes into play again when the call is terminated by either party.

Three signalling protocols are most used in the industry today: Media Gateway Control Protocol (MGCP), which is documented by the IETF in RFC 3435 and widely deployed on Cisco Systems equipment; Session Initiation Protocol (SIP), a standard developed by the Internet Engineering Task Force (IETF); and H.323, an ITU-T standard developed by the International Telecommunication Union.

MGCP

Media Gateway Control Protocol (MGCP) is a client-server protocol for handling call signalling. It works in a centralised system that allows call agents to centrally manage voice gateways and trunks within the VoIP infrastructure. MGCP v1.0 is described in RFC 3435, while its architecture is described in RFC 2805 [8] [35]. The IETF defines call agents as media gateway controllers; the voice gateways and end devices rely on the centralised call agent system for the processing of voice communications.

MGCP Call Agent

The call agent, also known as a media gateway controller (MGC), is specified in RFC 3435, which obsoletes RFC 2705. The RFC mentions that MGCs use application programming interfaces (APIs) and a simple, readable, text-based master/slave system to control the media gateways (MGs). The non-logical parts that make up the MGCP system are called components; the logical parts are referred to as concepts. MGCP components include endpoints, gateways and call agents, while MGCP concepts include calls, events and signals.

The call agent is a component that processes and controls call intelligence by managing the operations of gateways. An example of a call agent is Cisco Unified Communications Manager. Media gateways convert signals received over the IP platform for other infrastructures such as the Public Switched Telephone Network (PSTN). A media gateway can be a network layer device but must have analogue or digital voice ports to interface with the PSTN. Endpoints are essentially sources or destinations of voice or data packets, and can be either physical or non-physical. Examples are FXO and FXS ports, trunk interfaces to the PSTN, and POTS connections to key systems, telephones or PBXs.

The connection or call is established between endpoints to transmit data between them. It can be multipoint, as in audio/video conferencing, or point-to-point, as in a direct telephone conversation. Events are used to watch for changes in the signal levels of endpoints, such as a phone being placed on or taken off its hook. MGCP has a series of commands and responses that are used to pass events from one endpoint to another. These are shown in Figure 9 and described below:

Endpoint Configuration (EPCF) - The call agent sends this command to the gateway to specify the coding requirements of voice calls to other endpoints.

Create Connection (CRCX) - This command is used to set up connections between the call agent and the media gateway within the communication path.

Modify Connection (MDCX) - This command is used to tell the gateway to modify session connection parameters.

Delete Connection (DLCX) - In the absence of the required communication resources, the call agent or the gateway may send this command to the other party.

Notification Request (RQNT) - This is used to tell the gateway to watch for an endpoint event and to specify what step to take when it occurs.

Notify (NTFY) - The gateway uses this to inform the call agent that an event has occurred on an endpoint.

Audit Endpoint (AUEP) - This is used to obtain the status of an endpoint, such as bearer information and capabilities.

Audit Connection (AUCX) - This is used to get information on connection status.

Restart In Progress (RSIP) - This is used to notify the call agent that the end devices are no longer reachable.
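Because MGCP is text based, the first line of each command is easy to build and inspect. The sketch below assumes the RFC 3435 command-line layout of verb, transaction identifier, endpoint name and protocol version separated by spaces; the endpoint name `aaln/1@gw.example.net` and the helper names are hypothetical, and the parameter lines that follow the command line are not modelled.

```python
# Verb codes and the full command names they abbreviate.
MGCP_VERBS = {
    "EPCF": "EndpointConfiguration",
    "CRCX": "CreateConnection",
    "MDCX": "ModifyConnection",
    "DLCX": "DeleteConnection",
    "RQNT": "NotificationRequest",
    "NTFY": "Notify",
    "AUEP": "AuditEndpoint",
    "AUCX": "AuditConnection",
    "RSIP": "RestartInProgress",
}


def build_command_line(verb, transaction_id, endpoint, version="1.0"):
    """Format the first line of an MGCP command."""
    if verb not in MGCP_VERBS:
        raise ValueError("unknown MGCP verb: %s" % verb)
    return "%s %d %s MGCP %s" % (verb, transaction_id, endpoint, version)


def parse_command_line(line):
    """Split an MGCP command line back into its parts."""
    parts = line.split()
    if len(parts) != 5 or parts[3] != "MGCP":
        raise ValueError("not an MGCP command line: %r" % line)
    verb, transaction_id, endpoint, _, version = parts
    if verb not in MGCP_VERBS:
        raise ValueError("unknown MGCP verb: %s" % verb)
    return {
        "verb": verb,
        "name": MGCP_VERBS[verb],
        "transaction_id": int(transaction_id),
        "endpoint": endpoint,
        "version": version,
    }


if __name__ == "__main__":
    # Hypothetical endpoint name, used for illustration only.
    line = build_command_line("CRCX", 1204, "aaln/1@gw.example.net")
    print(line)
    print(parse_command_line(line))
```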

It is worth noting that MGCP uses UDP port 2427 to send command messages but uses TCP port 2428 to exchange keepalives between the call agent and media gateways.

SIP

Session Initiation Protocol (SIP) was developed in 1996 under the network working group of the Internet Engineering Task Force (IETF). It is a text-based signalling protocol, similar to HTTP in structure and format, and is specified in RFC 3261. Because of this, SIP problems are much easier to troubleshoot than those of most other signalling protocols, such as H.323. SIP can also determine the location of users and their availability to communicate. It uses TCP/UDP port 5060, and TLS over TCP port 5061.

SIP uses a message structure similar to that of the Hypertext Transfer Protocol (HTTP), and transfers these messages in a peer-to-peer model. The peers are known as user agents (UAs). A user agent client (UAC) starts the connection by transmitting an INVITE message, whereas a user agent server (UAS) returns replies to the INVITE message. SIP also has four main types of server: the proxy server, which forwards requests on behalf of a UAC; the registrar server, which registers the location of clients; the redirect server, which informs the user agent of the next server to contact for address lookup; and the location server, which provides address resolution for SIP redirect and proxy servers. SIP messages are either Request messages, which are sent from clients to servers, or Response messages, which are sent from servers to clients. Request messages include INVITE and BYE: the INVITE message requests to join a call session, while the BYE message is used to leave one. Response messages use status codes very similar to those of HTTP, such as Error 404 or Error 504. Table 1 below shows the different kinds of Requests and Responses found within SIP call setup.

Table 1. SIP Request and Response table

| SIP Request | SIP Responses |
| --- | --- |
| Invite user to a call | Request being processed |
| Acknowledges message delivery and exchanges | Successful action |
| Get UA capabilities | |
| Used to terminate communication session | Client Error usually due to bad syntax or server can't deliver response to request |
| Used to terminate pending request | Server Error |
| UA address locator | Global Failures due to fact that request can't be acknowledged by any server |
| Used to communicate during call session | |
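The rows of Table 1 correspond to SIP request methods and to classes of response status code. The sketch below is one reading of that correspondence, using the method names and the six response classes (1xx to 6xx) from the SIP specifications; the pairing of each method with a table row, and the names `SIP_METHODS` and `classify_response`, are assumptions made for illustration.

```python
# Assumed pairing of SIP request methods with the row descriptions in Table 1.
SIP_METHODS = {
    "INVITE": "Invite user to a call",
    "ACK": "Acknowledges message delivery and exchanges",
    "OPTIONS": "Get UA capabilities",
    "BYE": "Used to terminate communication session",
    "CANCEL": "Used to terminate pending request",
    "REGISTER": "UA address locator",
    "INFO": "Used to communicate during call session",
}

# Response classes keyed by the first digit of the status code.
SIP_RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_response(status_code):
    """Return the response class for a SIP status code such as 180 or 404."""
    if not 100 <= status_code <= 699:
        raise ValueError("SIP status codes range from 100 to 699")
    return SIP_RESPONSE_CLASSES[status_code // 100]


if __name__ == "__main__":
    for code in (100, 180, 200, 404, 504, 603):
        print(code, classify_response(code))
```

In this reading, the TRYING, RINGING and OK messages of a call setup are the 100, 180 and 200 responses, that is, two provisional responses followed by a success response.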

Notice that the proxy server immediately sends back a TRYING message and then passes the initial INVITE request on to the called UA. The called UA then returns TRYING, RINGING and OK signals, which the proxy server passes on to the caller, UA1. UA1 then senses the phone of the called party, UA2, ringing and sends an ACK to acknowledge receipt of the RINGING and OK signals. At this stage the caller and the called party exchange RTP audio and RTCP statistics. After the conversation, the called party sends a BYE request directly to the caller, who acknowledges with an OK message. Notice that once the location of the called party has been confirmed through the proxy server, the ACK, RTP/RTCP, BYE and OK messages are transferred directly between the two communicating UAs.

H.323

H.323 is widely implemented in today's voice and video conferencing devices because it has been in existence longer than newer protocols such as SIP. H.323 is actually a suite of protocols: it is not a standalone protocol but an umbrella for a collection of protocols that provide signalling for voice and video communication. Developed by the ITU, the H.323 suite is intended to deliver inter-vendor connectivity and multimedia transmission across diverse networks.

The two major sub-protocols under the H.323 umbrella are H.225 and H.245. The H.225 protocol handles call setup and the registration, admission and status (RAS) functions, while H.245 handles call control, which includes checking the capabilities of the H.323 components taking part in the communication. In addition to these protocols, H.323 also defines endpoints, popularly known as physical components. These include terminals, gateways, gatekeepers and multipoint control units.

Terminals are users' end devices such as an IP softphone on a PC, an IP phone or a media conferencing system. They interface with the users, must support the G.711 codec, and are the most elementary of the physical components. Gateways translate one audio signalling format to another. Audio coming from the traditional public switched telephone network could not be heard on IP networks without a gateway to interface between the two kinds of network. A gateway can be either IP-to-IP or IP-to-non-IP. Figure 11 below shows the position a gateway takes in an H.323 setup.

Another physical component of H.323 is the gatekeeper, which I like to refer to as “the lord of the WAN”. The gatekeeper manages collections or clusters of gateways. It also delivers admission control and address translation services, and sometimes billing services as well. As an optional component, the gatekeeper can prove valuable where there are large VoIP zones of gateways. In large networks there can be issues with bandwidth utilisation or oversubscription when many end devices are trying to use limited bandwidth resources. The gatekeeper can help to determine whether or not to allow a call to go through, depending on the available network resources.
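The admission decision described above can be sketched as a simple bandwidth counter. This is a toy model under stated assumptions, not how any particular gatekeeper is implemented: the class name, the per-zone bandwidth budget and the per-call figures are hypothetical, and the return values borrow the H.225 RAS message names ACF (Admission Confirm) and ARJ (Admission Reject).

```python
class Gatekeeper:
    """Toy admission control: admit calls while zone bandwidth remains."""

    def __init__(self, zone_bandwidth_kbps):
        self.zone_bandwidth_kbps = zone_bandwidth_kbps
        self.active_calls = {}  # call id -> bandwidth reserved in kbit/s

    def available_kbps(self):
        return self.zone_bandwidth_kbps - sum(self.active_calls.values())

    def admission_request(self, call_id, requested_kbps):
        """Return "ACF" if the call fits in the remaining bandwidth, else "ARJ"."""
        if requested_kbps <= self.available_kbps():
            self.active_calls[call_id] = requested_kbps
            return "ACF"
        return "ARJ"

    def disengage(self, call_id):
        """Release the bandwidth held by a finished call."""
        self.active_calls.pop(call_id, None)


if __name__ == "__main__":
    # Hypothetical zone with room for two 80 kbit/s calls.
    gatekeeper = Gatekeeper(zone_bandwidth_kbps=160)
    print(gatekeeper.admission_request("call-1", 80))  # ACF
    print(gatekeeper.admission_request("call-2", 80))  # ACF
    print(gatekeeper.admission_request("call-3", 80))  # ARJ
    gatekeeper.disengage("call-1")
    print(gatekeeper.admission_request("call-3", 80))  # ACF
```

Rejecting the third call protects the quality of the two calls already in progress, which is the purpose of admission control on an oversubscribed link.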

Multipoint Control Units (MCUs) are also components of H.323; they help to process and set up conference calls. They accept multiple audio/video feeds from many users and multiplex them into a single stream, which is then sent to all members of the multimedia conference.

This section would be incomplete without highlighting the signalling and control mechanisms of H.323. Signalling provides the pre-connection mechanism that determines the resources and logistics required to set up, maintain and tear down a voice or video session. Being a suite of protocols, H.323 uses the protocols shown in Figure 12 to deliver these operations.

H.323 header

Concentrating on the signalling element of the protocol stack, we can see the three main control protocols in use: H.225.0 Registration Admission and Status (RAS) protocol, H.225.0 Call Signalling protocol and the H.245 Call control protocol.

As a member of the H.323 protocol suite, H.225.0 RAS performs registration, admission, bandwidth management, call routing and dial plan functions. It supports communication between the gatekeeper and the H.323 end devices over the unreliable User Datagram Protocol, using port 1719 for messages and port 1718 for gatekeeper discovery. The H.225.0 Call Signalling protocol makes connections between H.323 end devices or entities such as gateways or IP phones. This can take the form of either direct endpoint signalling or gatekeeper-routed signalling, and uses the reliable Transmission Control Protocol on port 1720 [12]. In direct endpoint signalling, H.323 end devices send admission requests to the gatekeeper for registration, admission and status checks, and the gatekeeper approves or rejects each request. If the request is approved, the call-initiating device then attempts to connect directly to the destination H.323 device. In gatekeeper-routed signalling, all signalling communication goes through the gatekeeper in charge of the zone. The H.245 call control protocol provides capabilities exchange, master/slave determination for the communication channel, conferencing and flow control.

The call sequence in H.323 is particularly interesting, especially in a standard topology of the kind often deployed commercially. For the purpose of this project, I will describe call connection and setup with two gatekeepers, each with one gateway connected to it, as shown in Figure 13 below. The following is the sequence of events, numbered according to the communication lines in the figure:

1. Registration Request (RRQ) - GatewayA sends a registration request to GatekeeperA, registering its 32-bit logical IP address and the E.164 numbers that exist on it.

2. Registration Confirm (RCF) - GatekeeperA replies to (1) with a confirmation message. At this stage the Gatekeeper could also choose to reject the request if the Gateway fails some level of authentication predetermined by administrators.

3. Admission Request (ARQ) - The GatewayA notifies the Gatekeeper that it wants to initiate a call on behalf of the IP Phone A which sits on it.

4. Admission Confirm (ACF) - The Gatekeeper confirms the request for call admission after checking for necessary bandwidth availability etc.

5. Admission Request (ARQ) - GatewayA asks the Gatekeeper for the destination of the called number, and the Gatekeeper then tries to locate the IP address to direct the request to.

6. Location Request (LRQ) - Before admitting or approving the request above (5), GatekeeperA sends a Location Request message to GatekeeperB (which is in charge of the zone that houses GatewayB).

7. Location Confirm (LCF) - GatekeeperB confirms the location of the called number to GatekeeperA through a Location Confirm message.

8. Admission Confirm (ACF) - GatekeeperA then confirms the admission of this call and tells GatewayA where to send the H.225 setup message (the destination H.225 gateway device).

9. H.225 setup - GatewayA then places the call to Phone B's extension number 200 directly through GatewayB. This message uses TCP port 1720.

10. Admission Request (ARQ) - GatewayB on receipt of the H.225 call setup message sends an admission request to its own Gatekeeper - GatekeeperB. This is to make sure there is available resource for the call to be admitted.

11. Admission Confirm (ACF) - GatekeeperB confirms the admission request and tells GatewayB to accept the call. This could instead be a reject message if there are not enough resources to accept the call.

12. Response to call Setup - The GatewayB then responds to GatewayA's setup request.

13. H.245 exchange - The GatewayB then attempts to negotiate feature capabilities and all things being good opens up a communication channel with GatewayA.

14. RTP - At this stage, multimedia traffic is then communicated between the end devices i.e. IP Phones, now the end users can hear each other's voices or see each other in case of video conferencing.

15. RTCP - This is also setup between end devices and keeps track of call statistics and quality information of the communication between the end devices.

16. DisengageRequest (DRQ) - After either end device (IP Phone) terminates the call, both participating Gateways send a request to their respective Gatekeepers to disconnect or tear down the communication session.

17. DisengageConfirm (DCF) - Both Gatekeepers confirm the disengagement of the communication session.
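
The request/confirm pairing in the sequence above can be sketched as a small data structure. This is only an illustrative model: the tuple layout, the channel labels and the helper name `unanswered_requests` are assumptions introduced here, and only steps 1 to 13 are encoded (the media and teardown steps are left out for brevity).

```python
# Steps 1-13 of the call flow above as (step, message, sender, receiver, channel).
# The tuple layout and channel labels are illustrative assumptions.
CALL_FLOW = [
    (1, "RRQ", "GatewayA", "GatekeeperA", "RAS"),
    (2, "RCF", "GatekeeperA", "GatewayA", "RAS"),
    (3, "ARQ", "GatewayA", "GatekeeperA", "RAS"),
    (4, "ACF", "GatekeeperA", "GatewayA", "RAS"),
    (5, "ARQ", "GatewayA", "GatekeeperA", "RAS"),
    (6, "LRQ", "GatekeeperA", "GatekeeperB", "RAS"),
    (7, "LCF", "GatekeeperB", "GatekeeperA", "RAS"),
    (8, "ACF", "GatekeeperA", "GatewayA", "RAS"),
    (9, "Setup", "GatewayA", "GatewayB", "H.225"),
    (10, "ARQ", "GatewayB", "GatekeeperB", "RAS"),
    (11, "ACF", "GatekeeperB", "GatewayB", "RAS"),
    (12, "Setup response", "GatewayB", "GatewayA", "H.225"),
    (13, "Capabilities exchange", "GatewayB", "GatewayA", "H.245"),
]

# Each RAS request and the confirm message that answers it.
CONFIRM_FOR = {"RRQ": "RCF", "ARQ": "ACF", "LRQ": "LCF"}


def unanswered_requests(flow):
    """Return the RAS requests in `flow` that never receive a matching confirm."""
    pending = []
    for _step, message, sender, receiver, _channel in flow:
        if message in CONFIRM_FOR:
            pending.append((message, sender, receiver))
            continue
        for request in pending:
            req_message, req_sender, req_receiver = request
            # A confirm answers a request when it travels in the opposite direction.
            if (CONFIRM_FOR[req_message] == message
                    and req_sender == receiver and req_receiver == sender):
                pending.remove(request)
                break
    return pending


print(unanswered_requests(CALL_FLOW))      # every request is answered
print(unanswered_requests(CALL_FLOW[:5]))  # step 5 is still waiting for step 8
```

Truncating the list after step 5 shows why GatewayA cannot place the call yet: its second admission request is only confirmed after the location exchange between the two gatekeepers.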

Figure 13. H.323 operation

2.2.4 Digitising Voice Signal

Transmitting analogue human voice over digital infrastructures requires some form of analogue to digital conversion. The analogue signal has to be sampled, quantised, encoded and then compressed. Compression is optional in most setups but absolutely necessary on low bandwidth links.

The sampling process involves taking samples of the analogue waveform picked up by a device such as a microphone or a telephone's mouthpiece, ready for digitisation. In 1933, Harry Nyquist stated in his theorem that a signal must be sampled at a rate at least twice as high as the highest frequency being sampled [9]. In telephony, the highest voice frequency is taken to be 4000 Hz, which means that according to the Nyquist theorem 2 x 4000 = 8000 samples are required every second.
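
The sampling arithmetic can be checked with a short sketch. The function names are hypothetical, and the 8 bits per sample is an assumption taken from the usual G.711 PCM description rather than from the paragraph above.

```python
def nyquist_rate_hz(max_frequency_hz):
    """Minimum sampling rate: twice the highest frequency being sampled."""
    return 2 * max_frequency_hz


def pcm_bit_rate_bps(sample_rate_hz, bits_per_sample=8):
    """Uncompressed PCM bit rate; 8 bits per sample is assumed (as in G.711)."""
    return sample_rate_hz * bits_per_sample


sample_rate = nyquist_rate_hz(4000)        # 8000 samples per second
bit_rate = pcm_bit_rate_bps(sample_rate)   # 64000 bits per second
print(sample_rate, bit_rate)
```

Under these assumptions an uncompressed voice channel needs 64 kbps before any packet header overhead is added.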

The next step is quantisation, which involves assigning a number from a finite set to each of the amplitude samples of the waveform. After quantisation, the digitised waveform is encoded using what is known as a codec. Examples include Pulse Code Modulation (PCM), Adaptive Differential Pulse Code Modulation (ADPCM), Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP) and Low Delay Code Excited Linear Prediction (LD-CELP).

An example of PCM is the popular G.711 codec, which encodes without compression. It is nevertheless one of the most widely used codecs on local area networks because bandwidth is abundant on a LAN. ADPCM encodes a delta, the difference between a sample and the previous sample, so that it does not have to encode the entire sample each time. An example is the G.726 codec. CS-ACELP is more intelligent; it learns speech patterns and develops a codebook which it uses to predict the next likely speech pattern in the sequence. It encodes the codebook location of the predicted sample rather than the whole sample each time. An example is the popular G.729 codec, which is widely used commercially to encode speech over wide area networks (WAN). Lastly there is LD-CELP, which is closely related to CS-ACELP but uses a smaller codebook and therefore requires more bandwidth. An example is the G.728 codec.
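
The delta idea behind ADPCM can be illustrated with plain differential coding. This is a deliberately simplified sketch: it leaves out the adaptive quantiser that G.726 actually uses, and the function names and sample values are made up for illustration.

```python
def delta_encode(samples):
    """Replace each sample with its difference from the previous sample."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas


def delta_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    total = 0
    samples = []
    for delta in deltas:
        total += delta
        samples.append(total)
    return samples


samples = [100, 102, 105, 103]          # made-up amplitude values
deltas = delta_encode(samples)          # [100, 2, 3, -2]
print(deltas, delta_decode(deltas) == samples)
```

After the first value the differences are much smaller than the samples themselves, which is why they can be represented with fewer bits.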

Table 2. Codec Comparison (after [13])

Codec | Voice Block Size (bytes) | Compression Ratio | Bit Rate (kbps)
G.711 PCM | | |
G.723.1 MP-MLQ | | |
G.723.1 MP-ACELP | | |
G.726 AD-PCM | | |


8.0 SCCP

SCCP is a Cisco proprietary call control protocol for voice endpoints. It was originally developed by Selsius Systems, Inc. but is now owned by Cisco Systems, Inc. SCCP uses TCP port 2000 to communicate between voice endpoints and call manager applications. While the H.323 recommendation is complex and expensive for a testbed setup, SCCP has proved to be a simple and relatively easy protocol to deploy. For the purpose of this project, SCCP will be used to provide signalling between Cisco IP Phones and a Cisco Unified Communications Manager 7.0 server. SCCP consumes fewer resources and less processor power than H.323 and SIP. Cisco IP phones run SCCP's skinny client and communicate with the Unified Communications Manager using the Transmission Control Protocol. Once the call agent establishes which IP Phone it is dealing with, the User Datagram Protocol is used to send audio streams to the IP Phone.

SCCP has a total of 14 call states as specified by Cisco Systems. These define the communication exchange between SCCP endpoints, such as Cisco IP phones, and Cisco CallManager, now known as Cisco Unified Communications Manager. The following are the states as defined by Cisco:

1. Off Hook

2. On Hook

3. Ring Out

4. Ring In

5. Connected

6. Busy

7. Line In Use

8. Hold

9. Call Waiting

10. Call Transfer

11. Call Park

12. Call Proceed

13. In Use Remotely

14. Invalid Number

Below (from [14]) is a Cisco CallManager trace that shows communication to an SCCP endpoint:

03/01/2006 16:43:19.808 CCM|StationD:

(0000044) CallState callState=2 lineInstance=1
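
A trace line of this kind can be decoded with a few lines of code. The sketch below assumes that the numeric `callState` value corresponds to the position of the state in the numbered list above; the regular expression and the function name are illustrative and are not part of any Cisco tool.

```python
import re

# Assumption: the numeric callState matches the numbering of the list above.
CALL_STATES = {
    1: "Off Hook", 2: "On Hook", 3: "Ring Out", 4: "Ring In",
    5: "Connected", 6: "Busy", 7: "Line In Use", 8: "Hold",
    9: "Call Waiting", 10: "Call Transfer", 11: "Call Park",
    12: "Call Proceed", 13: "In Use Remotely", 14: "Invalid Number",
}

TRACE_PATTERN = re.compile(r"callState=(\d+)\s+lineInstance=(\d+)")


def parse_call_state(trace_line):
    """Return (state name, line instance), or None if the line has no call state."""
    match = TRACE_PATTERN.search(trace_line)
    if match is None:
        return None
    state_code = int(match.group(1))
    line_instance = int(match.group(2))
    return CALL_STATES.get(state_code, "Unknown"), line_instance


print(parse_call_state("(0000044) CallState callState=2 lineInstance=1"))
```

Under that assumption, the trace above reports that line 1 of the endpoint has gone on hook.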



The description of the sequence of steps as given by Ramesh Kaza and Salman Asadullah in their 2005 published Cisco Press Book is shown in Table 3 below:

Table 3. IP Phone to IP Phone Communication SCCP sequence of steps (after [15])

1. Phone 1 goes off-hook. This triggers an event to the CallManager, which then instructs the phone to play a dial tone. The user then dials the extension 1002 of Phone 2. As soon as the user dials the first digit, CallManager instructs the phone to stop playing the dial tone. As the user dials the digits, the SCCP messages carry this information from the phone to the CallManager.

2. After the user completes dialling the destination number, CallManager looks in its database to see if it can find the dialled destination number. This process is called digit analysis and takes place within the CallManager. The digit analysis process is similar to the functionality of a router wherein the router looks into its routing table to see if it has a valid route to forward the received packet to the destination. If CallManager cannot find the dialled number in its database or does not have the information to route the call to the dialled number, it generates a reorder tone to the calling party.

3. After CallManager finds the valid number/destination (Phone 2 in this case), it sends the call setup information to Phone 2.

4. CallManager instructs Phone 2 to ring and, at the same time, generates the ring-back or alerting tone to the calling party Phone 1.

5. As soon as Phone 2 answers by going off-hook, CallManager sends to each phone a request for the IP address and the UDP port that it is listening to. This information is required to establish the media session between phones. In this step, CallManager also checks for the media capabilities of the phones, such as the codecs supported on each phone, and invokes the transcoder media resource device if both phones talk different codecs. If CallManager fails to invoke the transcoder device because no such device is configured, the users might experience one-way audio.

6. IP Phones respond with IP address and UDP port information to the CallManager. CallManager communicates to each phone about the other phone's IP address and UDP port number. After both phones receive the other phone's IP address and UDP port, they start the media exchange directly between them.

7. After the call is terminated by either phone, CallManager instructs the phone to tear down the RTP channel and updates the call state of the phones and the date and time on the IP phones.


VoIP has its fair share of technological challenges, largely because it is relatively new compared with its public switched telephone network counterpart. This chapter highlights some of the impairments that affect VoIP traffic, and also discusses how to assess voice quality in IP networks and how to measure performance.

3.1 VoIP Quality Metrics

With traditional public switched networks, users get near perfect voice quality. VoIP technology, however, suffers from the network impairments inherent in Internet Protocol platforms. Because VoIP relies on IP just like any other traffic on the network, voice packets can be starved of the bandwidth they need during periods of congestion unless adequate measures are in place. This section describes VoIP quality as a function of delay, echo and clarity, as identified by the Internet Engineering Consortium [16]. Figure 16 below shows voice quality as a function of delay, echo and clarity. Under ideal conditions, i.e. with no echo or delay and the highest level of clarity, speech quality lies at the graph's origin. As echo, delay and other network impairments are introduced, the speech quality point moves away from the origin.

3.1.1 Delay

Delay is defined as the time taken for a data, voice or video signal to travel from one endpoint on the network to another. It is measured in milliseconds, and it takes several forms. Processing delay is the time it takes to process the header of a packet. Queuing delay is the time a packet spends entering and leaving the switching or routing queues of network devices. Transmission delay is the time it takes network devices to place the data bits onto the physical link. Finally, propagation delay is the time it takes for a signal to travel across the medium from its source to its destination. The International Telecommunication Union (ITU) states in the G.114 recommendation [17], which deals with acceptable delays for voice applications, that up to 150 ms of one-way delay is acceptable and over 400 ms is unacceptable, and that engineers should be aware of the quality impairments that occur for delays between these two values.
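
The G.114 guidance quoted above can be written as a simple classification. This is a minimal sketch: the function name and the wording of the three labels are assumptions, and only the 150 ms and 400 ms thresholds come from the text.

```python
def classify_one_way_delay(delay_ms):
    """Classify a one-way delay against the 150 ms and 400 ms G.114 thresholds."""
    if delay_ms <= 150:
        return "acceptable"
    if delay_ms <= 400:
        # Between the thresholds: usable, but quality impairments must be considered.
        return "acceptable with caution"
    return "unacceptable"


for delay in (80, 250, 450):
    print(delay, "ms:", classify_one_way_delay(delay))
```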

There are two main sources of delay, identified in Cisco's documentation on Understanding Delay in Packet Voice Networks [18] as fixed and variable sources. Fixed sources, according to Cisco Systems, add directly to the total delay on the path between the source of transmission and the destination at which the receiver lies. Fixed sources include coder delay, packetisation delay, serialisation delay and switch delay. Variable delays come from the queuing that occurs mostly in the egress buffers of serial ports connecting to the wide area network. Output queuing delay is a good example of variable delay because the time taken to process traffic changes with the traffic itself. This variation in delay is called jitter and is controlled using de-jitter buffers at the receiving end of the communication. Figure 18 below shows a voice packet flowing through a sample network and the various kinds of delay such a packet experiences from source to destination.
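
A delay budget built from these components might be sketched as follows. All numeric values are hypothetical and chosen only to make the arithmetic easy to follow; the function names are likewise illustrative.

```python
def serialisation_delay_ms(frame_bytes, link_bps):
    """Time to clock one frame onto the link, in milliseconds."""
    return frame_bytes * 8 * 1000 / link_bps


def one_way_delay_ms(fixed_delays_ms, dejitter_buffer_ms):
    """Sum the fixed delay components and the de-jitter buffer depth."""
    return sum(fixed_delays_ms.values()) + dejitter_buffer_ms


# Hypothetical example: a 60-byte frame on a 64 kbps serial link.
fixed = {
    "coder": 15.0,
    "packetisation": 20.0,
    "serialisation": serialisation_delay_ms(60, 64_000),  # 7.5 ms
    "switch": 1.0,
}
total = one_way_delay_ms(fixed, dejitter_buffer_ms=30.0)
print(fixed["serialisation"], total)  # 7.5 73.5
```

The de-jitter buffer trades extra fixed delay for protection against variable queuing delay, so its depth is part of the budget even though it exists to absorb jitter.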

3.1.2 Echo

Echo is a phenomenon which occurs in VoIP systems when a speaker's voice is reflected back to them shortly after it is transmitted. One cause of echo is poor design of the telephony system, for example when the voice signal is fed from the caller's mouthpiece straight back to the caller's earpiece. The main cause, however, is a poor hybrid component, which converts the 2-wire circuit coming in from the customer's premises equipment to the 4-wire circuit that goes into the service provider's switch. Freeman notes that traffic on the receiving side of the converter sometimes leaks into the sending path at the hybrid junction [19]. Echo has largely been addressed by better designed systems and by line echo cancellers, which remove or reduce the echo caused by poorly designed hybrid junctions. Echo cancellers monitor the voice signal and predict the echo; they then generate an inverted copy of the predicted echo and combine it with the voice signal so that the echo is cancelled. Figure 19 shows the acceptable echo levels in terms of Talker Echo Loudness Rating (TELR) as a function of one-way propagation delay.

3.1.3 Clarity

Amongst the measures of voice quality, clarity has proved to be the broadest and most subjective of them all [21]. Definitions of clarity abound, but the Internet Engineering Consortium (IEC) describes it in the context of voice quality analysis. The IEC defines clarity as the perceptual fidelity, the clearness, and the non-distorted nature of a particular voice signal [16]. Clarity should not be mistaken for understandability: whether the spoken words are comprehended is a separate question from whether the signal is distorted.

One possible source of impairment to voice clarity is noise, that is, unwanted signals which arise from bad analogue circuitry, bit errors and some environmental factors. The IEC states in its Voice Quality in Converging Telephony and IP Networks tutorial that noise corrupts and distorts the speech reproduced at VoIP terminals. Other factors that impair voice clarity are packet loss and codec processing. Packet loss is often caused by congestion in a network where voice packets have not been given priority. In such an environment, when traffic reaches the limit that the output interface buffers can handle, the overflow traffic is dropped, and VoIP traffic is dropped along with it. This causes the receiver of a voice conversation to lose some of the caller's spoken words. In order to mitigate this, adequate quality of service measures should be implemented and VoIP traffic should be prioritised over other business critical traffic. Care should also be taken when choosing codecs for different parts of the network, because a higher codec compression ratio results in higher delay and thereby affects clarity.

3.2 VoIP Quality Assessment

Over the past decade there has been intense analytical study of the best ways of measuring VoIP quality. As a result, there are countless measurement tools, both commercial and open source. Humans have the capacity to project their voices, producing sounds between roughly 100 Hz and 10,000 Hz. This section provides an insight into the assessment methodologies commonly used for VoIP quality in the commercial and open source communities.

3.2.1 Subjective Assessment

In 1996 the International Telecommunication Union (ITU) released a recommendation titled Methods for Subjective Determination of Transmission Quality, which showcased one of the most basic and oldest forms of assessing voice quality [22]. Subjects are selected randomly from a mix of normal telephone users who have not participated in any subjective test for 6 months before the time of the test. This ITU-T P.800 recommendation describes methodologies and procedures for conducting subjective evaluations of transmission quality; it also provides a five-level judgement scale for listening quality, listening effort and loudness preference. The grading makes use of the Mean Opinion Score (MOS). Table 4 below shows the scale used for subjective assessment, where a score of 1 is the worst and a score of 5 is the best in each category.

Table 4. Mean Opinion Score Judgement Grading Scale (from [21])

Score | Listening Quality | Listening Effort | Loudness Preference
5 | Excellent | Complete relaxation; no effort required | Much louder than preferred
4 | Good | Attention required; no appreciable effort | Louder than preferred
3 | Fair | Moderate effort required | Preferred
2 | Poor | Considerable effort required | Quieter than preferred
1 | Bad | No meaning understood | Much quieter than preferred

Although highly subjective, MOS grading forms the foundation of voice quality assessment, and all other measurement scales are based on it.

3.2.2 Objective Assessment

MOS grading by human subjects is impracticable and not a cost effective way of assessing voice quality within the test bed of this project. It would require recruiting subjects (people) who meet the requirements set by the ITU in the P.800 recommendation, and then finding an environment which meets the signal-to-noise conditions stated in P.800. This is complex and expensive, so the alternative to subjective assessment is the objective assessment methodology. The ITU released the P.862 recommendation titled "Perceptual Evaluation of Speech Quality (PESQ)", and also the P.563 recommendation, to tackle the prohibitive cost of subjective analysis by providing an alternative way of measuring voice quality. Both recommendations describe algorithmic ways of measuring voice quality, as opposed to the human perception on which the subjective method is based. The advantage of the objective methodology is that its result maps directly to the MOS-LQO (Listening Quality Objective) scale. The objective methodology can itself be divided into intrusive and non-intrusive methods. Non-intrusive assessment requires only one input, the impaired speech coming out of the network, and produces the P.563 MOS-LQO score. Intrusive assessment requires both the speech fed into the network and the speech coming out of it, and produces the PESQ MOS-LQO score.

3.2.3 E-model Computational Modelling

The International Telecommunication Union also provides an algorithm for modelling voice quality predictively. This methodology depends neither on the objective measurements of PESQ or P.563 nor on the subjective judgements of the MOS grading system. The problem with subjective tests lies in their cost and extensive nature; the problem with objective tests lies in their inability to scale with the design architecture. Because of the sheer volume of data and results required for a test bed that analyses the impact of network impairments on VoIP quality, the E-model comes in handy as a predictive mathematical and analytical representation. The ITU-T G.107 recommendation titled “The E-model, a Computational Model for Use in Transmission Planning” states that “The E-model has proven useful as a transmission planning tool, for assessing the combined effects of variations in several transmission parameters that affect conversational quality of 3.1 kHz handset telephony” [23]. Predictive in nature, the E-model's calculation considers the psychological impact of each impairment factor and results in a single representative value called the rating factor. The rating factor R is a cumulative combination of all transmission parameters relevant to the path that the voice packets travel.

The rating factor equation is shown below (from [23]):

\[ R = R_o - I_s - I_d - I_e + A \]

Ro is the basic signal-to-noise ratio,

Is is the sum of all impairments that occur simultaneously with the voice signal,

Id is the impairment caused by delay,

Ie is the impairment caused by equipment such as low bit rate codecs,

A is the advantage factor, which compensates for impairments when the user gains other advantages of access.

This project exploits the Id component of the rating factor R, which represents impairments due to the delay of voice signals as they travel from source to destination. The Id factor is the sum of three subfactors, Idte, Idle and Idd.

The Id factor equation is shown below:

\[ I_d = I_{dte} + I_{dle} + I_{dd} \]


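Idte represents the talker echo impairment,

The Idte subfactor equations are shown below:

\[ I_{dte} = \left[ \frac{R_{oe} - R_e}{2} + \sqrt{\frac{(R_{oe} - R_e)^2}{4} + 100} - 1 \right] \left( 1 - e^{-T} \right) \]

\[ R_{oe} = -1.5\,(N_o - RLR) \]

\[ R_e = 80 + 2.5\,(TERV - 14) \]

\[ TERV = TELR - 40 \log_{10} \frac{1 + T/10}{1 + T/150} + 6\,e^{-0.3 T^2} \]

Here T is the mean one-way delay of the echo path, TELR the talker echo loudness rating, No the total noise power and RLR the receive loudness rating.

Idle represents the listener echo impairment,

The Idle subfactor equations are shown below:

\[ I_{dle} = \frac{R_o - R_{le}}{2} + \sqrt{\frac{(R_o - R_{le})^2}{4} + 169} \]

\[ R_{le} = 10.5\,(WEPL + 7)\,(T_r + 1)^{-0.25} \]

Here WEPL is the weighted echo path loss and Tr the round-trip delay in the four-wire loop.
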
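Idd represents the impairment resulting from too long an absolute delay Ta

The Idd subfactor equations are shown below:

For Ta ≤ 100 ms:

\[ I_{dd} = 0 \]

For Ta > 100 ms:

\[ I_{dd} = 25 \left\{ \left( 1 + X^6 \right)^{1/6} - 3 \left( 1 + \left[ \frac{X}{3} \right]^6 \right)^{1/6} + 2 \right\}, \qquad X = \frac{\log_{10}(T_a / 100)}{\log_{10} 2} \]

Ta represents the absolute delay.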

Once all the elements of the equation have been computed, the rating factor R is obtained. The predicted result is then converted to an equivalent MOS score, which helps to predict the user's perception of voice quality. The conversion given by the International Telecommunication Union in the G.107 recommendation is shown below:

For R < 0:

\[ MOS = 1 \]

For 0 < R < 100:

\[ MOS = 1 + 0.035\,R + R\,(R - 60)\,(100 - R) \cdot 7 \times 10^{-6} \]

For R > 100:

\[ MOS = 4.5 \]

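
The absolute-delay impairment and the conversion from R to MOS can be evaluated numerically with a short sketch. The formulas used here are the published G.107 expressions for Idd and for the R to MOS mapping; the function names and the example inputs (a 200 ms delay and an R value of 93.2) are assumptions chosen purely for illustration.

```python
import math


def idd(ta_ms):
    """Impairment due to absolute delay Ta (G.107 Idd term)."""
    if ta_ms <= 100:
        return 0.0
    x = math.log(ta_ms / 100.0) / math.log(2.0)
    return 25.0 * ((1 + x ** 6) ** (1 / 6) - 3 * (1 + (x / 3) ** 6) ** (1 / 6) + 2)


def r_to_mos(r):
    """Convert a rating factor R to an estimated MOS (G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


print(round(idd(200), 2))        # about 3.04 rating points lost at 200 ms
print(round(r_to_mos(93.2), 2))  # about 4.41
```

Subtracting the Idd value from a starting R and converting the result shows how added absolute delay translates into a lower predicted MOS.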

This chapter presents a test bed that was designed to utilise the H.323 signalling protocol for voice communication. One of the leading providers of voice technology products in the industry is Cisco Systems, which has a rich suite of voice and unified communications products. Traditionally a routing and switching technology provider, Cisco acquired a company called Selsius Systems, Inc. in November 1998 for $145 million, mainly because of the Call Manager software that Selsius had developed [24].

The design in this chapter was developed to mirror a communication channel between two endpoints which are H.323 compatible. Two kinds of Cisco Systems Unified Communications devices were used: Cisco 7960 IP Phones and a Cisco Unified Communications Manager 7.0 server. The test bed was kept simple, but its design and configuration can be scaled to larger deployments. The next section of this chapter describes the components used to set up the test bed. Figure 21 shows the logical layout of these components.

4.1 Test bed Components

The test bed components I used can be mapped out with elements of the H.323 protocol standard. This section will highlight various VoIP endpoint devices, network traffic simulator software used, network switch and the Linux server hardware used to experiment for the purpose of this project.

4.1.1 IP Phone

VoIP users need some form of interface to the technology. This could be a smart device such as a mobile phone or PDA, or it could be an IP Phone, desktop PC or laptop. For the purpose of this project, an IP Phone is used to generate and terminate voice traffic. Commercial IP Phones were used, as shown in Figure 22 below:

The Cisco IP Phone 7940/7960 is a full-featured VoIP endpoint which delivers voice communication services to users in a corporate environment. Just like a traditional analogue phone, it allows users to make and receive phone calls. According to Cisco Systems, this phone supports features such as call forwarding, redialling, speed dialling, call transfer, audio conferencing and voicemail access. The phone also supports Power over Ethernet (PoE), the Pulse Code Modulation codec (G.711) and the Conjugate Structure Algebraic Code Excited Linear Prediction codec (G.729). The user guide for the Cisco 7960 can be obtained from [26]. Appendix A explores the web interface from which statistical information can be obtained from this IP Phone [27].

4.1.2 Cisco Unified Communications Manager 7.0

Cisco Unified Communications Manager is a call agent application that resides on a server and provides call processing functions for various VoIP services. Only specific areas of interest were explored to deliver this test bed: directory services, codec selection, dial patterns and the music on hold service. A Cisco 7960 IP Phone was set up with directory number 4001, which is assigned to line one on the phone. Another directory number, 4002, is given to a second IP Phone used only to initiate the voice call. Once the call is established, the 4002 line on the second IP Phone places the 4001 party on hold; this is what triggers the streaming of the music on hold audio file to the 4001 party. Audio files were uploaded onto the Cisco Unified Communications Manager application for this purpose. Sample wave files were obtained from the website at [28].

4.1.4 Network Switch

A switch is a network device which operates at layer 2 of the OSI model. A switch breaks up collision domains and provides connectivity to multiple devices on the same broadcast domain. A Cisco 2960 switch with 24 FastEthernet ports was used for this project; it provides network connectivity to the IP Phone, the packet sniffer tool, the NETEM application server and the Cisco Unified Communications Manager, popularly known as CallManager. The port mirroring feature of the switch enables traffic to and from the IP Phone and the CallManager application to be monitored. Port 20 was used to mirror all the monitored ports, which simplifies capturing the traffic on this test bed. All network configuration and management was carried out through the Cisco CLI over a telnet session to the switch. The most current release of Cisco Unified Communications Manager software and documentation can be obtained from Cisco Systems' Product/Technology Support website [29].

4.1.5 Linux NETEM

NETEM is an open source network emulator that runs on the Linux operating system; it is widely used to emulate the network properties of both LAN and WAN infrastructures. The tool was written by Stephen Hemminger at OSDL. In this project it emulates impairments between the IP Phone and the Cisco Unified Communications Manager. It delivers a near real life experience of characteristics such as bandwidth congestion, delay, jitter, packet loss and latency, amongst other things [30]. NETEM is part of the Linux 2.6 kernel distribution and is controlled through the “tc” command. CentOS 5.2 with a 2.6 kernel was installed on a Dell PowerEdge 2650 server with 3GB RAM, a dual core CPU and two network interface cards running at 100Mbps.

Delay Control: Delay refers to the time it takes a packet to flow from one node on the network to another. NETEM controls the added latency with a resolution in multiples of 10ms. Jitter was also simulated because variable delays do occur in real life network environments. For example, if the delay is 100ms and a jitter of 20ms is set, then a variable delay in the region of 100ms ± 20ms is added.

Packet Loss Simulation: Sometimes RTP packets do not receive priority over other business traffic in the corporate world. In such a scenario, during periods of congestion (for instance, when someone is downloading a huge file or a server backup job is running over a slow link), packets will most likely be dropped once the interface buffers are full. NETEM is used to simulate these packet drops, and the resulting impact on the voice packets is documented.

Packet Duplication: Sometimes, due to faulty buffers or other components in network hardware, packets are duplicated and the receiver gets multiple copies of the same packet. NETEM also simulates this so that its impact on voice streams can be determined. All the testing work on NETEM is carried out using the command line interface [30].
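
The three kinds of impairment described above map onto options of the `netem` queuing discipline. The sketch below only builds the `tc` argument list and does not run it; the helper name `netem_args` and the interface name `eth0` are hypothetical, and actually applying the command would require root privileges on the NETEM server.

```python
def netem_args(device, delay_ms=None, jitter_ms=None, loss_pct=None, duplicate_pct=None):
    """Build a `tc qdisc add ... netem` argument list for the given impairments."""
    args = ["tc", "qdisc", "add", "dev", device, "root", "netem"]
    if delay_ms is not None:
        args += ["delay", f"{delay_ms}ms"]
        if jitter_ms is not None:
            # Jitter is given as a second value after the base delay.
            args.append(f"{jitter_ms}ms")
    if loss_pct is not None:
        args += ["loss", f"{loss_pct}%"]
    if duplicate_pct is not None:
        args += ["duplicate", f"{duplicate_pct}%"]
    return args


# 100 ms delay with 20 ms of jitter on a hypothetical interface eth0.
print(" ".join(netem_args("eth0", delay_ms=100, jitter_ms=20)))
# 5% packet loss combined with 1% duplication.
print(" ".join(netem_args("eth0", loss_pct=5, duplicate_pct=1)))
```

Building the command as a list keeps each test case reproducible: the same function call can be logged next to the MOS value it produced.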

4.2 Network IP Address Assignment and Routing

An IP address identifies components on a network. The IP Phone is assigned an address within the same subnet as NIC1 of the NETEM server. So also, the Cisco Unified Communications Manager is assigned an IP address within the same subnet of the NIC2 of the NETEM server. The NETEM server connects to two different subnets and hence acts as a router, routing traffic from one subnet to another. Table 5 below shows how IP Addresses are assigned within the test bed:

Table 5. Address Space for Testbed Components

SIP Request


Cisco UCM 7.0



Cisco IP Phone A

Cisco IP Phone B

Also, routing is enabled on the NETEM emulator through enabling packet forwarding at the kernel level of the Linux operating system. This is accomplished by editing the sysctl.conf file under the main /etc configuration directory. The file has to contain the net.ipv4.ip_forward variable which must be set to 1, so also the net.ipv4.ip_dynaddr variable must be set to 1 as well. CentOS documentation can be explored from [31] on how to deploy, install, and administer networking element of the CentOS 5.2 server.
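
Whether a given sysctl.conf contains the two settings named above can be checked with a small parser. This is a minimal sketch: the function names and the sample file contents are made up, and the parser handles only simple `key = value` lines and comments.

```python
# The two kernel variables named in the text, with the values they must have.
REQUIRED = {"net.ipv4.ip_forward": "1", "net.ipv4.ip_dynaddr": "1"}


def parse_sysctl(text):
    """Parse `key = value` lines, skipping blank lines and comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, separator, value = line.partition("=")
        if separator:
            settings[key.strip()] = value.strip()
    return settings


def missing_settings(text):
    """Return the required settings that are absent or have the wrong value."""
    found = parse_sysctl(text)
    return {key: value for key, value in REQUIRED.items() if found.get(key) != value}


sample = """# Controls IP packet forwarding
net.ipv4.ip_forward = 1
net.ipv4.ip_dynaddr = 0
"""
print(missing_settings(sample))  # only ip_dynaddr still needs changing
```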

4.3 Data Collection Software

This project makes use of a combination of open source and commercial tools for information gathering and statistical analysis. Wireshark is the open source sniffer tool used to monitor traffic flowing from the IP Phone to the Cisco Unified Communications Manager server. The Cisco IP Phone web application is built into the commercial phone and shows statistical information on audio streams as they terminate on the phone.

4.3.1 Wireshark

Wireshark is a free open source packet analysis tool. It was formerly named Ethereal but was renamed in May 2006 because of trademark issues. The software can be downloaded from [32]. It can turn an ordinary NIC into a protocol analyser and monitor. Figure 23 shows an example of traffic captured on the test bed from the IP Phone's network port. The top half of Figure 23 lists the packets in the order in which they were detected on the network card. A detailed view of each packet, together with its hexadecimal content, can be obtained in the lower half of the screenshot by clicking on the individual packet. This expert view was very helpful in gathering information within the testbed and proved very useful while analysing the voice traffic.

Wireshark has features for filtering and for viewing expert information on VoIP control and data streams. It has a built-in player for playing back captured RTP streams. Figure 24 shows an example of a captured file being replayed, together with the H.323 timeline expert view of a call within the test bed. Wireshark's audio player has a configurable jitter buffer with a range of 0 to 500ms; a setting of 30ms was used to decode the captured audio.

4.3.2 Cisco Call Statistics Web Page

The Cisco 7960 IP Phone has a built-in web server. This web server delivers a web page which can be accessed through a PC's web browser and which shows voice quality statistics compiled during voice calls. Appendix A shows the information that can be obtained from this web page, and Cisco's 7960 documentation describes each element in detail. The documentation defines three key voice quality metrics: the MOS-LQK, the concealed seconds and the concealment ratio. A concealment frame is inserted to mask frame loss events during audio streaming to the phone; this project, however, focused on the MOS-LQK statistic obtainable from the phone's statistics web page. Cisco uses a proprietary algorithm that computes the MOS-LQK from statistical information within an eight second window, and David Manka stated in his work that this assessment is consistent with ITU's P.VTQ objective metric standard.

4.4 Summary

In this chapter, a test bed design for objective voice quality assessment was documented. NETEM, which emulates network impairments such as delay, jitter, packet loss and packet duplication, was also introduced. Finally, the data collection and capture tool was presented, followed by the Cisco web page that provides the MOS-LQK statistics used in the test bed experiments.
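As an illustration of how impairments of this kind are typically applied, the sketch below builds (without executing) a `tc` command line that attaches a NETEM queueing discipline to an interface. The helper name `netem_command`, the interface name `eth1` and the numeric values are assumptions made for illustration and are not taken from the test bed configuration; `delay`, `loss`, `duplicate` and `corrupt` are standard NETEM options.

```python
from typing import List, Optional


def netem_command(
    dev: str,
    delay_ms: Optional[float] = None,
    jitter_ms: Optional[float] = None,
    loss_pct: Optional[float] = None,
    duplicate_pct: Optional[float] = None,
    corrupt_pct: Optional[float] = None,
) -> List[str]:
    """Build, but do not run, a `tc` command that attaches a netem qdisc."""
    if jitter_ms is not None and delay_ms is None:
        raise ValueError("netem expresses jitter as a variation around a base delay")
    cmd = ["tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms:g}ms"]
        if jitter_ms is not None:
            cmd.append(f"{jitter_ms:g}ms")
    # Percentage-based impairments share the same "<keyword> <value>%" form.
    for keyword, value in (
        ("loss", loss_pct),
        ("duplicate", duplicate_pct),
        ("corrupt", corrupt_pct),
    ):
        if value is not None:
            cmd += [keyword, f"{value:g}%"]
    return cmd


if __name__ == "__main__":
    # Hypothetical example: base delay with 150 ms of jitter on interface eth1.
    print(" ".join(netem_command("eth1", delay_ms=200, jitter_ms=150)))
    # Hypothetical example: 2% random packet loss.
    print(" ".join(netem_command("eth1", loss_pct=2.0)))
```

Returning the argument list rather than running it keeps the sketch safe to execute on any machine; on a real router the list could be passed to `subprocess.run` with root privileges.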

Chapter 5: Results and Discussion
5.1 Metrics Used

The test bed results reported in this project were obtained by analysing about 15 hours' worth of voice traffic transmitted through the NETEM emulation tool, which sits between the Cisco Unified Communications Manager and the Cisco 7960 IP Phones. Tests used the 8-second data collection window reported by Cisco's web statistics page. These MOS-LQK statistics were then imported into MATLAB for further analysis and graphing. Figure 25 below shows the typical process that voice traffic goes through, from the call establishment phase to the MOS-LQK and packet loss ratio calculation.

The experimental statistics of interest were the bit error rate (BER), the MOS-LQK and the packet loss ratio. In telecommunications systems, the BER is the number of bits received in error divided by the total number of bits transmitted. This gives an idea of how much information is lost through the impaired network test bed.

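The formula for BER is given as:

$$\mathrm{BER} = \frac{\text{number of bits received in error}}{\text{total number of bits transmitted}}$$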
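A minimal sketch of how the BER and the packet loss ratio could be computed from raw counters is given here. The function names and the example counts are hypothetical and are used only to show the arithmetic.

```python
def bit_error_rate(bits_in_error: int, bits_transmitted: int) -> float:
    """Bits received in error divided by total bits transmitted."""
    if bits_transmitted <= 0:
        raise ValueError("bits_transmitted must be positive")
    return bits_in_error / bits_transmitted


def packet_loss_ratio(packets_sent: int, packets_received: int) -> float:
    """Fraction of sent packets that never arrived.

    Duplicated packets can make the received count exceed the sent count,
    so the number of lost packets is clamped at zero.
    """
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    lost = max(packets_sent - packets_received, 0)
    return lost / packets_sent


if __name__ == "__main__":
    # Hypothetical counters, not values measured in the test bed.
    print(bit_error_rate(5, 1000))
    print(packet_loss_ratio(1000, 980))
```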


In his project, Tiantioukas [33] established a conceptual technique that applied BER to form an objective analytical result. This technique includes a formula for calculating the recognisable speech pattern obtained through the Dragon NaturallySpeaking software.

Formula for Remaining Speech (from [33]):


The final result, the MOS-LQK, is read from the web page of the Cisco 7960 IP Phone terminal at the end of each test run.

5.2 Testbed Results

This section details the results obtained during the experiments. The bit error ratio settings were obtained by analysing the packet loss observed during the tests, as was the MOS-LQK value. The test components included:

Sample wav files: United Kingdom Male, United Kingdom Female, North American Male, North American Female.

Codecs used: G.711

5.2.1 MOS-LQK Results

MOS-LQK was developed by Cisco Systems, and the test bed IP Phones run version 0.95 of the algorithm, which is compliant with the ITU specification. According to Cisco Systems, the MOS-LQK is an average over the last eight seconds and provides a metric for the voice quality on the IP Phones. The data was obtained by constantly refreshing the web page of the IP Phones, and also by pressing the “Settings” key on the phone, then “5” for Status, and then “5” again to show Call Statistics. The results from the G.711 codec transmissions are shown in Figure 26 below. These results are based on 15 Monte Carlo runs.

Owing to the powerful server used, there was enough de-jitter buffer to cater for fixed delay, so fixed delay did not noticeably impair voice quality over a long period. A jitter of 150 ms was therefore introduced to the test bed at this stage, and it had a massive impact on voice quality. The higher the variable delay, the lower the quality of voice transmission, as shown by the MOS-LQK readings on the Cisco 7960 IP Phones.
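The contrast between fixed delay and jitter can be illustrated with a simplified fixed-size playout buffer. This is a sketch under stated assumptions and not a model of the Cisco 7960's actual de-jitter buffer: packets are assumed to be sent every 20 ms, the 30 ms buffer depth is borrowed from the Wireshark player setting mentioned earlier, and the jittered delays are drawn from a uniform distribution purely for illustration.

```python
import random
from typing import List

PACKET_INTERVAL_MS = 20.0  # assumed packetisation interval


def late_packet_fraction(network_delays_ms: List[float], buffer_ms: float) -> float:
    """Fraction of packets that arrive after their scheduled playout time.

    Packet i is sent at i * PACKET_INTERVAL_MS and arrives after its own
    network delay. Playout of the first packet starts buffer_ms after it
    arrives, and every later packet is due one interval after the previous one.
    """
    if not network_delays_ms:
        return 0.0
    first_arrival = network_delays_ms[0]
    late = 0
    for i, delay in enumerate(network_delays_ms):
        arrival = i * PACKET_INTERVAL_MS + delay
        playout = first_arrival + buffer_ms + i * PACKET_INTERVAL_MS
        if arrival > playout:
            late += 1  # too late to be played, so it counts as lost audio
    return late / len(network_delays_ms)


if __name__ == "__main__":
    rng = random.Random(1)
    fixed = [200.0] * 500  # constant delay, no jitter
    jittered = [max(0.0, 200.0 + rng.uniform(-150.0, 150.0)) for _ in range(500)]
    print("fixed delay, late fraction:", late_packet_fraction(fixed, 30.0))
    print("150 ms jitter, late fraction:", late_packet_fraction(jittered, 30.0))
```

With a constant delay every packet arrives exactly on schedule relative to the first one, so none is late regardless of how large the delay is. Once the delay varies by more than the buffer depth, a share of packets misses its playout slot and has to be concealed.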

5.2.2 Packet Loss Results

Packet loss also caused a severe drop in voice quality during the test bed experiment. The concealment ratio on the Cisco IP Phones increased rapidly as the phones masked the missing packets. Voice quality dropped most rapidly between 0.1% and about 2% packet loss, and much less steeply as packet loss increased from 5% to 25%. Data between 2% and 4.9% was skipped for easier analysis and readability. The result of the packet loss impairment is shown in Figure 27 below. About 10 Monte Carlo runs were carried out between 0.2% and 2% packet loss, and about 5 Monte Carlo runs between 5% and 25% packet loss.
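Repeated runs at each impairment level are usually reduced to a mean and a spread before plotting. The project used MATLAB for this step; the Python sketch below only illustrates the idea, and the helper name `summarise_runs` and the scores in the example are placeholders, not measurements from the test bed.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarise_runs(runs: Dict[float, List[float]]) -> Dict[float, Tuple[float, float]]:
    """Map each impairment level to (mean, sample standard deviation) of its scores."""
    summary = {}
    for level, scores in sorted(runs.items()):
        # A single run has no spread; stdev() needs at least two values.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[level] = (mean(scores), spread)
    return summary


if __name__ == "__main__":
    # Placeholder MOS-LQK readings keyed by packet loss (%); NOT test bed data.
    placeholder_runs = {
        0.5: [4.0, 4.2, 4.1],
        2.0: [3.0, 3.2, 3.1],
    }
    for level, (avg, spread) in summarise_runs(placeholder_runs).items():
        print(f"{level:g}% loss: mean={avg:.2f}, stdev={spread:.2f}")
```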

As with the degradation caused by variable delay, packet loss also reduced voice quality during the test bed experiments.

Packet duplication was also tested within the test bed; however, it did not noticeably impair voice quality. It seems that the VoIP terminals were largely insensitive to how many duplicate packets they received at a time, unlike some other business-sensitive applications.

5.2.3 Packet Corruption Results

Packet corruption experiments were also carried out, because voice transmission can be sensitive to random environmental noise. This experiment introduced errors at random offsets within the voice over IP packets. Packet corruption was found to affect voice quality in a pattern similar to that seen in the packet loss graph. The effect of the packet corruption impairment on the various voice samples is shown in Figure 28 below.

The QoE matrix obtained during the course of this project is shown in Table 6 below:

Table 6. MOS-LQK QoE Matrix Obtained during Test Bed Experiment

[Table 6 values were not recoverable. The matrix reported MOS-LQK scores for the UK Male, North American Male, UK Female and North American Female samples against three impairments: Delay (ms) + 150 ms Jitter, Packet Loss (%) and Packet Corruption (%).]

5.3 Summary and Evaluation

This chapter presented the results obtained during this MSc project's experiments for the objective assessment of how network impairments affect VoIP quality. The test bed has its limitations, one being its inability to produce a MOS score, as that would be very expensive and time-consuming for an MSc project. A series of calls was made from one Cisco 7960 IP Phone to another, and music on hold streaming was then invoked by pressing the “Hold” button during the call. MOS-LQK statistics were then obtained from both the IP Phones' web pages and the screens of the phones themselves. MATLAB was then used to compile and graph the results. This graphical representation of the test bed results showed in simple terms how some of the identified network impairments affect VoIP traffic and conversations.

Future test bed design and experimental work will need deeper research and analysis. The test bed used Linux NETEM and Cisco Unified Communications Manager version 7.0. Although the G.711 codec was used during the experiments, G.729 could also be used, as it is the preferred codec over wide area networks. The test bed design concept could be expanded and scaled to fit the goals of future research exploring how network impairments such as delay, jitter, packet duplication, packet corruption and packet loss affect VoIP transmissions.

Chapter 6: Conclusion and Further Work

This MSc project report investigated protocols and recommendations that are used in designing VoIP solutions. Cisco's proprietary Skinny Client Control Protocol (SCCP) was used for call control, with voice packets transmitted from the Cisco Unified Communications Manager server, which provides call processing services and music on hold features in one network subnet, through a Linux router running NETEM to the IP Phones subnet, which houses the Cisco 7960 IP Phones. Wireshark was used to capture, evaluate and analyse traffic within the test bed. Voice quality metrics such as MOS-LQK, which was designed by Cisco Systems, were obtained during the experiment by transmitting each of the 4 test files to the IP Phone terminals.

These experimental results were consistent with the ITU-T subjective, objective and predictive models that most quality assurance software developers use. The experiments simulated network impairments using NETEM, a network emulation tool built into the Linux kernel. The Linux server was configured as a router by enabling IP packet forwarding from one network card to the other.
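On Linux, IPv4 forwarding is controlled by the `net.ipv4.ip_forward` kernel parameter, which is also exposed as the file `/proc/sys/net/ipv4/ip_forward`. The sketch below checks that flag and builds the `sysctl` command that would enable it. The function names are hypothetical, and the report does not state which method was actually used on the test bed router.

```python
from pathlib import Path
from typing import List

IP_FORWARD_PATH = Path("/proc/sys/net/ipv4/ip_forward")


def forwarding_enabled(flag_text: str) -> bool:
    """Interpret the contents of the ip_forward file ("1" means enabled)."""
    return flag_text.strip() == "1"


def enable_forwarding_command() -> List[str]:
    """Command that turns on IPv4 forwarding until the next reboot (needs root)."""
    return ["sysctl", "-w", "net.ipv4.ip_forward=1"]


if __name__ == "__main__":
    if IP_FORWARD_PATH.exists():
        state = forwarding_enabled(IP_FORWARD_PATH.read_text())
        print("IPv4 forwarding enabled:", state)
    print("To enable as root:", " ".join(enable_forwarding_command()))
```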

6.1 Contributions

This MSc project achieved the objectives stated in Chapter 1 by delivering a simulated test bed of packet impairments that included corruption, duplication, reordering as a result of loss, jitter, and fixed and variable delays. The project also delivered an understanding of how these packet impairments affect voice quality within the network infrastructure by using MATLAB to graph the test bed results. A QoE matrix was also conceptualised to include MOS-LQK scores and packet impairment variations such as jitter, delay, packet loss and corruption.

The simplicity of the test bed design makes it adaptable and usable for future research work. The call processing application used and the network IP addressing scheme give future research room to scale. The topology could also be set up easily for demonstrations, and any alterations preserve the effort and time put into setting it up. The test bed successfully simulated controlled impairments and supported analysis of voice over IP traffic quality. It is cost-effective to implement and easy to set up to deliver objective voice quality measurements. The test bed experiment benefited from open-source software such as Wireshark, the Linux operating system, and NETEM for network impairment simulation. Some commercial software was also used: Cisco Unified Communications Manager version 7.0 was virtualised on a CentOS Linux operating system running the free version of VMware Server. Cisco 7960 IP Phones were used to initiate, terminate and hold calls. The test bed was observed throughout, and network error issues were minimised during the course of the experiment.

6.2 Future Research Work

The next logical step of this research is to expand the QoE matrix that was modelled in Chapter 5. This study was based on observation of the SCCP protocol used between the Cisco Unified Communications Manager and the Cisco IP Phones. Future research could expand this to cover the H.323 and SIP protocols. Also, the G.711 codec was used to code and decode voice packets; future research could easily extend this to include the G.729 and G.728 codecs. G.711 is mostly used in LAN environments because of its bandwidth requirements, while G.729 is mostly used across wide area networks because it consumes relatively less bandwidth and still delivers high-quality voice traffic.
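The bandwidth argument can be made concrete with a short calculation. The sketch assumes a 20 ms packetisation interval and uncompressed IPv4, UDP and RTP headers (40 bytes in total), and it ignores layer-2 overhead; these parameters are assumptions chosen for illustration, not settings taken from the test bed.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no header compression

# Nominal codec bit rates in kbit/s.
CODEC_BITRATE_KBPS = {"G.711": 64.0, "G.729": 8.0}


def ip_bandwidth_kbps(codec: str, packet_interval_ms: float = 20.0) -> float:
    """Per-direction IP-layer bandwidth of one call, excluding layer-2 overhead."""
    # kbit/s multiplied by ms gives bits per packet; divide by 8 for bytes.
    payload_bytes = CODEC_BITRATE_KBPS[codec] * packet_interval_ms / 8.0
    packets_per_second = 1000.0 / packet_interval_ms
    total_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES
    return total_bytes * 8.0 * packets_per_second / 1000.0


if __name__ == "__main__":
    for codec in ("G.711", "G.729"):
        print(f"{codec}: {ip_bandwidth_kbps(codec):.1f} kbit/s")
```

Under these assumptions a G.711 call needs 80 kbit/s per direction at the IP layer, against 24 kbit/s for G.729, which is why the packet headers dominate the cost of a low bit rate codec.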

This project virtualised the call processing unit for easier accessibility from both work and home, and in case it needed to be demonstrated at a conference or during the project defence at Queen Mary University of London. Future research could instead install the call processing element directly on a physical server rather than virtualising it. Future work could also design a resilient infrastructure with some form of clustering and then use G.729 for inter-cluster communications. From a network design perspective, the test bed was simplified to include only two subnets, one Linux router, one Cisco switch, two Cisco IP Phones and one call processing unit. This could be scaled to a more robust setup with over 10 IP Phones across 4 subnets and over 2 call processing units forming 2 different clusters.

Also, for future research, captured wav files could be passed through a speech recognition system to convert speech to text files. An example of commercial software that can do this is Dragon NaturallySpeaking. Ideally, the software would have to be trained to near perfection, so that the text files obtained after network impairment could be compared with the original conversions. The packet loss ratio could then be obtained by using the concept of remaining speech used by Tiantioukas in his work [33].


Appendix A: IP Phone Call Statistics

The test bed used Cisco 7960 IP Phones to initiate and terminate voice calls. The web page of a phone can be reached by pointing a web browser to the IP address of the phone and then clicking the respective stream links under the Streaming Statistics section. Figure 29 below shows the streaming page and some of the information that can be obtained. For further information on the call statistics on this page, Table 7 shows part of the description table obtained from [34]. These call statistics can be exported to Microsoft Excel for further analysis.

Table 7. Cisco IP Phone Voice Call Statistics Information (from [34])




MOS LQK

Score that is an objective estimate of the mean opinion score (MOS) for listening quality (LQK) that rates from 5 (excellent) to 1 (bad). This score is based on audible concealment events due to frame loss in the preceding 8-second interval of the voice stream.

Note: The MOS LQK score can vary based on the type of codec that the Cisco Unified IP Phone uses.

Avg MOS LQK

Average MOS LQK score observed for the entire voice stream.

Min MOS LQK

Lowest MOS LQK score observed from start of the voice stream.

Max MOS LQK

Baseline or highest MOS LQK score observed from start of the voice stream.

These codecs provide the following maximum MOS LQK score under normal conditions with no frame loss:

• G.711 gives 4.5

• G.729A/AB gives 3.7

MOS LQK Version

Version of the Cisco proprietary algorithm used to calculate MOS LQK scores.

Cmltve Conceal Ratio

Total number of concealment frames divided by total number of speech frames received from start of the voice stream.

Interval Conceal Ratio

Ratio of concealment frames to speech frames in preceding 3-second interval of active speech. If using voice activity detection (VAD), a longer interval might be required to accumulate 3 seconds of active speech.

Max Conceal Ratio

Highest interval concealment ratio from start of the voice stream.

Conceal Secs

Number of seconds that have concealment events (lost frames) from the start of the voice stream (includes severely concealed seconds).

Severely Conceal Secs

Number of seconds that have more than 5 percent concealment events (lost frames) from the start of the voice stream.
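The concealment statistics defined in Table 7 can be reproduced from per-second frame counts, as sketched below. The input layout, the function name and the example counts are hypothetical, and reading "more than 5 percent concealment events" as the ratio of concealment frames to speech frames within a given second is an assumption; Cisco's own implementation is proprietary.

```python
from typing import Dict, List, Tuple


def concealment_stats(per_second: List[Tuple[int, int]]) -> Dict[str, float]:
    """Summarise concealment from (concealment_frames, speech_frames) per second."""
    total_conceal = sum(conceal for conceal, _ in per_second)
    total_speech = sum(speech for _, speech in per_second)
    # Any second containing at least one concealment frame is a concealed second.
    conceal_secs = sum(1 for conceal, _ in per_second if conceal > 0)
    # Assumed reading of "severely concealed": more than 5% of that second's frames.
    severe_secs = sum(
        1 for conceal, speech in per_second if speech > 0 and conceal / speech > 0.05
    )
    ratio = total_conceal / total_speech if total_speech else 0.0
    return {
        "cumulative_conceal_ratio": ratio,
        "conceal_secs": conceal_secs,
        "severely_conceal_secs": severe_secs,
    }


if __name__ == "__main__":
    # Hypothetical stream: 50 speech frames per second (20 ms frames), 3 seconds.
    print(concealment_stats([(0, 50), (1, 50), (5, 50)]))
```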


Appendix B: Cisco Unified Communications Manager 7.0 Setup

This appendix shows the tasks undertaken during the setup of Cisco Unified Communications Manager 7.0. The Cisco documentation website [29] should be consulted for further details and recent best practices: select Products, then the Voice and Unified Communications sub-list, then IP Telephony, then Call Control and, finally, the Cisco Unified Communications Manager link.

Log in to the Cisco Unified Communications Manager 7.0 server through its web page. A computer must have valid IP connectivity to the server to do this. When the login page appears, as in Figure 30 below, type admin in the Username field and the administrator's password in the Password field.

The G.711 codec was set up for communications between the Cisco Unified Communications Manager 7.0 server and the Cisco IP Phones. This is done from the System menu by selecting “Region” and then clicking the “+” sign to add a new region. In this test bed, London was added as a region to cover the 2 IP Phones used. G.711 was selected as the audio codec through the “Modify Relationship to other Regions” section, followed by clicking the Save button. Figure 31 below shows the page where this was set up in the test lab.

Music on Hold (MOH) was also set up; this was used to stream voice traffic from the call processing unit to the Cisco IP Phones. The service was enabled from the “System” menu by selecting “Service Parameters”; in the Server drop-down box, I selected the test bed's unified communications manager server. Then, in the Service drop-down box, I selected Cisco IP Voice Streaming Media App (Active). For this project's test bed, 711alaw was selected as the codec, and I clicked Save. From the “Media Resources” menu, I selected “Music on Hold Audio Source” and then the “Add New” button to browse for wav samples to upload. I chose the next available stream number, the respective source file and a source name to go with it. I then checked the Play continuously (repeat) option to keep the streams repeating and clicked Save. Figure 32 shows the ukmale, ukfemale, namale and nafemale streams that were uploaded to the server for the test bed experiment.

A device pool called HQ London DP was set up to contain the 2 IP Phones and configured for auto-registration. The phones were set up for the music on hold service from the Device menu by selecting Phone, clicking Find to list all available phones, selecting the first phone and then clicking Line one. Figure 32 below shows how 4001 was assigned to the first line of one of the Cisco IP Phones, and Figure 33 shows how the User Hold MOH Audio Source was assigned.


[1] Vinton G. Cerf, Robert E. Kahn (1974), A Protocol for Packet Network Intercommunication, IEEE Transactions on Communications, Vol. 22, No. 5, pp. 637-648

[2] Keating, Tom. Internet Phone Release 4, Computer Telephony Interaction Magazine. Retrieved October 7, 2009 from Tom Keating online blog on TMCnet Website:

[3] ITU-T H.323 Recommendation on Visual telephone systems and equipment for local area networks which provide a non-guaranteed quality of service. Retrieved October 7, 2009 from the International Telecommunication Union Website:

[4] JR, The 10 that Established VoIP (Part 1: VocalTec). iLocus. Retrieved October 10, 2009 from the iLocus Website:

[5] Digium, Inc. What is Asterisk, Retrieved October 10, 2009 from the Digium's Asterisk Website:

[6] Wallingford, T., (2005) Switching to VoIP, Sebastopol, California: O'Reilly Media Inc. (978-0596008680)

[7] Schulzrinne H., et al., (2003) RFC 3550 “RTP: A Transport Protocol for Real-Time Applications”, Internet Engineering Task Force.

[8] Andreasen F., Foster B., Cisco Systems, (2003) RFC 3435 “Media Gateway Control Protocol (MGCP) Version 1.0”, Internet Engineering Task Force.

[9] Wallace, K., (2008), Cisco Voice Over IP (CVOICE) (Authorized Self-Study Guide), Cisco Press, 3rd Edition

[10] The Internet Engineering Consortium, “H. 323”, Web ProForum Tutorials. Retrieved December 13, 2009, from The Internet Engineering Consortium Website:

[11] Rhys Haden, Data Network Resource on IP Telephony. Retrieved December 13, 2009, from Rhys Haden Website:

[12] ITU-T H.323 Recommendation on “Packet-based Multimedia Communications Systems”. Retrieved October 7, 2009 from the International Telecommunication Union Website:

[13] D. Chairporn, (2002). “Performance Evaluation of Voice over Internet Protocol,” Master's thesis, Monterey California: Naval Postgraduate School.

[14] Cisco Systems, Inc. Call States Sent to SCCP Endpoints by Cisco CallManager Document ID: 69267. Updated March 07, 2006, Retrieved September 19, 2009 from Cisco Systems Website:

[15] Ramesh K, Salman A., (2005). Cisco IP Telephony: Planning, Design, Implementation, Operation, and Optimization, Cisco Press, ISBN 1-58705-157-5

[16] The Internet Engineering Consortium, “Voice Quality (VQ) in Converging Telephony and IP Networks”, Web ProForum Tutorials, Retrieved January 20 2010 from the Internet Engineering Consortium Website:

[17] ITU-T Recommendation on G.114, “One-way Transmission Time”, May 2003. Retrieved from International Telecommunication Union Website:

[18] Cisco Systems, “Understanding Delay in Packet Voice Networks”, Document ID: 5125. Updated February 02, 2006, Retrieved January 15 2010 from Cisco Systems website:

[19] Freeman R. L., Telecommunication Systems Engineering, (2004), Hoboken, New Jersey: Wiley and Sons. (978-0-471-45133-4)

[20] ITU-T Recommendation on G.131, (2003), “Talker Echo and Its Control”.

[21] Manka D., (2007) “Voice over Internet Protocol TestBed Design for non-intrusive, Objective Voice Quality Assessment” Master's thesis, Monterey California: Naval Postgraduate School.

[22] ITU-T Recommendation P.800, (1996), “Methods for subjective determination of transmission quality”.

[23] ITU-T Recommendation G.107, (2005), “The E-model, a Computational Model for Use in Transmission Planning”.

[24] Cisco Systems Newsroom, “Cisco to Acquire Selsius Systems, Inc. for $145 Million”, Retrieved January 20, 2010 from Cisco Systems Newsroom Website:

[25] “Cisco IP Phone 7960/7940 User Guide for SIP”, Retrieved January 14 2010 from Cisco Systems Website:

[26] “Cisco 7960 IP Phone User Guide”, Retrieved January 14 2010 from Cisco Systems Website:

[27] “Cisco 1760 MAR Documentation Data Sheet”. Retrieved January 15 2010 from Cisco Systems Website:

[28] NextUp Technologies, LLC. “Sample voice wav files”, Retrieved December 11, 2009 from NextUp Technologies Website:

[29] Cisco System Product/Technology Support Website, Retrieved December 12, 2009 from Cisco Systems Website:

[30] NETEM, Linux Foundation Group. Retrieved October 24 2009 from Linux Foundation Website:

[31] CentOS Linux Operating System, Retrieved December 20 2009 from CentOS Project Website:

[32] Wireshark, “Network protocol analyzer,” Retrieved December 24 2009 from Wireshark Foundation Website:

[33] Tiantioukas N., (2007) “Effects of the Wireless Channel, Signal Compression and Network Architecture on Speech Quality in VoIP Networks,” Master's thesis, Monterey California: Naval Postgraduate School.

[34] Cisco Systems, “Monitoring the Cisco Unified IP Phone Remotely”, Cisco Unified IP Phones 7960G/7940G Administration Guide for Cisco Unified Communications Manager 7.0 (SCCP), Retrieved Jan 17 2010 from Cisco Systems Website:

[35] Greene N., Ramalho M., et al., (2000) RFC 2805 “Media Gateway Control Protocol Architecture and Requirements”, Internet Engineering Task Force.

[36] Cisco Systems, “Understanding H.323 Gatekeepers”, Document ID: 5244. Updated July 20, 2006, Retrieved January 12 2010 from Cisco Systems website: