Communications of the ACM

Voice Over IP

Voice has been transmitted over the public-switched telephone network (PSTN) since 1878, and the U.S. long-distance market has grown to about $100 billion a year in business and residential demand. The desire of businesses and consumers alike to reduce this cost, along with the investment over the last decade in IP-based networks, public and private, has produced substantial interest in transmitting voice over IP networks. The possible re-emergence of Internet service providers (ISPs) and others as Internet telephony service providers (ITSPs) is likely to further increase competition among all phone service providers. Many communication technology vendors are rolling out hybrid IP/PBX systems. Both traditional and recently established carriers are beginning to offer voice over IP (VoIP) network connectivity to both business and residential customers (see the sidebar "PC-to-Phone Providers").

VoIP involves sending voice transmissions as data packets using the Internet Protocol (IP), whereby the user's voice is converted into a digital signal, compressed, and broken down into a series of packets. The packets are then transported over private or public IP networks and reassembled and decoded on the receiving side (see Figure 1). Residential customers can connect to IP-based networks by using the local loop from the PSTN or high-speed lines, including ADSL/DSL and cable modems.

Several recent industry surveys and projections estimate that VoIP could account for over 10% of all voice calls in the U.S. by 2004. It's likely to be used first in places with significant IP infrastructure or where cost savings are significant; an example might be a company with multiple sites worldwide connected through a private or public IP network. However, VoIP deployment may not be possible everywhere, as some countries restrict the use of VoIP to prevent harming their monopolistic telecommunication markets. VoIP might also be suitable for highly distributed companies or for companies with seasonally variable voice-service demand.

The idea of VoIP, or voice over the Internet or IP telephony, has been discussed since at least the early 1970s [6] when the idea and technology were developed. Despite this history, VoIP didn't establish a commercial niche until the mid-1990s. This gradual commercial development can be attributed to a lack of IP infrastructure and the fact that circuit-switched calling was and still is a much more reliable alternative, especially in light of the poor quality of early VoIP calls. In 1995, Vocaltec produced the first commercially available VoIP product, which required both participants in the call to have the software on a PC as well as Internet access. Unfortunately, it did not allow traditional calls through the PSTN.

Following the rapid growth of the public mass-market Internet, especially the Web, during the early 1990s and accompanying investment in IP networking infrastructure by businesses, vendors, and carriers, VoIP has finally become a viable alternative to sending voice over the PSTN. A number of factors are influencing the adoption of VoIP technology. First and foremost, the cost of a packet-switched network for VoIP could be as much as half that of a traditional circuit-switched network (such as the PSTN) for voice transmission [9]. This cost saving is a result of the efficient use of bandwidth requiring fewer long-distance trunks between switches. The traditional circuit-switched networks, or the PSTN, have to dedicate a full-duplex 64Kbps channel for the duration of a single call. The VoIP networks require approximately 14Kbps, as voice compression is employed, and the bandwidth is used only when something has to be transmitted. More efficient use of bandwidth means more calls can be carried over a single link, without requiring the carrier to install new lines or further augment network capacity; Table 1 compares voice over PSTN and over IP.
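The bandwidth argument above can be checked with simple arithmetic. The following is a minimal sketch, not a capacity-planning tool: the per-call rates are the figures quoted in the text, while the link size (24 channels of 64Kbps) and the function name are assumptions chosen for illustration. It ignores packet-header overhead and silence suppression, which pull the VoIP figure in opposite directions.

```python
# Per-call bandwidth figures quoted in the text above.
PSTN_KBPS_PER_CALL = 64   # dedicated full-duplex channel
VOIP_KBPS_PER_CALL = 14   # approximate, with voice compression


def calls_per_link(link_kbps: int, per_call_kbps: int) -> int:
    """Return how many simultaneous calls fit on a link of the given capacity."""
    if per_call_kbps <= 0:
        raise ValueError("per_call_kbps must be positive")
    return link_kbps // per_call_kbps


# Assumed example link: 24 channels of 64 Kbps each (1,536 Kbps of payload).
LINK_KBPS = 24 * 64

pstn_calls = calls_per_link(LINK_KBPS, PSTN_KBPS_PER_CALL)
voip_calls = calls_per_link(LINK_KBPS, VOIP_KBPS_PER_CALL)
print(pstn_calls, voip_calls)  # 24 109
```

Under these assumptions the same link carries roughly four and a half times as many compressed VoIP calls as circuit-switched calls.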

Besides cost savings and improved network utilization, VoIP offers other features, including caller ID and call forwarding, that can be added to VoIP networks at little cost [5]. VoIP allows Internet access and voice traffic simultaneously over a single phone line. This function could eliminate the need for two phone lines in a home, one for data and one for voice, by using the same line to carry all traffic without concern for missed calls or being disconnected from the ISP. Other high-speed media, such as ADSL and cable modems, can be used to carry both data and voice to IP networks while letting home customers use regular phone lines for voice calls to and from the PSTN. In this way, VoIP service offered by ISPs and ITSPs might indirectly benefit existing telephone companies and cable providers by increasing the potential number of ADSL and cable modem subscribers nationwide.

Long-distance carriers in the U.S. pay an average of $0.0171 per minute in interstate access charges to the regional Bell operating companies, that is, the local phone companies [8], a total of $9 billion a year. One current VoIP cost advantage is that ISPs pay no access charges, due to a U.S. Federal Communications Commission exemption under enhanced-service-provider regulations. However, any changes in regulation requiring ISPs and ITSPs to pay access charges or treat calls to ISPs as long-distance calls may diminish the VoIP cost advantage.

One VoIP application might involve managing temporary overload call volume for business users. Using a regular PBX, most traffic can be serviced with existing telephony equipment, and any excess or overload traffic can be routed to an IP/PBX system that can then be serviced by remote call centers with IP infrastructure (see Figure 2).
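The overflow idea can be sketched in a few lines. This is an illustrative model only: the function name, the dictionary keys, and the notion of a fixed PBX capacity measured in simultaneous calls are assumptions, not part of any vendor's IP/PBX product.

```python
def route_calls(active_calls: int, pbx_capacity: int) -> dict:
    """Split concurrent calls between the regular PBX and the IP/PBX overflow path."""
    if active_calls < 0 or pbx_capacity < 0:
        raise ValueError("call counts and capacity must be non-negative")
    on_pbx = min(active_calls, pbx_capacity)  # fill existing equipment first
    return {"pbx": on_pbx, "ip_pbx_overflow": active_calls - on_pbx}


print(route_calls(80, 100))   # normal load: nothing overflows
print(route_calls(130, 100))  # peak load: excess calls go to the IP/PBX
```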

Technical Issues

Among the many technical issues in VoIP, a major one is end-to-end delay, or latency. To ensure good voice quality, latency for voice communication should not exceed 200 milliseconds, as demonstrated in the 1980s when carriers tried to offer voice services over geosynchronous satellites; users deemed the 270-millisecond delay unacceptable. However, under certain circumstances, VoIP might suffer from more latency, leading to unacceptable quality (due to the uncertainty as to whether the other person is talking, possibly leading to interruptions). Latency is influenced by a number of variables. First, other traffic on IP networks directly affects the delay for voice packets. Another is packet size, with smaller packets receiving less end-to-end delay, due to faster routing and other factors, while increasing overhead on the system. Latency is also related to the number of routers and gateways that packets have to travel through before reaching their destinations. Table 2 outlines the four most common causes of packet delay over IP-based networks, public and private.
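Two of the points above lend themselves to a quick calculation: whether a set of delay components fits the 200-millisecond limit, and how packet size trades delay against overhead. The sketch below is hedged accordingly. The delay component names and values are hypothetical, and the 40-byte header figure assumes an IPv4 header without options (20 bytes), a UDP header (8 bytes), and a fixed RTP header (12 bytes).

```python
MAX_ONE_WAY_DELAY_MS = 200  # limit quoted in the text
HEADER_BYTES = 20 + 8 + 12  # assumed IPv4 + UDP + RTP headers, no options


def total_delay_ms(components: dict) -> float:
    """Sum the individual delay contributions, in milliseconds."""
    return sum(components.values())


def within_budget(components: dict, limit_ms: float = MAX_ONE_WAY_DELAY_MS) -> bool:
    """Return True if the end-to-end delay stays within the limit."""
    return total_delay_ms(components) <= limit_ms


def header_overhead(codec_kbps: float, packet_ms: float) -> float:
    """Fraction of each packet taken up by headers for a given packet duration."""
    payload_bytes = codec_kbps * packet_ms / 8  # kbit/s * ms = bits; / 8 = bytes
    return HEADER_BYTES / (HEADER_BYTES + payload_bytes)


# Hypothetical delay components (milliseconds), for illustration only.
example = {"encoding": 30, "packetization": 20, "network": 90, "jitter_buffer": 40}
print(total_delay_ms(example), within_budget(example))
print(within_budget({"satellite_hop": 270}))  # the satellite case from the text

# Smaller packets add less packetization delay but carry more header overhead.
for ms in (20, 40, 60):
    print(ms, round(header_overhead(14, ms), 3))
```

With a 14Kbps codec, a 20-millisecond packet carries only 35 bytes of voice behind 40 bytes of headers, which is why shrinking packets to cut delay increases overhead on the system.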

Some VoIP systems send test messages to several routers over IP networks to find the paths with better quality in terms of less delay. These smart techniques do not always yield better quality, especially over public IP networks like the Internet, due to possible rapid fluctuations in the amount of traffic and resulting increase in delays experienced by the people speaking and listening on the line.

VoIP systems use the User Datagram Protocol (UDP) as a transport layer protocol on top of IP to avoid acknowledgments for lost packets. Acknowledgments trigger undesirable retransmission of voice packets and increase network traffic (and end-to-end delay) and thus affect the quality of service (QoS) for VoIP. Some packet loss is tolerable; for example, many voice encoders can handle up to 1% packet loss [2].
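To make the "send and do not wait" behavior concrete, here is a minimal sketch using Python's standard socket module. The two-byte sequence header is a toy format invented for this example (it is not RTP), and the helper names are hypothetical. The sender never waits for an acknowledgment; the receiver only measures how many packets went missing and compares that to the 1% figure cited above.

```python
import socket
import struct

TOLERABLE_LOSS = 0.01  # 1% figure quoted in the text


def make_packet(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with a 16-bit sequence number (toy header, not RTP)."""
    return struct.pack("!H", seq & 0xFFFF) + payload


def parse_seq(packet: bytes) -> int:
    """Extract the 16-bit sequence number from a received packet."""
    return struct.unpack("!H", packet[:2])[0]


def send_voice(sock: socket.socket, addr: tuple, frames: list) -> int:
    """Send each frame as one UDP datagram; no acknowledgment, no retransmission."""
    for seq, frame in enumerate(frames):
        sock.sendto(make_packet(seq, frame), addr)
    return len(frames)


def loss_fraction(sent: int, received_seqs) -> float:
    """Fraction of sent packets that never arrived (duplicates counted once)."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return (sent - len(set(received_seqs))) / sent


# Example: 100 packets sent, sequence number 99 never arrived.
loss = loss_fraction(100, range(99))
print(loss, loss <= TOLERABLE_LOSS)
```

A sender would create its socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and pass it to `send_voice`; a lost datagram is simply skipped by the receiver rather than requested again.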

For users who prefer traditional telephones to specialized equipment, Internet telephony gateways allow two users to communicate without a computer at either location. A gateway's basic architecture involves a user connection via the PSTN. The gateway computer then searches for another gateway computer near the target location and makes a connection using circuit switching. When this connection is made, the second gateway uses the local PSTN to complete the connection of the call. Though this type of call isn't completely IP, it does suggest possible future solutions for integrating the current PSTN and the VoIP system. Table 3 compares four implementations for supporting VoIP.

However, data packets traveling through the Internet may not be secure and may require encryption, adding overhead by increasing the necessary bit rate beyond 14Kbps, hence reducing the bit rate advantage of VoIP over PSTN. Encryption also increases the end-to-end latency caused by the processing delay for encryption and decryption.

Meanwhile, technology support for VoIP has begun to mature on a number of fronts. The newer generations of routers and switches are faster and better able to handle the added load of real-time data packets. Beyond the advances in compression and equipment, protocol support in the form of the Resource Reservation Protocol (RSVP) and IP version 6 (IPv6) is also starting to mature. These protocols offer ways to prioritize voice traffic over the Net, helping improve QoS, especially when the network is congested.

Protocol Support

Just as in conventional telephony, VoIP needs a connection between users, though in the case of VoIP, a virtual connection. VoIP architecture involves many components. First, a signaling protocol is needed to set up individual sessions for voice connections between users [2]. Once a session is established, a transport protocol can be used to send the data packets. Directory access protocols are another important part of VoIP, providing routing and switching information for connecting calls.

A signaling protocol handles user location, session establishment, session negotiation, call participant management, and feature invocation. Session establishment is invoked when a user is located, allowing the call recipient to accept, reject, or forward the call [6]. Session negotiation helps manage different types of media, such as voice and video, transmitted at the same time. Call participant management helps control which users are active on the call, allowing for the addition and subtraction of users. The signaling protocol also involves feature invocation, at which time call features, such as hold, transfer, and mute, are controlled.
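The session-establishment step (accept, reject, or forward) can be illustrated with a small model. This is a sketch under stated assumptions, not an implementation of any real signaling protocol: the `Response` enum, the `policies` table, and the `max_hops` limit are all hypothetical constructs introduced for the example.

```python
from enum import Enum


class Response(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    FORWARD = "forward"


def establish_session(callee: str, policies: dict, max_hops: int = 5) -> tuple:
    """Follow each callee's response until the call is accepted or rejected.

    policies maps a user name to (Response, forward_target_or_None).
    Returns (final_state, path_of_users_tried).
    """
    path = [callee]
    for _ in range(max_hops):
        # Users that cannot be located are treated as rejecting the call.
        response, target = policies.get(callee, (Response.REJECT, None))
        if response is Response.ACCEPT:
            return "established", path
        if response is Response.REJECT or target is None:
            return "rejected", path
        callee = target  # forwarded: try the next user
        path.append(callee)
    return "rejected", path  # too many forwards, give up


policies = {
    "alice": (Response.FORWARD, "bob"),
    "bob": (Response.ACCEPT, None),
}
print(establish_session("alice", policies))
```

The hop limit is a design choice to stop two users who forward to each other from looping forever.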

The Real-time Transport Protocol (RTP) can be used to support the transport of real-time media, including voice traffic, over packet networks. RTP-formatted packets contain media information and a header, providing information to the receiver that allows the reordering of any out-of-sequence packets. Moreover, RTP uses payload identification to place an identifier in each packet to describe the encoding of the media so it can be changed in light of varying network conditions [7]. The RTP Control Protocol (RTCP), a companion protocol for RTP, provides QoS feedback to the sending device, reporting on the receiver's quality of reception. The Real Time Streaming Protocol (RTSP) can be used to control stored media servers, or devices capable of playing and recording media from the server. This added RTSP-based control allows the integration of voice mail and prerecorded conference calls in VoIP environments. The ability to integrate these advanced services is important to the future growth of VoIP. The Session Initiation Protocol (SIP) can be used to establish, modify, and terminate multimedia calls.
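The role of the RTP header in reordering and payload identification can be shown with a short sketch. It packs and parses only the 12-byte fixed RTP header (version 2, no padding, no extension, no contributing sources) and sorts packets by sequence number. The helper names are hypothetical, and the sort ignores 16-bit sequence wraparound, which a real receiver would have to handle.

```python
import struct

# Fixed RTP header: V/P/X/CC byte, M/PT byte, sequence, timestamp, SSRC.
RTP_HEADER = struct.Struct("!BBHII")


def build_rtp(payload_type: int, seq: int, timestamp: int, ssrc: int,
              payload: bytes) -> bytes:
    """Build a packet with a minimal version-2 RTP header and no CSRC list."""
    first = 2 << 6                 # version 2, padding/extension/CC all zero
    second = payload_type & 0x7F   # marker bit left clear
    header = RTP_HEADER.pack(first, second, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp(packet: bytes) -> dict:
    """Parse the fixed header; assumes CC == 0 so the payload starts at byte 12."""
    first, second, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": first >> 6,
        "payload_type": second & 0x7F,  # identifies the media encoding
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[RTP_HEADER.size:],
    }


def reorder(packets: list) -> list:
    """Return parsed packets sorted by sequence number (no wraparound handling)."""
    return sorted((parse_rtp(p) for p in packets), key=lambda h: h["seq"])


# Packets arriving out of order; payload type 0 is used here as an example value.
arrived = [build_rtp(0, s, s * 160, 0x1234, bytes([s])) for s in (2, 0, 1)]
print([p["seq"] for p in reorder(arrived)])  # [0, 1, 2]
```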

To encourage rapid, widespread deployment of VoIP services, several standards bodies have generated agreements based on groups of existing protocols and standards. The two most important are the H.323 recommendation from the International Telecommunication Union and the Media Gateway Control Protocol (MGCP) from a branch of the Internet Engineering Task Force. Neither is a standalone protocol; each relies on other protocols to do its job [1]. The H.323 architecture is based on four components: terminals, gateways, gatekeepers, and the multipoint control unit (MCU). Gateways are used for protocol conversion between IP and circuit-switched networks. Gatekeepers are used for bandwidth management, address translation, and call control. H.323 provides a foundation for audio, video, and data communications across IP-based networks, including the Internet. Complying with H.323 enables different multimedia products to interoperate. H.323 depends on other standards, such as H.245 to negotiate channel usage and capabilities, modified Q.931 for call signaling and call setup, Registration, Admission, and Status (RAS) for communicating with a gatekeeper, and RTP/RTCP for sequencing audio/video packets. The MCU supports multicast conferences among three or more end points by using H.245 negotiations to determine users' common capabilities [1].

The Media Gateway Control Protocol (MGCP) defines communications among call agents (media gateway controllers) and telephony gateways. Call agents have the intelligence for call control and other functions and manage telephony gateways used for protocol conversion. A call agent in MGCP is analogous to a gatekeeper in H.323 [1]. The MGCP can use the Session Initiation Protocol (SIP), which uses the HTTP format to allow a user to initiate a call by clicking in a browser.

Although H.323 and the MGCP have been standardized by two different standard-setting bodies, some of their functions are quite similar. Both the gatekeeper in H.323 and the call agent in MGCP manage and control gateways and participate in setting up, maintaining, and terminating the VoIP's telephone connection. The MGCP can also be used as part of H.323 for simplified interworking.


The PSTN has served the needs of businesses and consumers worldwide for more than 100 years and has gone through major technological advances, including survivable long-distance networks based on synchronous optical network (SONET) rings, intelligent networking, Signaling System No. 7 (SS7)-based signaling, and a high degree of redundancy in telephone switches. All these increasingly advanced features and components have increased the reliability of the PSTN; it is estimated that because of them the PSTN is today operational 99.999% of the time. The PSTN also offers low latency rates and very high quality during voice transmission.
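The 99.999% figure is easier to appreciate when converted into downtime. The short calculation below is a sketch; the function name is an assumption, and the year is taken as 365 days, ignoring leap years.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Convert an availability fraction into expected minutes of downtime per year."""
    if not 0 <= availability <= 1:
        raise ValueError("availability must be between 0 and 1")
    return (1 - availability) * MINUTES_PER_YEAR


print(round(downtime_minutes_per_year(0.99999), 2))  # "five nines"
print(round(downtime_minutes_per_year(0.999), 1))    # "three nines", for comparison
```

At 99.999% availability the network is down for a little over five minutes a year, which is the reliability bar VoIP has to be measured against.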

With the emerging potential of IP networks to provide integrated voice-data communications, conventional PSTN carriers realize they have to respond to this competitive threat. For example, Telcordia (formerly Bellcore) has developed a Voice over Packet (VoP) architecture and has initiated an industrywide effort to develop generic requirement documents; they will allow local and interexchange carriers, vendors, and other stakeholders to address interoperability issues associated with networks, services, protocols, and equipment. These initiatives recognize that because bundled services cannot be offered cost-effectively by separate networks, they have to identify a migration path for PSTN carriers preserving their investment in circuit-switched technology and services. This migration path is supposed to allow PSTN carriers to modify and add only some components in existing networks for offering multiple services, including VoIP.

Telcordia's Next Generation Network and VoP architecture (NGN/VOP) represents a vision for the coexistence of these two technologies (see Figure 3) [3]. The current PSTN is controlled by SS7, an out-of-band packet-switched network used to coordinate the establishment, use, and termination of circuit-switched calls through circuit switches and trunks. The SS7 network also allows other services to be provided, including 800-number dialing and local number portability, as required by the U.S. Telecommunications Act of 1996 to foster competition in local telephone markets.

The NGN/VOP architecture involves a number of elements [3]:

  • Core packet network. Unlike the PSTN, this classical IP network carries both control and traffic packets.
  • Call connection agent (CCA). This software provides call-processing functionality. An IP network is a best-effort packet delivery service; something has to set up, manage, and disable virtual voice connections. Moreover, packets might be lost in error or arrive out of sequence. The routing and management of a virtual call across a core network is essential. The CCA also has to generate SS7 messages if 800 toll-free dialing, local number portability, and other services are desired.
  • Signaling gateway. This device is the control bridge between the circuit-switched and packet-switched worlds needed to manage end-to-end calls through both infrastructures.
  • Trunk gateway. This traffic bridge terminates circuit-switched trunks on the PSTN side and virtual connections on the packet-switched side.
  • Access gateway. This device provides alternative access for subscribers not traversing the PSTN. The access gateway sets up transport connections through the core network when directed by the CCA; it also provides ringing and other functions.
  • Billing agent. This agent gets raw usage data from the CCA and generates formatted messages for back-end billing platforms.
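As one concrete illustration of how these elements hand data to each other, the billing agent's job can be sketched as a small transformation. Everything specific here is hypothetical: the `UsageRecord` fields, the pipe-delimited output, and the choice to round up to whole minutes are assumptions made for the example and are not taken from the Telcordia architecture.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical raw usage record as it might arrive from the CCA."""
    caller: str
    callee: str
    start_s: int  # call start, seconds since some epoch
    end_s: int    # call end, seconds since the same epoch


def format_billing_message(rec: UsageRecord) -> str:
    """Turn raw usage into a formatted message for a back-end billing platform."""
    if rec.end_s < rec.start_s:
        raise ValueError("call cannot end before it starts")
    # Assumed billing rule: partial minutes are rounded up.
    minutes = math.ceil((rec.end_s - rec.start_s) / 60)
    return f"{rec.caller}|{rec.callee}|{minutes}"


print(format_billing_message(UsageRecord("4045550100", "2125550199", 0, 125)))
```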

This architecture allows the existing PSTN to evolve into a network supporting both traditional and IP-based voice communications. Though the phone companies serve more than 100 million U.S. subscribers today, they have to provide bundled services in the future if they hope to maintain or increase their existing client base. The fate of this evolutionary architecture depends on carriers being able to forge interoperability consensus among themselves and with vendors.

VoIP Adoption and Prospects

Several factors make it difficult to forecast VoIP adoption rates. The first is how quickly existing carriers might transition away from their current technology. Another is demand for services from emerging carriers and other service providers who are unencumbered by sunk investment in the PSTN. Another is the regulatory environment. And yet another is users, who will undoubtedly demand not only the same high QoS to which they are accustomed but cost-effective bundled services as well.

Many users resist changing to VoIP until they are shown the new service's tangible benefits, including reduced cost or more features; they are certainly unlikely to accept lower quality. In addition, many organizations have invested a great deal of money in PBX and other phone equipment. The availability of new hybrid PBX/VoIP systems, which can be installed as old equipment is phased out, might significantly influence the speed of VoIP adoption. The cost today of VoIP end-user equipment is much greater than for traditional phones. However, the emergence of devices that do not require a computer but connect to existing phones may help increase user acceptance.

VoIP also has to address the issue of security for transmitted messages before it can become universal. The Internet's packet-switched architecture may provide carriers and businesses cost and efficiency advantages but also huge security headaches. Along with IPv6, many versions of VoIP software have built-in encryption, offering better security than older implementations. Table 4 lists several factors that could affect VoIP adoption.

These issues make it evident that VoIP will not completely replace the traditional, established PSTN but will rather integrate with it and work in parallel with it. Even though the two systems reflect quite different design philosophies and commercial histories as to their switching mechanisms, they also share some of the same technologies and links. For example, each system utilizes the local loop to reach the end user. Additionally, VoIP relies on the PSTN to enable its users to reach their ISPs and Internet gateway servers. The two systems are likely to coexist for the foreseeable future, each one serving a particular market or purpose. This competitive coexistence should continue until VoIP quality and reliability finally catch up to those of the PSTN, and some of the older PSTN architecture becomes outdated and needs to be replaced. Figure 4 shows one possible scenario for PSTN and VoIP coexistence for customers.

The Cahners In-Stat group estimates that VoIP gateway sales in the U.S. will reach $4 billion in 2003. As a harbinger of VoIP deployment, Cisco Systems has many business customers with more than 2,000 IP phones [4]. Moreover, many other small but technologically advanced companies are likely to install IP/PBX systems; Gartner Group predicts that 50% of all small companies will have IP/PBXs by 2004. One major factor influencing would-be commercial customers is the ability of vendors to offer large IP/PBX systems that match large PSTN switches in terms of cost, size, number of lines, reliability, and configurability. The last-mile issue can be resolved if carriers offer high-bandwidth service aggregation points at business customers' premises. Due to the perceived unmanageability of public IP networks, it's unlikely that most VoIP traffic will be carried by public IP networks in 2004; according to some estimates, the share is likely to be less than 20% even by 2004 [1].

As VoIP gains a commercial foothold, wireless VoIP (WVoIP) might emerge as a way to transmit voice over the Internet from cell and personal communications services (PCS) phones. This advance could affect cellular and PCS providers as their customers gain the option of connecting to IP networks for long-distance calls. Since the number of wireless customers is increasing exponentially and the cost of wireless long-distance service remains high, the effect of WVoIP on wireless carriers may be significant, despite the hurdles of QoS and reliability. Better loss algorithms and transmission equipment are also needed before WVoIP becomes an engineering and commercial reality.


Our aim here has been to provide background information, major concepts, and issues concerning the technology, deployment scenarios, and approaches to protocol support for VoIP. We've also addressed a number of unresolved engineering and marketing questions and how VoIP might coexist with the traditional voice infrastructure. Other areas where the technology of VoIP must be developed further before full or even substantial adoption is possible include billing and customer service; VoIP implementers must still determine the most effective billing structure for calls placed using VoIP systems and develop procedures and systems for implementing it (for more on VoIP, please see ~jain/refs/ref_voip.htm). Ultimately, services, QoS, and cost-effectiveness will determine the speed of VoIP adoption and evolution.


References

1. Black, U. Voice over IP. Prentice Hall, Upper Saddle River, NJ, 2000.

2. Goyal, P., Greenberg, A., Kalmanek, C., Marshall, W., Mishra, P., and Nortz, D. Integration of call signaling and resource management for IP telephony. IEEE Internet Comput. 3, 3 (May/June 1999), 44–52.

3. Katzenberger, G., Ed. Telcordia Digest Tech. Info. SR-104 17, 2 (Feb. 2000); see downloads/feb2000digest.pdf.

4. News@Cisco; see

5. Polyzois, K., Purdy, H., Yang, P., Shrader, D., Sinnreich, H., and Schulzrinne, H. From pots to pans: A commentary on the evolution to Internet telephony. IEEE Internet Comput. 3, 3 (May/June 1999), 83–91.

6. Schulzrinne, H. Service for telecom, Version II. IEEE Internet Comput. 3, 3 (May/June 1999), 40–43.

7. Schulzrinne, H. and Rosenberg, J. Tutorial: The IETF Internet telephony architecture and protocols. IEEE Internet Comput. Online (1999); see

8. U.S. Federal Communications Commission, Industry Analysis Division, Common Carrier Bureau. Monitoring Report and Access Tariff Filings. Statistical Trends in Telephony. Washington, DC, Aug. 2001; see

9. Weiss, M. and Hwang, J. Internet Telephony or Circuit-Switched Telephony: Which is Cheaper? School of Information Science, University of Pittsburgh, Pittsburgh, PA, Sept. 1999; see


Upkar Varshney is an assistant professor in the Department of Computer Information Systems at Georgia State University, Atlanta, GA.

Andy Snow is an assistant professor in the Department of Computer Information Systems at Georgia State University, Atlanta, GA.

Matt McGivern is at Arthur Andersen, Atlanta, GA.

Christi Howard is an undergraduate student in the Department of Computer Information Systems at Georgia State University, Atlanta, GA.


Figure 1. A possible scenario for VoIP for business customers.

Figure 2. Managing temporary overflow of calls using VoIP.

Figure 3. Proposed evolution path for traditional PSTN carriers.

Figure 4. Possible coexistence scenario for PSTN and VoIP.


Table 1. A qualitative comparison of voice over PSTN and over IP.

Table 2. The delay factors in VoIP.

Table 3. Some VoIP implementations.

Table 4. Factors affecting VoIP adoption.

©2002 ACM  0002-0782/02/0100  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
