DDDU, Vol 2 No 7, July 1995, Feature

Protocols for Internet Radio

by Robert Burcham

In addition to text and images, the Internet is increasingly being used to transmit audio, including FM radio broadcasts. Consequently, low-power radio stations such as the University of Kansas' 100-watt KJHK FM station can now reach listeners worldwide. To learn how to hear KJHK over the Internet, direct your World Wide Web browser to http://www.ukans.edu/~kjhknet.

At the heart of systems such as KJHK's is ``CU-SeeMe,'' a freely available videoconferencing program. Although originally developed by the Advanced Technologies and Planning group at Cornell University, CU-SeeMe is currently managed by the CU-SeeMe Consortium, a group of universities and nonprofit and private organizations. (For more information on the Consortium, contact Martyne Hallgren, Executive Director, at m.hallgren@cornell.edu.)

CU-SeeMe runs on both Macintosh and Windows platforms over an IP network connection. In its most basic configuration, you simply connect to another computer that is also running the program. To communicate with more than one computer, you set up a Reflector. This is the approach we've taken with the KJHK project. Our Reflector is configured so that only one client (identified by IP address) has send privileges, while all others are forced into a nonsending or ``lurking'' state. The sending client provides the ``signal,'' a VAT stream encoded in Intel DVI format at 32 Kbits/s. The result is a 4-KB/s stream sent out to each attached listener.

Although functional, the current configuration of KJHK's Reflector is not very Internet friendly. Reflector uses User Datagram Protocol (UDP) to encapsulate its data, and routes the resulting packet using IP.

Unlike TCP, UDP has no mechanism to ensure delivery of any particular packet, nor is there any mechanism to ensure packets will be received in the correct order. This is satisfactory for most networks, which operate with moderate congestion levels. However, once congestion becomes a factor, UDP packets may arrive at their destinations out of sequence or may not arrive at all. Where real-time services (such as broadcasts of KU basketball games) are concerned, this is unacceptable. Video will freeze and stutter, and audio will be spotty at best. So if assurance of delivery and sequence are important, why not use TCP? TCP requires an acknowledgment for just about everything it does. By the time you realize you have dropped a packet with TCP and then resend it, a full round trip has occurred. This is enough delay to impose a noticeable lag in real-time service, especially audio. If the recovery of lost packets is a liability, there is no reason to maintain the TCP overhead necessary for that purpose. Switching to UDP shrinks the transport header from 20 bytes to 8, so the combined IP and transport headers drop from 40 bytes to 28.
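To make the header comparison concrete, the following sketch computes how much of each packet is header when a small audio payload is carried over TCP or UDP. It assumes IPv4 and TCP headers without options (20 bytes each), an 8-byte UDP header, and a hypothetical 160-byte audio payload; none of these packet sizes come from the KJHK setup itself.

```python
# Header sizes in bytes, assuming no IP or TCP options are present.
IP_HEADER = 20
TCP_HEADER = 20
UDP_HEADER = 8


def header_overhead(payload_bytes, transport_header):
    """Return the fraction of an IP packet that is header rather than payload."""
    headers = IP_HEADER + transport_header
    return headers / (headers + payload_bytes)


# Hypothetical payload: 160 bytes of audio (40 ms of a 4-KB/s stream).
PAYLOAD = 160
tcp_fraction = header_overhead(PAYLOAD, TCP_HEADER)
udp_fraction = header_overhead(PAYLOAD, UDP_HEADER)

if __name__ == "__main__":
    print(f"TCP/IP overhead: {tcp_fraction:.1%}")
    print(f"UDP/IP overhead: {udp_fraction:.1%}")
```

The smaller the audio payload per packet, the more the 12-byte difference matters, which is one reason small, frequent real-time packets favor UDP.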

The problem of service quality is minor compared to the problem of efficient use of bandwidth. Reflector running with a basic configuration is unicast, meaning that for each individual connected to KJHK's Reflector, a separate copy of the 4-KB/s stream is sent out across the network. This could become a serious problem if too many users are simultaneously attached. To guard against this, Reflector can be configured to limit the number of concurrent participants. However, this is still inefficient. Consider the case where there are three listeners on the same LAN, perhaps in California. Each listener receives its own copy of each UDP datagram, so all points between the LAN and Reflector must carry three copies of the same datagram. Ideally, only one copy would be sent to the appropriate router on the LAN, at which point the router would send duplicate copies to the interested parties, thus relieving the interior points along the path of two-thirds of the burden. This is the idea behind IP multicast.
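The arithmetic behind the two-thirds figure can be sketched as follows. The function names are illustrative, and the sketch assumes 1 KB means 1000 bytes and that every listener sits behind the same shared path.

```python
# The 4-KB/s stream described in the article (assuming 1 KB = 1000 bytes).
STREAM_BYTES_PER_SEC = 4_000


def unicast_load(listeners):
    """Bytes per second on the shared path when each listener gets its own copy."""
    return listeners * STREAM_BYTES_PER_SEC


def multicast_load(listeners):
    """Bytes per second on the shared path when one copy serves the whole LAN."""
    return STREAM_BYTES_PER_SEC if listeners > 0 else 0


def savings(listeners):
    """Fraction of shared-path traffic that multicast would remove."""
    if listeners == 0:
        return 0.0
    return 1 - multicast_load(listeners) / unicast_load(listeners)


if __name__ == "__main__":
    for n in (1, 3, 30):
        print(n, unicast_load(n), multicast_load(n), f"{savings(n):.0%}")
```

With three listeners, the shared path carries 12,000 bytes per second under unicast and 4,000 under multicast, and the savings grow as more listeners join the same LAN.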

IP Multicast

The key concept in IP multicasting is that of the ``host group,'' a set of hosts that can be identified by a single IP destination address. Membership in a host group is dynamic, meaning that hosts may remove themselves from a group and new hosts may join a group at any time. A multicast datagram is addressed to this single IP destination address and is delivered to each member of the host group. The range of IP addresses from 224.0.0.0 to 239.255.255.255 (known as ``class D IP addresses'') is reserved for host groups.

A multicast datagram contains a time-to-live (ttl) field indicating how many ``multicast routers'' it will travel through before it is no longer forwarded. When the datagram is first sent, it reaches all members of the targeted host group that are on the sending host's LAN (for example, via local multicast on Ethernet). If the ttl is greater than one, any multicast router attached to the LAN will decrement the ttl and forward the packet to any other network with a host that is a member of the targeted host group. When the datagram reaches a multicast router on such a network, the multicast router distributes the datagram to each member of the host group on the LAN with a local multicast.
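The ttl rule can be sketched with a toy model: a straight chain of LANs, each joined to the next by one multicast router, with a group member on every LAN. This is a simplification for illustration only; real multicast routers also apply per-tunnel thresholds and forward only toward networks with group members.

```python
def lans_reached(initial_ttl, chain_length):
    """Return the indices of the LANs a datagram reaches in a chain of LANs.

    LAN 0 is the sender's LAN. A router forwards the datagram to the next
    LAN only if the ttl is greater than one, decrementing it as it does so.
    """
    if initial_ttl < 1:
        raise ValueError("this sketch assumes a ttl of at least 1")
    reached = []
    ttl = initial_ttl
    for lan in range(chain_length):
        reached.append(lan)   # local multicast delivers to members on this LAN
        if ttl <= 1:
            break             # routers on this LAN will not forward further
        ttl -= 1              # the forwarding router decrements the ttl
    return reached


if __name__ == "__main__":
    print(lans_reached(1, 5))   # stays on the sender's LAN
    print(lans_reached(3, 5))   # crosses two multicast routers
```

On a real socket, the sender's ttl is normally set with the `IP_MULTICAST_TTL` socket option.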

A multicast router is typically a UNIX workstation with multicast extensions resident in the operating-system kernel. These workstations run the ``mrouted'' multicast routing daemon. Multicast routers forward IP multicast datagrams to one another via ``IP tunneling,'' a method that encapsulates the IP multicast datagram inside an IP unicast packet with the protocol field set to 4. If the host at the unicast destination IP address is a multicast router, the mrouted daemon receives the packet and strips off the encapsulating IP header, forwarding the result to the members of the host group as described earlier. Because the multicast datagrams are tunneled this way, they look just like unicast packets to normal routers. The result is the ``Multicast Backbone,'' or Mbone, a virtual multicast network built on top of the existing unicast Internet. Note that some implementations of true multicast routers (John Moy's MOSPF Proteon router, for example) can send single copies of multicast packets without IP tunneling.
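The encapsulation step can be sketched by building a 20-byte outer IPv4 header whose protocol field is 4 and prepending it to the inner datagram. This is an illustrative sketch, not mrouted's code: the function names are hypothetical, the inner datagram is treated as opaque bytes, and fields such as identification and fragmentation are simply left at zero.

```python
import socket
import struct

IPPROTO_IPIP = 4  # protocol number for IP-in-IP encapsulation


def ip_checksum(data):
    """Standard Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate(inner, tunnel_src, tunnel_dst, ttl=64):
    """Wrap an inner IP datagram in a unicast IPv4 header (no options)."""
    version_ihl = (4 << 4) | 5            # IPv4, header length of 5 words
    total_length = 20 + len(inner)
    header = struct.pack("!BBHHHBBH4s4s", version_ihl, 0, total_length,
                         0, 0, ttl, IPPROTO_IPIP, 0,
                         socket.inet_aton(tunnel_src),
                         socket.inet_aton(tunnel_dst))
    # Fill in the header checksum, which occupies bytes 10 and 11.
    header = header[:10] + struct.pack("!H", ip_checksum(header)) + header[12:]
    return header + inner


def decapsulate(packet):
    """Strip the outer header, as the receiving multicast router would."""
    header_length = (packet[0] & 0x0F) * 4
    if packet[9] != IPPROTO_IPIP:
        raise ValueError("not an IP-in-IP packet")
    return packet[header_length:]
```

An ordinary router along the tunnel sees only the outer unicast header; the multicast destination address stays hidden inside the payload until the far end of the tunnel removes the wrapper.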

The Mbone is the result of the efforts of the Internet Engineering Task Force (IETF) to implement some means of transmitting the events of IETF meetings to destinations around the world. High-speed network interfaces and powerful workstations with audio and video capability make the Mbone possible. However, because the Mbone is still built on top of a packet network, the quality of real-time services is still difficult to guarantee. Protocols are emerging that address this problem and also anticipate the change from a packet-based network to a cell network.

Other Protocols

The Xpress Transfer Protocol (XTP) contains all of the basic functionality of TCP and UDP in one protocol, and also offers other functionality.

For instance, XTP has better connection management than TCP. The hands of a system designer are tied in TCP by all of the acknowledgments that occur during packet exchange (two to open and acknowledge the connection, two to send and acknowledge the data, and two to close and acknowledge the close of the connection). XTP performs the same duties with three packets. One packet opens the connection and sends the data in the A-to-B direction. Another acknowledges the connection and sends data in the B-to-A direction, if needed. When the last packet is sent in the A-to-B direction, it is marked as such, thus closing the connection.

XTP also has selectable flow control and error control, so data can be transferred without acknowledgment, as in UDP. Additionally, a number of XTP features show potential for real-time applications. XTP has multicast support, and built-in multicast group management. XTP can assign priority to data on a system-wide basis, thus ensuring a cap on maximum latency for time-critical data. If a receiver is being overwhelmed by senders, it can impose a data-rate and burst-size limit upon the sender(s) through an XTP acknowledgment. Perhaps its most appealing feature is that because XTP is purely a transfer protocol, it can operate over any network layer, and can easily run on top of AAL3/4 or AAL5.

Another protocol designed for real-time applications is the Real-time Transport Protocol (RTP), designed in part by Steven Casner and Van Jacobson of the Audio Video Transport Working Group (March 1995). RTP was designed to provide transport of real-time data over a packet network, namely IP. As such, it is not readily portable to other protocol layers, but in conjunction with the RTCP control protocol, it does appear to be the best bet for real-time services over IP.

RTP provides support for sequence reconstruction, content identification, security, and loss detection. It also provides source identification and supports gateways and multicast-to-unicast translators, making real-time services available to participants who may be scattered across a heterogeneous WAN.
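Sequence reconstruction and loss detection both rest on a per-packet sequence number. A minimal sketch of the receiver's side follows; the function name is hypothetical, and for simplicity it assumes the sequence numbers do not wrap around within the batch, whereas a real receiver must handle wraparound of the 16-bit counter.

```python
def reorder_and_find_losses(arrivals):
    """Restore sending order and report gaps.

    arrivals is a list of (sequence_number, payload) pairs in arrival order.
    Returns (payloads in sequence order, list of missing sequence numbers).
    """
    by_seq = dict(arrivals)                  # duplicates collapse to one entry
    if not by_seq:
        return [], []
    first, last = min(by_seq), max(by_seq)
    ordered = [by_seq[seq] for seq in sorted(by_seq)]
    missing = [seq for seq in range(first, last + 1) if seq not in by_seq]
    return ordered, missing


if __name__ == "__main__":
    # Packets 13 and 14 never arrive; 11 and 12 arrive swapped.
    received = [(10, "a"), (12, "c"), (11, "b"), (15, "f")]
    print(reorder_and_find_losses(received))
```

A real-time receiver would not wait indefinitely for the missing packets; it would play out what it has and conceal the gap.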

Some key fields of the RTP header are defined as:

V, the version (currently 2).
P, a padding bit, set to 1 if the payload contains padding octets.
PT, the payload type, which identifies the type of data in the payload and determines how the application will handle the data.
sequence number, which is incremented once for every packet sent and is used to detect packet loss and restore packet sequence.
time stamp, the sampling instant of the first octet in the data packet (used for synchronization and ``jitter'' calculations).
SSRC, the synchronization-source identifier, which is randomly chosen with the intent that no two SSRCs are identical within the same RTP session.
CSRC, the contributing-source identifier, which lists the contributing sources for this payload.
CC, the CSRC count, which gives the number of contributing-source identifiers in the header. Because the field is only 4 bits wide, at most 15 contributing sources can be listed.
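The fields listed above can be packed into, and recovered from, the 12-byte fixed header followed by the CSRC list. The sketch below follows the bit layout of the published RTP specification; the extension (X) and marker (M) bits, which the list above does not cover, are simply set to zero when packing and ignored when parsing, and the class and function names are illustrative.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def pack_rtp_header(h):
    """Serialize the header; CC is derived from the length of the CSRC list."""
    if len(h.csrc) > 15:
        raise ValueError("CC is a 4-bit field, so at most 15 CSRCs fit")
    byte0 = (h.version << 6) | (int(h.padding) << 5) | len(h.csrc)  # X = 0
    byte1 = h.payload_type & 0x7F                                   # M = 0
    fixed = struct.pack("!BBHII", byte0, byte1,
                        h.sequence_number, h.timestamp, h.ssrc)
    return fixed + b"".join(struct.pack("!I", c) for c in h.csrc)


def parse_rtp_header(data):
    """Recover the header fields from the start of an RTP packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    cc = byte0 & 0x0F
    csrc = struct.unpack("!%dI" % cc, data[12:12 + 4 * cc])
    return RtpHeader(byte0 >> 6, bool(byte0 & 0x20), byte1 & 0x7F,
                     seq, timestamp, ssrc, csrc)


if __name__ == "__main__":
    # Arbitrary example values, not taken from a real session.
    header = RtpHeader(2, False, 5, 1000, 160000, 0x1234ABCD, (0x11111111,))
    print(parse_rtp_header(pack_rtp_header(header)))
```

Deriving CC from the list length, rather than storing it separately, keeps the count and the list from disagreeing.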

The RTP control packets (RTCP packets) are periodically sent to all participants in the current session. RTCP information is intended to provide feedback on the quality of the data distribution, but this type of information does not play a role in improving quality of service on the fly. It is typically recorded during the session and analyzed afterward to discover network problems. Each RTCP packet carries a transport-level identifier for an RTP source called the ``canonical name'' (CNAME). As noted, source information is also stored in the CSRC list of the RTP header, but if a conflict occurs or a client crashes, receivers can rely on the CNAME to keep track of current participants. Because RTCP packets reach every participant, each one can independently observe how many participants there are and the rate at which packets are being sent. This is helpful, because as the number of participants grows, current participants can gauge their level of participation accordingly, based on their network capabilities.
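One quality measure a receiver can derive from the RTP time stamps is interarrival jitter. The sketch below uses the running estimate defined in the published RTP specification (RFC 3550), in which each new sample moves the estimate one-sixteenth of the way toward the latest deviation. The function name is hypothetical, and the sketch assumes time stamps and arrival times are already expressed in the same units.

```python
def interarrival_jitter(packets):
    """Running jitter estimate over (rtp_timestamp, arrival_time) pairs.

    For consecutive packets, D is the change in transit time
    (arrival_time - rtp_timestamp); the estimate J is updated as
    J = J + (|D| - J) / 16.
    """
    jitter = 0.0
    previous_transit = None
    for sent, arrived in packets:
        transit = arrived - sent
        if previous_transit is not None:
            deviation = abs(transit - previous_transit)
            jitter += (deviation - jitter) / 16.0
        previous_transit = transit
    return jitter


if __name__ == "__main__":
    # Hypothetical trace: the third packet is delayed by 16 extra units.
    trace = [(0, 10), (160, 170), (320, 346)]
    print(interarrival_jitter(trace))
```

A stream whose packets all take the same time to cross the network yields a jitter of zero, regardless of how long that transit time is.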

Rob is a graduate student at the University of Kansas and can be contacted at burchamr@lark.cc.ukans.edu.