Dr. Dobb's Journal November 2000
Simply put, security protocols can be performance assassins. In fact, a poorly implemented security protocol can not only cripple multimedia applications, but may also be vulnerable to intruders. In this article, I'll examine several multimedia security protocols, explain why they can devastate performance, and examine how they are actually deployed.
Conferencing applications, for example, facilitate the exchange of audio-visual and textual information among two or more users. This information is split between two data streams -- signaling and bearer. A signaling stream controls conference events such as notification of participant entry or exit. Popular signaling protocols include H.323, the Media Gateway Control Protocol (MGCP), and the Session Initiation Protocol (SIP). (For more information on these protocols, see my article "Internet Telephony Protocols," DDJ, July 1999.)
By contrast, the bearer stream is restricted to multimedia content and is usually transported over the Real-time Transport Protocol (RTP) -- a lightweight protocol optimized for the real-time requirements of multimedia applications.
Because they are distributed programs, conferencing apps are vulnerable to identity theft (stealing services allocated to another person) and illegal monitoring of conversations. Consequently, these applications rely on authentication and encryption techniques to limit their exposure to intruders.
Authentication refers to actions necessary to verify a user's identity. Authentication is client/server oriented and relies on a shared secret(s) to validate a client's identity. This secret information is typically attached to an existing packet.
A few authentication algorithms are light enough that every packet can be authenticated without noticeably degrading performance. Others (such as public-key algorithms) are processor intensive and perform better when sporadically used. Besides protecting the user's identity, authentication enables network providers to bill users for network usage or provide enhanced functionality.
Unlike authentication routines, encryption algorithms modify the packet payload. Although most of these algorithms are not processor intensive, they can interfere with multimedia performance. To explain, audio and video streams must be fed enormous numbers of packets at specified time intervals. Encrypted packets steal CPU cycles and may result in dropped video frames or pops in audio playback.
Even though the timing requirements of conventional multimedia streams are strict, the performance criteria for authenticating devices and establishing a conference are significantly more intense. For example, the telephony industry expects that the dial tone be audible within 50 milliseconds (ms) of taking the phone off-hook. If you're using an intelligent endpoint (SIP or H.323, for instance), the device itself is responsible for generating a dial tone. By contrast, an MGCP endpoint must contact its Call Management Server (CMS) for instructions on how to react when the receiver is taken off the hook. Before it instructs the endpoint to play a dial tone, the CMS verifies that the MGCP device can access network resources.
Given the minuscule authentication time window, some CMSs use shortcuts to boost authentication performance. For instance, they may use a proprietary mechanism to authenticate users rather than relying on a conventional Remote Authentication Dial-In User Service (RADIUS) server. Others avoid public-key authentication protocols and perform most authentication operations when the device boots.
Similarly, the audio-visual path between parties must be established within 300 ms of the destination party going off-hook, or portions of the conversation will be lost. Establishing a communications path includes reserving and committing network resources and enabling the flow of multimedia packets.
Even without security, it requires Herculean effort to execute these tasks in under 300 ms. If each packet must also be authenticated, compromises may be necessary. For instance, some providers use over-provisioned, private networks that eliminate the need to reserve network resources. Others utilize custom hardware or high-end UNIX servers to reduce the processing burden of the CMS.
Because there are myriad security solutions, signaling protocols should not be tied to specific security architectures or technologies. For example, SIP has chosen to adopt the HTTP security model: It defines rudimentary security mechanisms that can be used by any SIP client or server. If you need additional security features, you must combine SIP with another protocol.
SIP and HTTP have other attributes in common -- they are published by the Internet Engineering Task Force (IETF), are text based, and can dynamically support new features. Furthermore, since SIP's syntax is based on HTTP, it reuses the HTTP security primitives defined in RFC 2617, "HTTP Authentication: Basic and Digest Access Authentication."
If a SIP server uses Basic authentication, SIP clients must supply a user ID and password with each request. If the required headers are not present, the server rejects the method with a "401 Unauthorized" or "407 Proxy Authentication Required" response. The client recognizes the authentication error and replies with the user ID and password in an unencrypted header; see Figure 1. Alas, cleartext communication of sensitive information limits the usefulness of Basic authentication.
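To see why Basic authentication offers so little protection, here is a minimal Python sketch of how the credentials value is built under RFC 2617. The function names are illustrative and are not part of any SIP stack; the point is only that the header is base64 encoded, not encrypted, so anyone who captures the packet can reverse it.

```python
import base64


def basic_credentials(user_id, password):
    """Build the value of an Authorization header for RFC 2617 Basic."""
    raw = f"{user_id}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic(header_value):
    """Recover (user_id, password) from a Basic credentials value.

    An eavesdropper can do exactly the same thing: no key is needed.
    """
    scheme, token = header_value.split(" ", 1)
    if scheme != "Basic":
        raise ValueError("not a Basic credentials value")
    user_id, password = base64.b64decode(token).decode("utf-8").split(":", 1)
    return user_id, password


if __name__ == "__main__":
    header = basic_credentials("Aladdin", "open sesame")
    print(header)
    print(decode_basic(header))
```

The user ID and password shown are the sample values used in RFC 2617 itself.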
Fortunately, RFC 2617 also supports the Digest security option; see http://www.ietf.org/rfc/rfc2617.txt?number=2617. Like Basic, Digest authentication relies on a challenge/response mechanism to authenticate packets; see Figure 2. If a server is unhappy with the contents of a packet, it not only rejects it with a 401 or 407 error, but also attaches a nonce header to the response. The client hashes this nonce with information it shares with the server (such as user ID and password) and communicates the hash result in a retransmitted request to the server. If the hash value matches the value calculated by the server, the server accepts the packet.
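The hash computation described above can be sketched in a few lines of Python. This follows the MD5 formulas in RFC 2617 (with and without the optional qop parameter); the function and parameter names are my own, and a real SIP stack would also parse the challenge header and manage nonce lifetimes, which this sketch omits.

```python
import hashlib
import hmac


def _md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(user, realm, password, method, uri, nonce,
                    nc=None, cnonce=None, qop=None):
    """Compute the RFC 2617 Digest 'response' value."""
    ha1 = _md5_hex(f"{user}:{realm}:{password}")  # the shared secret part
    ha2 = _md5_hex(f"{method}:{uri}")             # what is being requested
    if qop:
        # qop form: also binds a nonce count and a client nonce
        return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")


def server_accepts(client_response, expected_response):
    """Server side: recompute the hash and compare in constant time."""
    return hmac.compare_digest(client_response, expected_response)


if __name__ == "__main__":
    # Sample values taken from the worked example in RFC 2617.
    response = digest_response(
        "Mufasa", "testrealm@host.com", "Circle Of Life",
        "GET", "/dir/index.html",
        "dcd98b7102dd2f0e8b11d0f600bfb0c093",
        nc="00000001", cnonce="0a4f113b", qop="auth")
    print(response)
```

Note that the password itself never crosses the wire; only a hash that depends on the server's nonce does.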
To maximize security, servers can challenge each packet. However, this doubles the number of packets required to accomplish tasks and makes it virtually impossible to meet the 300-ms call-setup requirements. Therefore, to optimize performance, SIP servers typically challenge the first packet they receive and randomly challenge subsequent packets (the exact ratio is application specific, see Table 1).
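The trade-off between challenging every packet and challenging only some can be made concrete with a short Python sketch. The class name, the ratio parameter, and the message-count model are all assumptions made for illustration: an unchallenged request is counted as two messages (request and response), and a challenged one as four (request, 401/407, retried request, response).

```python
import random


class ChallengePolicy:
    """Always challenge a client's first request, then challenge at random."""

    def __init__(self, ratio, rng=None):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("ratio must be between 0 and 1")
        self.ratio = ratio                  # hypothetical, application specific
        self.rng = rng or random.Random()
        self.seen = set()

    def should_challenge(self, client_id):
        if client_id not in self.seen:
            self.seen.add(client_id)
            return True                     # the first packet is always challenged
        return self.rng.random() < self.ratio


def expected_messages(requests, ratio):
    """Expected messages on the wire for one client under the model above."""
    if requests == 0:
        return 0.0
    challenged = 1 + (requests - 1) * ratio
    return 2 * requests + 2 * challenged


if __name__ == "__main__":
    for ratio in (0.0, 0.1, 1.0):
        print(ratio, expected_messages(10, ratio))
```

With a ratio of 1.0 the message count is exactly double the unauthenticated case, which is the doubling described above; lower ratios buy back latency at the cost of a window in which unchallenged packets are accepted.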
Although Digest is a dramatic improvement over Basic authentication, it is no panacea. For instance, it can never be as secure as a system that uses a client-side private key. Another vulnerability is its weakness to replay attacks. To explain, when servers choose to randomly challenge packets for performance reasons, an intruder could issue rogue commands and the server wouldn't detect these packets until the next challenge.
If you have stricter security requirements, SIP advocates recommend that you combine SIP with a low-level alternative such as Internet Protocol Security (IPSEC), which encrypts packets during TCP/IP stack processing and can be initialized with either preshared or public keys.
The International Telecommunication Union (ITU) philosophy with H.323 is diametrically opposed to the IETF approach with SIP. The designers of SIP prefer a minimalist approach that defines core features so that the protocol is light and easy to implement. Consequently, features such as robust security must be implemented with other standard protocols.
By contrast, the H.323 architects chose to explicitly enumerate every aspect of the protocol. This tradition continues with H.235, the protocol that defines how H.323 entities can perform authentication and ensure integrity and privacy.
H.235 enables authentication by validating the identity of an endpoint. Before authentication can be performed, the client and server may exchange keys via Diffie-Hellman, Oakley and ISAKMP, or an out-of-band mechanism.
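For readers unfamiliar with Diffie-Hellman, the following Python sketch shows the arithmetic behind the exchange. The prime, generator, and private values are deliberately tiny textbook numbers chosen so the result can be checked by hand; they are not H.235 parameters and offer no security.

```python
def dh_public(generator, private, prime):
    """Value each side sends over the wire: g^private mod p."""
    return pow(generator, private, prime)


def dh_shared(other_public, private, prime):
    """Secret each side derives locally: other_public^private mod p."""
    return pow(other_public, private, prime)


if __name__ == "__main__":
    prime, generator = 23, 5                 # toy public parameters
    client_private, server_private = 6, 15   # never transmitted

    client_public = dh_public(generator, client_private, prime)
    server_public = dh_public(generator, server_private, prime)

    # Both ends arrive at the same key without ever sending it.
    client_key = dh_shared(server_public, client_private, prime)
    server_key = dh_shared(client_public, server_private, prime)
    print(client_public, server_public, client_key, server_key)
```

The modular exponentiations are the expensive part, which is one reason key establishment is best done before any real-time deadline applies.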
Once the keys are established, the server (or Gatekeeper) must validate the endpoint (or client) during endpoint registration (see Figure 3). H.323 clients issue a Registration Request (RRQ) to the Gatekeeper (GK) to request permission to use its services. If the Gatekeeper is able to validate the client, it responds with a Registration Confirm (RCF). If the client cannot be authenticated, it is rejected via a Registration Reject (RRJ).
The integrity of an H.323 packet is assured by attaching a hashed checksum value (or token) to each packet (the remainder of the packet need not be encrypted). This hash algorithm is typically negotiated when the client issues a Gatekeeper Request (GRQ) message. The Gatekeeper (or server) responds with a Gatekeeper Confirm (GCF) that acknowledges the hash algorithm (see Figure 3). Because H.235 integrity algorithms are not challenge/response based, they require no extra round trips, so their performance impact is small and they are less of a drain on multimedia throughput.
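A keyed hash token of this kind can be sketched with Python's standard hmac module. The choice of HMAC-SHA1 truncated to 96 bits and the layout (token appended to the payload) are assumptions for illustration only; H.235 carries its tokens inside the message encoding, and the algorithm actually used is whatever the endpoints negotiate.

```python
import hashlib
import hmac

TOKEN_LEN = 12  # 96 bits; an assumed truncation length


def attach_token(payload, key):
    """Append a keyed hash of the payload; the payload stays unencrypted."""
    token = hmac.new(key, payload, hashlib.sha1).digest()[:TOKEN_LEN]
    return payload + token


def verify_and_strip(packet, key):
    """Return the payload if the token matches, otherwise None."""
    if len(packet) < TOKEN_LEN:
        return None
    payload, token = packet[:-TOKEN_LEN], packet[-TOKEN_LEN:]
    expected = hmac.new(key, payload, hashlib.sha1).digest()[:TOKEN_LEN]
    return payload if hmac.compare_digest(token, expected) else None


if __name__ == "__main__":
    key = b"shared-secret"                  # hypothetical shared key
    packet = attach_token(b"ARQ: call 42", key)
    print(verify_and_strip(packet, key))
    tampered = b"X" + packet[1:]
    print(verify_and_strip(tampered, key))
```

Each packet costs one hash on the sender and one on the receiver, with no additional messages, which is why this approach fits tight call-setup budgets better than a per-packet challenge.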
H.235 protects privacy with two classes of packet-encryption protocol: user level and subsystem level. User-level protocols are proprietary and are written by the application developer. By contrast, subsystem-level protocols (such as IPSEC) normally run in an operating-system kernel and usually have excellent interoperability with multiple vendors' products. Unfortunately, there may be a significant performance penalty with using protocols such as IPSEC (especially if the packet must be encrypted with a public-key algorithm).
While the security features in H.235 are appealing, it is an optional protocol in the H.323 Standard, and many vendors who claim H.323 compatibility do not implement it. Furthermore, most H.323 network administrators are obsessed with performance and jettison any feature that might degrade throughput. Consequently, packet encryption is rarely used and many are even hesitant to deploy the packet-integrity option.
Unlike SIP and H.323, conventional MGCP has no security primitives. Yet, it is typically deployed in environments that are the most susceptible to intruders. For instance, cable providers (via PacketCable; http://www.packetcable.com/packetcable_specs.html) have standardized on an MGCP variant called Network Call Signaling (NCS) for cable telephony. Because cable theft is rampant, a critical element of NCS is its set of additions to MGCP that prevent theft of service.
NCS treats all Multimedia Terminal Adapters (MTAs) located in consumer homes as untrusted entities. As a result, these devices must authenticate with a Kerberos Ticket Granting Server (TGS) before they are permitted to initiate or join multimedia conferences. This is a two-way authentication process that uses public-key certificates (PKINIT).
Once the TGS authenticates the MTA, it grants it a ticket (or session key) sealed with either the MTA's RSA public key or with a secret derived from a Diffie-Hellman exchange. This key is then used to establish an IPSEC session with a CMS via Kerberos (see Figure 4).
The beauty of this approach is that all public-key authentication operations are not real time (that is, they occur before a dial tone is needed and do not degrade CMS performance). The first time the MTA initiates communication with the CMS, the CMS validates the MTA's session key and establishes an IPSEC Security Association (SA) between itself and the MTA.
Typically, IPSEC SAs between the CMS and MTA remain active for several days. Occasionally, the CMS may decide to re-establish the IPSEC SA as a security precaution. NCS accomplishes this with the Kerberos Rekey or WakeUp message. Both of these messages perform the same function but Rekey is optimized for reestablishing a previously established IPSEC SA and requires only one round trip.
Besides creating an SA between the MTA and CMS, NCS mandates that the RTP stream between MTAs be encrypted with the RC4 algorithm. Unlike IPSEC, RC4 does not require that an SA be created before data can be transmitted. Rather, it is a lightweight technique that uses a shared end-to-end secret key to encrypt packet contents.
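RC4 is small enough to show in full, which helps explain why it was attractive for per-packet media encryption. The Python sketch below is the standard published algorithm (key scheduling followed by keystream generation); how NCS derives the key from the end-to-end secret and where in the RTP stream the keystream starts are not shown. It is included to illustrate the cost per byte, and RC4 is no longer considered a safe choice for new designs.

```python
def rc4(key, data):
    """Encrypt or decrypt data with RC4 (the operation is symmetric)."""
    if not key:
        raise ValueError("key must not be empty")

    # Key-scheduling: permute the state table using the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Keystream generation: one swap and one XOR per payload byte.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


if __name__ == "__main__":
    secret = b"end-to-end secret"           # hypothetical shared secret
    ciphertext = rc4(secret, b"20 ms of audio samples")
    print(ciphertext.hex())
    print(rc4(secret, ciphertext))
```

Because encryption and decryption are the same operation, both MTAs run identical code once they hold the same secret.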
The end-to-end secret negotiation process begins when the CMS issues a Create Connection (CRCX) message to the first MTA. The MTA proposes a shared secret to the CMS. Under normal circumstances, the CMS relays this shared secret to the other MTA in a subsequent CRCX message (see Figure 4).
The second MTA then acknowledges the selection of the secret when it responds to the CMS. Finally, the CMS alerts the first MTA of the selected secret via a Modify Connection (or MDCX) command (see Figure 5).
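The three-step exchange just described can be traced with a toy Python simulation. The class and method names are hypothetical, the secret is a random 16-byte value chosen only for illustration, and real NCS messages carry the secret inside SDP rather than as a bare argument.

```python
import secrets


class MTA:
    """Toy endpoint that proposes or accepts an end-to-end secret."""

    def __init__(self, name):
        self.name = name
        self.secret = None

    def on_crcx(self, offered_secret=None):
        # First MTA: nothing offered, so propose a secret.
        # Second MTA: accept the secret relayed by the CMS.
        if offered_secret is None:
            offered_secret = secrets.token_bytes(16)
        self.secret = offered_secret
        return self.secret              # carried back in the response to the CMS

    def on_mdcx(self, selected_secret):
        self.secret = selected_secret


def negotiate(first, second):
    """Play the role of the CMS and return the commands it issued."""
    log = []
    proposed = first.on_crcx()
    log.append(("CMS", first.name, "CRCX"))
    selected = second.on_crcx(proposed)
    log.append(("CMS", second.name, "CRCX"))
    first.on_mdcx(selected)
    log.append(("CMS", first.name, "MDCX"))
    return log


if __name__ == "__main__":
    caller, callee = MTA("mta-1"), MTA("mta-2")
    for entry in negotiate(caller, callee):
        print(entry)
    print(caller.secret == callee.secret)
```

The CMS sees the secret in transit, which is why the signaling path between each MTA and the CMS is itself protected by an IPSEC SA.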
Because the IETF's MGCP draft does not address security, NCS contains additional Session Description Protocol (SDP) headers (see Table 2). These headers are used to communicate the end-to-end secret needed by RC4 and the ciphersuites necessary to create secure RTP/RTCP streams (ciphersuites are an enumeration of potential encryption and message authentication algorithms).
Although the designers of PacketCable's security architecture have optimized their solution for performance, cynics question whether the initial deployments will be able to meet ambitious performance goals. Specifically, they are concerned about the impact of IPSEC on packet throughput and the impact of RC4 on the generation of RTP streams. Since NCS is still in early deployment phases, it may be a while before these issues are settled.
Audio/visual streams have strict timing requirements and failure to meet these requirements results in an unpleasant user experience. Because robust security can obliterate performance, popular protocols adopt different techniques to accommodate performance issues.
SIP offers only rudimentary security features. To enable robust security, you must combine it with another standard. By contrast, H.323 defines the H.235 protocol, which explicitly defines the security features an application can support. Unfortunately, since H.235 is optional, many H.323 vendors completely ignore it. Finally, PacketCable's security architecture is the most thorough approach to multimedia security, but it is immature and still evolving.
The harsh reality is that while the industry pays lip service to security, it is dedicating its resources to improving performance. As processing power and network throughput increase, additional security features will likely be deployed incrementally. However, for the foreseeable future, the costs imposed by a robust security solution are likely to outweigh the benefits.
DDJ