Congestion Control in Frame-Relay Networks

LAN-to-LAN data-transmission strategies

William Stallings

William is an independent consultant and president of Comp-Comm Consulting of Brewster, MA. He is the author of over a dozen books on data communications and computer networking, his most recent being ISDN and Broadband ISDN, with Frame Relay and ATM, third edition (Prentice-Hall, 1994). William can be reached at stallings@acm.org.

Frame relay is a standardized, public, packet-switched, data-network service that functions as a public wide area network (WAN) backbone connecting individual local area networks (LANs). Data is transmitted in packets from one LAN to another through a frame-relay network over high-speed leased lines. Because of its design, the Frame Relay ANSI T1.606 Standard allows for an easy migration path from present to future network architectures, as existing systems (including T1, ISDN, and others) can be upgraded, and hence preserved, via software.

Additionally, frame relay can manage bursty, unpredictable data traffic and provide single-line access to the network with logical connections to other destinations. The end result is minimal hardware, a simple network design, and reduced operating costs.

With all this in mind, it's no surprise that frame-relay bearer services--that is, those services designed to serve both LAN interconnections and existing host-computer environments--are becoming increasingly popular. CompuServe's Frame-Net frame-relay service, for example, enables dedicated Internet access at up to T1 speeds (1.536 Mbits/sec), and 14.4 Kbits/sec dial-up access via point-to-point protocol (PPP).

Still, the frame-relay standard does not specify a mechanism for flow control and error control between users and the network. The data-link control protocol for frame relay (LAPF) uses a frame structure that does not contain a control field and therefore has no sequence numbers to work with. While this streamlined protocol provides for efficient data transfer, it lays the network open to the possibility of congestion. To deal with this problem, standards organizations have proposed a variety of congestion-control techniques. Because of the large number of techniques that can be used alternatively or in conjunction with one another, and because the specifications are scattered through various documents in no particular order, this is the most confusing aspect of frame relay.

In this article, I'll provide an overview of frame-relay congestion control, looking first at the explicit congestion-control techniques proposed for frame-relay bearer services, and then present a set of algorithms that split the responsibility for congestion control between the network and the subscriber.

Congestion in Frame-Relay Networks

A frame-relay network is a form of packet-switching network in which the "packets" are layer-two frames. As in any packet-switching network, one of the key areas in the design of a frame-relay network is congestion control. In essence, a frame-relay network is a network of queues. At each frame handler, there is a queue of frames for each outgoing link. If the rate at which frames arrive and queue up exceeds the rate at which frames can be transmitted, the queue size grows without bounds and the delay experienced by a frame goes to infinity. Even if the frame-arrival rate is less than the frame-transmission rate, queue length will grow dramatically as the arrival rate approaches the transmission rate. As a rule of thumb, when the line for which frames are queuing exceeds 80 percent utilization, the queue-length growth rate becomes a problem.

Figure 1 shows the effect of congestion in general terms. Figure 1(a) plots the throughput of a network (number of frames delivered to a destination station per unit time) versus the offered load (number of frames transmitted by all subscribers), while Figure 1(b) plots the average delay from entry to exit across the network. At light loads, throughput and network utilization increase as the offered load increases. As the load continues to increase, a point is reached (point A in the plot) beyond which the throughput of the network increases at a rate slower than that of the offered load. This is due to network entry into a mild-congestion state. At this level, the network continues to cope with the load, although with increased delays.

As the load on the network increases, a point is eventually reached (point B) beyond which throughput drops with increased offered load. The reason for this is that the buffers at each node are of finite size. When the buffers at a frame handler become full, it must discard frames. Thus, the sources must retransmit the discarded frames in addition to new frames. This only exacerbates the situation: As more and more frames are retransmitted, the load on the system grows, and more buffers become saturated. While the system is trying desperately to clear the backlog, users are pumping old and new frames into the system. Even successfully delivered frames may be retransmitted because it took too long at a higher layer (for example, the transport layer) to acknowledge them: The sender assumes the frame did not get through. Under these circumstances, the effective capacity of the system is virtually zero.

Avoiding these catastrophic events is the task of congestion control. The object of all congestion-control techniques is to limit queue lengths at the nodes so as to avoid throughput collapse.

Congestion-Control Strategies

Congestion control is the joint responsibility of the network and end users. The network (that is, the collection of frame handlers) is in the best position to monitor the degree of congestion, while the end users are in the best position to control congestion by limiting the flow of traffic.

Table 1 lists the congestion-control techniques defined in the various standards documents. Discard strategy deals with the most fundamental response to congestion: When congestion becomes severe enough, the network is forced to discard frames. You would like to do this in a way that is fair to all users.

Congestion-avoidance procedures are used at the onset of congestion to minimize the effect on the network; thus, these procedures would be initiated at or prior to point A in Figure 1, to prevent congestion from progressing to point B. Near point A, there would be little evidence available to end users that congestion is increasing, so there must be some explicit signaling mechanism from the network that will trigger the congestion avoidance.

Congestion-recovery procedures are used to prevent network collapse in the face of severe congestion. These procedures are typically initiated when the network has begun to drop frames due to congestion. Such dropped frames are reported by some higher layer of software and serve as an implicit signaling mechanism. Congestion-recovery procedures operate around point B and within the region of severe congestion, as shown in Figure 1.

Congestion avoidance with explicit signaling and congestion recovery with implicit signaling are complementary forms of congestion control in the frame-relaying service.

The details of the algorithm depend on whether the end system has actual control of the information rate from the source system or uses some sort of sliding-window, flow-control scheme. A rate-based system can provide a more precise control of information flow since it is based on the actual information rate in bits per second. Since frame relay does not require the use of fixed-size frames, a window-based system can provide only an approximate control over information rate. Such control is reasonably precise only if the statistical variance of the frame size is small.

Rate-based control. For rate-based control, it is assumed that a destination system has a means of regulating the data rate at the source-end system. Let us refer to current data rate as R. On each connection, the end system maintains two counters: FECN0 is the number of LAPF frames with FECN=0, and FECN1 is the number of LAPF frames with FECN=1. These counts are accumulated over a measurement interval delta ⁱ_j. The standards suggest a value of delta ⁱ_j approximately equal to four times the end-to-end transit delay.

The algorithm is as follows: Initially, set R=CIR or less in the receive direction. This "slow start" is intended to avoid an impulse load on the network when the user begins transmitting. Then, at the beginning of each measurement interval, set FECN0=FECN1=0. At the end of each measurement interval, if FECN1FECN0, set R=0.875 x R; if FECN1<FECN0, set R=1.0625 x R.

If a connection has been idle for a long time, then R should be set to CIR for that connection.

Note that when congestion is detected, the rate reduction is by a factor of 1/8, whereas the recovery is by a factor of 1/16. This slower recovery strategy is intended to avoid oscillations between congested and noncongested states.

Window-based control. Assume that sliding-window flow control is used and that the destination system can adjust the receive window size W between 1 and some maximum value W_max. Again the counters FECN0 and FECN1 are accumulated over a measurement interval delta ⁱ_j. If the current window size is W, then delta ⁱ_jis defined to be twice the interval during which W frames are transmitted and acknowledged (two window turns).

The algorithm is: Initially, set W=1. Again, this provides a slow start. Then, at the beginning of each measurement interval, set FECN0=FECN1=0. At the end of each measurement interval, If FECN1FECN0, set W= half brace MAX[0.875xW,1]; if FECN1<FECN0, set W=MIN[W+1,Wmax].

If a connection has been idle for a long time, W should be set to 1 for that connection.

Backward Explicit Congestion Notification

Backward explicit congestion notification can be achieved with either the BECN bit in the LAPF address field or a consolidated link layer management (CLLM) message carried in a LAPF frame.

The BECN bit is set to notify the receiving system that the frames it transmits on this connection may encounter congestion. In response to this, the receiving system should reduce the flow of data transmitted on that connection.

The receiving-end system should use the following strategy for each connection:

When the first frame with the BECN bit set is received, reduce the information rate to CIR.

If additional consecutive frames with the BECN bit set are received, then institute additional reductions.

If a consecutive sequence of frames with the BECN bit set to zero are received, then gradually increase the flow of frames.

This strategy reacts rapidly to congestion notifications because, the end system immediately reacts to a single BECN bit and reduces its outgoing flow rather than signaling its peers to reduce the incoming flow. This reflects a belief that congestion occurs quickly.

As with the response to forward-explicit congestion notification, the details of the algorithm depend on whether control is rate based or window based.

Rate-based control. The standards define a step count S that is used to determine when the transmitter may increase or decrease its rate. The algorithm is as follows: Initially, set IR=CIR or less in the transmit direction. Then:

If a frame with BECN set to 1 is received, and the user's offered rate, R, is greater than CIR, then reduce the offered rate to CIR.
If S consecutive frames are subsequently received with the BECN bit set to 1, the user should reduce its rate to the next lower "step." Further rate reductions should not occur until an additional S consecutive frames are received with the BECN bit set. The step rates are R=0.675xCIR, R=0.5xCIR, and R=0.25 xCIR.
After the user has reduced its rate due to receipt of BECN signals, it may increase its rate by a factor of 0.125 after any S/2 consecutive frames are received with the BECN bit clear; that is, R=1.125 xR.

If a connection has been idle for a long time, then R should be set to CIR for that connection.

Window-based control. Here, the step count S is defined to be the interval during which one frame is transmitted and acknowledged. The algorithm is as follows: Initially, set the window size W to some small value such as 1 or 0.5xlast window size. Then:

If a frame with BECN set to 1 is received, then set W=MAX[0.625xW,1].
If S consecutive frames are subsequently received with the BECN bit set to 1, the user should repeat the reduction.
After the user has reduced its window size due to receipt of BECN signals, it may increase its window size by one after any S/2 consecutive frames are received with the BECN bit clear; that is, W=MIN[W+1,W_max].

If a connection has been idle for a long time, then W should be set to its initial value for that connection.

Consolidated Link Layer Management (CLLM). CLLM is a variation of backward explicit congestion notification that uses a message rather than the BECN bit to signal congestion. The CLLM technique can be used when congestion occurs at a network node, but no reverse traffic is available to carry the BECN indication. CLLM messages carry a list of congested DLCIs to reduce the traffic load on the network.

The CLLM is a message carried in the information XID LAPF frame. The body of the XID frame lists the frame-relay connections that are congested and identifies the cause. The following cause values are recognized:

Network congestion due to excessive traffic.
Facility or equipment failure.
Maintenance action.
Unknown.

If these conditions are short term, they are anticipated to have a duration of seconds or minutes; otherwise the designation is long term.

Implicit Congestion Control

Implicit signaling occurs when the network discards a frame, and this fact is detected by the end user at a higher, end-to-end layer. When this occurs, the end-user software may deduce that congestion exists.

For example, in a data-link control protocol that uses a sliding window flow and error-control technique, the protocol detects the loss of an information frame in one of two ways. When a frame is dropped by the network, the following frame will generate an REJ frame from the receiving end point. When a frame is dropped by the network, no acknowledgment is returned from the other end. Eventually, the source end will time out and transmit a command with the P bit set to 1. The subsequent response with the F bit set to 1 should indicate that the receive sequence number N(R) from the other side is less than the current send sequence number.

Once congestion is detected, the protocol uses flow control to recover from the congestion. Assume that the layer-two window size, W, can vary between the parameters W_min and W_max, and is initially set to W_max. We would like to reduce W as congestion increases to gradually throttle the transmission of frames. Example 1 lists three classes of adaptive window schemes based on response to one of the two aforementioned conditions.

Successful transmissions (measured by receipt of acknowledgments) may indicate that the congestion has decreased and window size should be increased. Example 2 shows two possible approaches. Studies suggest that the use of the strategy in Example 1(c) with =0.5 plus the strategy in Example 2(b) provides good performance over a wide range of network parameters and traffic patterns. This is the strategy recommended in the standards.

Figure 1 Effects of congestion: (a) throughput; (b) delay.

Table 1: Frame-relay congestion-control techniques.

Type        Technique          Function           Key Elements       
Discard     Discard            Provides guidance  DE bit
strategy    control            to network
                               concerning which
                               frames to discard.

Congestion  Backward explicit  Provides guidance  BECN bit or
avoidance   congestion         to end systems     CLLM message
            notification       about congestion
                               in network.

Congestion  Forward explicit   Provides guidance  FECN bit
avoidance   congestion         to end systems
            notification       about congestion
                               in network.

Congestion  Implicit           End system infers  Sequence numbers
recovery    congestion         congestion from    in higher-layer
            notification       frame loss.        PDU

Figure 2 Queue-length averaging algorithm. t=current time; ti=time of ith arrival or departure event; qi=number of frames in the system after the event; T0=time at the beginning of the previous cycle; and T1=time at the beginning of the current cycle.

Example 1: Three classes of adaptive window schemes.

(a)  Set W=Max [W--1,W_min]

(b)  Set W=W_min

(c)  Set W=Max [aW,W_min], where 0  < a < 1

Example 2: Successful transmissions may indicate that the congestion has decreased and window size should be increased.

(a) Set W=Min[W+1,W_max] after N consecutive successful transmissions.

(b) Set W=Min[W+1,W_max] after W consecutive successful transmissions.