

Control your Network Destiny: Why QoS, TS, and Other Acronyms Matter

Greg Bledsoe

Did you know Linux has traffic control capabilities that rival any commercial routing or quality of service platform? I didn't until I needed to better control how my home/business bandwidth was being used, and now I find this capability extremely valuable. In this article, I'll cover some basic network concepts, describe the elements of network traffic control, and provide some examples of how to use built-in Linux tools to allow advanced control and ensure different qualities of service (QoS) for different classes of traffic.

To get started, let's consider a hypothetical home/business network. Suppose you have a home network that is fairly typical of a tech-savvy family in the digital age: broadband Internet connection, several PCs, teenagers using chat and email and playing games online, and maybe an http server, too. Or maybe it's a server to host a network-capable game like FreeCiv. And, to top it off, let's say you use Vonage or Skype or another VOIP provider for all your voice lines over the same broadband connection. Did I mention that your wife telecommutes? Okay, not so hypothetical -- this is what my home and business network looks like. We have seven computers in the house with two servers accepting connections and providing services. They all share one broadband connection for a wide variety of purposes. As you might guess, our connection to the larger world almost never hits 0% and quite often pegs out either up or downstream at 100%.

Now, let's say one afternoon you are in the middle of a phone call with a potential client over VOIP, when suddenly the service quality is reduced to an unusable state. While tracking down the cause, you find your teenaged daughter in the other room emailing an episode of her favorite TV show to a friend and wondering in wide-eyed innocence why it's taking so long. To understand why uploading over a broadband connection can kill VOIP quality, make interactive connections useless, break VPNs, and eventually destroy download speeds, you have to understand some basic network concepts. Armed with that knowledge, you can take advantage of tools that are handily built in to the Linux kernel and ensure that different traffic on your network receives different qualities of service.

For purposes of this article, I'll assume that you understand basic TCP/IP protocol function and the difference between connectionless protocols like UDP and connection-oriented ones like TCP. There are many good resources on the Web if you need more information.

Packet Queuing

The basic problem described above lies in queuing. In network parlance, it means packets are backing up and having to wait in some device before they can be sent out over the wire. Broadband ISPs typically provide you a lot more inbound bandwidth than outbound, because most casual home users are data consumers and not providers, pulling more traffic in than they send out. Packet loss in either direction can hurt that all-important uni-directional speed, so it is fairly common for cable modems and other broadband termination devices to have very long queues, often configured with enough buffer space to queue several seconds of traffic leaving your network and heading out to the Internet. This allows for an overall speed-up for typical consumers, but is a very bad thing for interactive or time-sensitive traffic like VPNs or VOIP.

Let's look at the case of the jumpy VOIP. Your ear can detect auditory latency over 250 milliseconds or so, and it's very sensitive to sounds that are jumbled out of order. VOIP technology uses connectionless UDP packets and throws away whatever takes longer than about two-tenths of a second to arrive (excessive latency) and also discards packets that arrive out of order. To work well, VOIP needs packets to arrive in the order they were sent, without much delay, and in a pretty steady stream with little variation in time between packets (low jitter). It is fairly obvious, then, why buffering even as little as half a second's worth of data in your cable modem, because you are exceeding your 128-kbps upload speed, makes your Vonage connection unusable. But why should that kill downloads? Isn't that buffering specifically to speed those up?

Downloading from the Internet over a TCP connection is much less sensitive to latency and is, in fact, helped by occasional buffering that prevents loss from bursty traffic. But when a steady stream of traffic exceeds upstream bandwidth for longer than the broadband device can queue, TCP falls apart. The device from which you are downloading sends an amount of data defined by a parameter called its "send window," which is the number of bytes it can transmit before it has to stop and wait for your end to acknowledge what it just sent.

Modern TCP algorithms are pretty good at self-adjusting to network conditions, but sustained latency is very difficult to adjust to unless the device's TCP software is preconfigured for it, for example, when there is a connection that must always traverse a high-latency path, like a satellite feed. In this case, incoming packets are getting to you just fine and you have headroom in the downstream direction, but when you send your acknowledgment back, it gets queued behind two seconds or more of data already waiting to go out. Or maybe the queue is full, and your ack packet gets dropped into the bit bucket. Maybe the queue stays full, and it winds up taking an average of three or four re-transmissions to get your acknowledgment through, which translates into 6 to 10 seconds before the other end knows you got what it sent and can send more. Depending on the timeout value configured for the device, your TCP connection can break altogether and need to be restarted.
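To put rough numbers on this, here is a minimal sketch that estimates how long a newly queued packet, such as an ack, waits behind data already sitting in the modem's buffer. The function name queue_delay_ms and the sample backlog sizes are assumptions made up for illustration, and the upstream rate is taken to be in kilobits per second; real devices will differ.

```shell
#!/bin/sh
# queue_delay_ms BYTES_QUEUED UPSTREAM_KBIT
# Estimate the wait, in milliseconds, for a packet that joins the back of
# a FIFO queue already holding BYTES_QUEUED bytes, when the queue drains
# at UPSTREAM_KBIT kilobits per second (1 kbit = 1000 bits).
# Hypothetical helper for illustration; integer arithmetic only.
queue_delay_ms() {
    bytes=$1
    kbit=$2
    # seconds = bits / (kbit * 1000), so milliseconds = bits / kbit
    echo $(( bytes * 8 / kbit ))
}

queue_delay_ms 8000 128     # 8000 bytes of backlog at 128 kbit/s
queue_delay_ms 32000 128    # 32000 bytes of backlog at 128 kbit/s
```

Under these assumptions, a mere 8000 bytes of backlog already costs half a second, and 32000 bytes costs two full seconds, before a single ack can leave.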

Basic Traffic Shaping

If only there were a way to prevent, or at least control, this queuing. Well, there is, and for simple setups, you don't even have to configure anything complicated. Most broadband routers allow you to shape your upstream traffic, even going so far as to give you the ability to prioritize certain source or destination ports above other traffic. If you haven't turned on basic traffic shaping in your Linksys or D-Link router, I suggest you do so now. It will greatly improve your network performance overall if you set the router to send a percent or two less than your ISP's allowed bandwidth. This will prevent any queuing at all in the DSL or cable modem, greatly smoothing out bumps in the road.

If you run your own Linux box as a firewall and/or gateway device in front of your broadband connection, there is a handy dandy little script called WonderShaper that will do the trick for you, and it's available in most distributions' packaging systems. Just apt-get, yum, or urpmi WonderShaper, configure /etc/wshaper.cfg, and you can specify upstream bandwidth and prioritize certain traffic above the rest.

In our typical broadband case where downstream bandwidth is double or more what is available upstream, we need to tightly control our outbound bandwidth use. Fortunately, this is what QoS and TS are for. We have the most control over what we send out. Controlling inbound bandwidth is like contacting everyone that could possibly send you mail through the post office and asking them all to send only so many packages per week. We do have a technique called "ingress policing" at our disposal, which takes advantage of the way the TCP protocol works to simulate a slower link for certain types of traffic, making the TCP algorithm back off the rate of traffic and thereby controlling inbound bandwidth to a degree. I won't be able to cover that here, but the documentation can be found in the References section.

Befitting the power and flexibility of the solutions available to us, there are many ways to configure a working solution that does what we want with our outbound traffic. The absolute simplest configuration we could use to prioritize our traffic would be to honor the type of service (TOS) field in the IP header. However, this is problematic for several reasons, not the least of which is that it can be manipulated by user space programs, and any ill-behaved application can tell your network to give it highest priority. In other words, it can't be trusted. So we throw out that idea first thing. Of the solutions we might actually choose, the oldest and most complex one would involve using class-based queuing (CBQ) in combination with token bucket filters (TBF) and a fairness queuing algorithm like stochastic fairness queuing (SFQ). Now let's look at these tools more closely.

SFQ

SFQ prevents any particular "flow" of data from dominating a link. Sometimes an aggressively configured TCP stack can send lots of data at one time and "crowd out" more sensibly configured devices. If your link is heavily utilized, this has an adverse effect on the network. The SFQ algorithm identifies particular conversations by address:port pairs and gives each flow round-robin access to send, so it is an almost perfectly fair way to share bandwidth. On the other hand, if your link isn't close to full, it doesn't do anything.

TBF

TBF is a very precise way to control the rate at which a particular type of traffic is sent. If all you need to do is slow down an interface, this is the way to do it. It works in a very common-sense way: think of the queue as a bucket. Tokens are dropped into the bucket at a predetermined rate. Each token is like a hall pass, a permission slip to send a packet. When you aren't sending, you can accumulate tokens up to a certain number, meaning after a period of inactivity, you can send faster while you have tokens left, but once you've used them all, you have to wait until another token drops into the bucket. This gives you the ability to burst for a short period while ensuring your average use doesn't greatly exceed the desired rate and cause problems.
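The bucket analogy can be sketched in a few lines of shell. This is a toy model under simplifying assumptions, not how the kernel implements TBF: the function name token_bucket is made up, time advances in whole "ticks," each packet costs exactly one token, and packets that find no token are simply dropped rather than queued.

```shell
#!/bin/sh
# token_bucket RATE BURST DEMAND...
# Toy token bucket. RATE tokens are added per tick, the bucket holds at
# most BURST tokens, and each DEMAND argument is the number of packets
# wanting to go out in that tick (one token per packet).
# Prints how many packets are actually sent in each tick.
# Simplification (assumption): unsent packets are dropped, not queued.
token_bucket() {
    rate=$1
    burst=$2
    shift 2
    tokens=$burst            # start full, as after a period of inactivity
    out=""
    for demand in "$@"; do
        # Send as many packets as we have tokens for.
        if [ "$demand" -le "$tokens" ]; then
            sent=$demand
        else
            sent=$tokens
        fi
        tokens=$(( tokens - sent ))
        # Refill for the next tick, never exceeding the bucket size.
        tokens=$(( tokens + rate ))
        if [ "$tokens" -gt "$burst" ]; then
            tokens=$burst
        fi
        out="$out $sent"
    done
    echo $out
}

token_bucket 2 5 6 6 6 0 0 6    # prints: 5 2 2 0 0 5
```

The first tick bursts to 5 packets on saved-up tokens, sustained demand is then held to the rate of 2 per tick, and two idle ticks refill the bucket so the final tick can burst again. On a real interface, the corresponding knobs are the rate and burst parameters of tc's tbf qdisc.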

CBQ

Then we come to the venerable CBQ. CBQ sets up "classes" of traffic based on the filters you specify. It has lots of knobs, most of which are only marginally documented, and a slew of parameters that have to be specified just to make it work. Because of that, even a very simple configuration immediately becomes complex with CBQ. It has the added disadvantage that its shaping algorithm quite often miscalculates, which makes it unpredictable and inaccurate. Fortunately, this is not the only classful queuing discipline (qdisc) available to us.

HTB

Hierarchical token bucket (HTB) is another classful qdisc specifically designed to replace CBQ. It is simpler, far more accurate in its shaping, has fewer parameters and better documentation, and as of version 3, performs equally well under load. If you are running a modern kernel (late 2.4.x or 2.6) then this is probably the way you want to go. But enough conceptual stuff. So now let's code.

Err, just one problem. We have to define what we want to do and figure out what our classes will look like before we can figure out what our configuration script will need to be. We also have to understand that each interface has one egress "root qdisc." By default, this is just a plain first-in-first-out (FIFO) qdisc until you make it something else. Each qdisc and class is assigned a handle, which can be used by later configuration statements to refer to it. Handles consist of two parts, a major number and a minor number, like <major>:<minor>. It is customary to name the root qdisc '1:', which is the same as '1:0'. Think of it like writing one point zero. You don't need the point zero unless it's something other than zero. The minor number of a qdisc is always 0. Classes are created under qdiscs and traffic assigned to them with filters. The classes under a qdisc must have the same major number and a different minor number, like 1:1 and 1:100. A class can then contain another qdisc, which has a different major number than its parent (e.g., 10: or 100:).
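As a quick illustration of the naming rule, here is a small shell sketch that expands the shorthand form and tells a qdisc handle from a class handle. The helper names normalize_handle and handle_kind are hypothetical, invented for this example; tc does its own parsing internally.

```shell
#!/bin/sh
# normalize_handle HANDLE
# Expand the shorthand "1:" into its full "major:minor" form "1:0".
# Hypothetical helper; assumes HANDLE contains a colon.
normalize_handle() {
    major=${1%%:*}      # text before the colon
    minor=${1#*:}       # text after the colon (empty for "1:")
    echo "${major}:${minor:-0}"
}

# handle_kind HANDLE
# A minor number of 0 names a qdisc; anything else names a class.
handle_kind() {
    full=$(normalize_handle "$1")
    if [ "${full#*:}" = "0" ]; then
        echo qdisc
    else
        echo class
    fi
}

normalize_handle 1:      # prints 1:0
handle_kind 1:10         # prints class
handle_kind 10:          # prints qdisc
```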

Configuration

Now let's finally decide exactly what we want to do. For our network configuration, we want to prioritize VOIP traffic over everything, next we want VPN traffic to have a good portion of the bandwidth, then we want to guarantee our HTTP traffic some bandwidth, followed by everything else. So, what we want is this:

                1:0                 root qdisc (htb)
                 |
                1:1
       /      /     \      \
  1:10    1:20      1:30    1:40    classes (htb)
    |       |         |       |
   10:     20:       30:     40:    qdiscs
  VOIP     VPN      HTTP     ALL
  fifo    fifo       sfq     sfq

At this point, we can start configuring. You can link a script at the end of /etc/rc.local or get really fancy and put one in /etc/init.d/. To configure traffic shaping, we use the "tc" command as root (try "man tc" in the console) for all configuration:

# tc qdisc add dev eth0 root handle 1: htb default 40

This creates our root qdisc and tells it that any otherwise unclassified traffic gets assigned to class 1:40. Next, we create our classes:

# tc class add dev eth0 parent 1: classid 1:1 htb rate 120kbps \
  ceil 120kbps

If we have 128 kilobyte/sec upload (as is fairly common), 120 would be a good rate to use to allow for potential cell overhead, etc. Note that tc reads "kbps" as kilobytes per second; kilobits per second would be written "kbit". The "rate" parameter is what this particular class is guaranteed to have, while "ceil" is what it can never go beyond. To ensure overall shaping, we could stop right here.

# tc class add dev eth0 parent 1:1 classid 1:10 htb rate 20kbps \
  ceil 120kbps

We just added our first leaf class, an htb class under 1:1, and guaranteed it 20 kbytes/sec, while allowing it to "borrow" as much as it might need to burst, up to the full 120 kbytes/sec.

# tc class add dev eth0 parent 1:1 classid 1:20 htb rate 50kbps \
  ceil 100kbps

We gave this class a guaranteed rate of 50 kbytes/sec, and it can also borrow up to 100 kbytes/sec, but will always leave at least 20 kbytes/sec.

# tc class add dev eth0 parent 1:1 classid 1:30 htb rate 50kbps \
  ceil 100kbps

The class intended for http traffic now has 50 kbytes/sec forever its own and, like the previous class, can borrow up to 100 kbytes/sec.

# tc class add dev eth0 parent 1:1 classid 1:40 htb rate 1kbps \
  ceil 100kbps

We gave our default class (which will catch that email attachment that gave us so much trouble before) no real guarantee, but the ability to borrow when no one else needs to send data. If all classes are operating above their "rate", they will share the bandwidth under 100 with a 50:50:1 ratio as a function of their configured rate. This has a lot of implications to consider in more complex scenarios.
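To make that ratio concrete, here is a rough sketch of the arithmetic: spare bandwidth is split in proportion to each class's configured rate. The function name share_excess is made up for this example, and the result is only an approximation of what HTB does, since the real scheduler hands out borrowed bandwidth in per-class quanta derived from the rates and also honors each class's "ceil".

```shell
#!/bin/sh
# share_excess SPARE RATE...
# Split SPARE kbytes/sec among classes in proportion to their configured
# rates, printing one integer share per class (rounded down).
# Hypothetical helper; an approximation of HTB's borrowing behavior.
share_excess() {
    spare=$1
    shift
    total=0
    for r in "$@"; do
        total=$(( total + r ))
    done
    out=""
    for r in "$@"; do
        out="$out $(( spare * r / total ))"
    done
    echo $out
}

share_excess 101 50 50 1    # prints: 50 50 1
share_excess 20 50 50 1     # prints: 9 9 0
```

The second call shows the practical consequence: with only 20 kbytes/sec to spare, the class with a rate of 1 gets essentially nothing while the other two are busy.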

Note that there will always be 20 kbytes per second available to VOIP above and beyond anything else, but right now we do nothing to stop runaway connections within classes as all classes have the default fifo qdisc. This is what we want for our VOIP and VPN classes, because they will only carry one connection under most circumstances. For the others, which might carry any number of conversations at any given time, we need to do a little something more.

# tc qdisc add dev eth0 parent 1:30 handle 30: sfq \
  perturb 10

We just replaced the default fifo qdisc under 1:30 with an sfq. The "perturb" parameter specifies how often, in seconds, to reconfigure the hash function, which protects against hash collisions that might deprive a flow of its rightful bandwidth. We've configured it to change every 10 seconds.

# tc qdisc add dev eth0 parent 1:40 handle 40: sfq \
  perturb 10

And we just did the same thing with our catchall class, replacing the fifo qdisc. Now all flows will share equally in whatever bandwidth this class has at any given moment. But we still haven't configured our filters to actually assign traffic.

So, let's say our VOIP is configured for sip on udp port 5060 and rtp on udp port 10000. Multiple match clauses in a single u32 filter must all be true at once, so each port gets its own filter. These filters will send the traffic to class 1:10:

# TC='tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32'
# $TC match ip dport 5060 0xffff flowid 1:10
# $TC match ip dport 10000 0xffff flowid 1:10

Our VPN connects to port 1072:

# $TC match ip dport 1072 0xffff flowid 1:20

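And as we designed, http and https will go to the third class. Each source port needs a separate filter, because the match clauses within one u32 filter are ANDed together:

# $TC match ip sport 80 0xffff flowid 1:30
# $TC match ip sport 443 0xffff flowid 1:30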

We don't have to configure a filter for 1:40 because it is the default. It is our "catchall" class.
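For convenience, the whole configuration can be gathered into one script of the kind you might call from /etc/rc.local. What follows is a sketch under assumptions, not a drop-in tool: the DEV and DRY_RUN variables and the run, shape, and unshape helpers are names invented for this example. With DRY_RUN=1 (the default here) the script only prints the tc commands so you can review them; set DRY_RUN=0 and run as root to apply them. Each port gets its own filter, because several match clauses in one u32 filter would all have to be true at once.

```shell
#!/bin/sh
# Consolidated sketch of the HTB setup described in this article.
# DEV, DRY_RUN, run, shape, and unshape are illustrative names.
DEV=${DEV:-eth0}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

shape() {
    # Root qdisc; unclassified traffic falls into class 1:40.
    run tc qdisc add dev "$DEV" root handle 1: htb default 40
    # Parent class caps everything just under the upstream rate.
    run tc class add dev "$DEV" parent 1: classid 1:1 htb \
        rate 120kbps ceil 120kbps
    # Leaf classes: VOIP, VPN, HTTP, and the catchall.
    run tc class add dev "$DEV" parent 1:1 classid 1:10 htb \
        rate 20kbps ceil 120kbps
    run tc class add dev "$DEV" parent 1:1 classid 1:20 htb \
        rate 50kbps ceil 100kbps
    run tc class add dev "$DEV" parent 1:1 classid 1:30 htb \
        rate 50kbps ceil 100kbps
    run tc class add dev "$DEV" parent 1:1 classid 1:40 htb \
        rate 1kbps ceil 100kbps
    # Fair queuing inside the classes that carry many flows.
    run tc qdisc add dev "$DEV" parent 1:30 handle 30: sfq perturb 10
    run tc qdisc add dev "$DEV" parent 1:40 handle 40: sfq perturb 10
    # VOIP: sip and rtp destination ports, one filter each.
    for port in 5060 10000; do
        run tc filter add dev "$DEV" protocol ip parent 1:0 prio 1 u32 \
            match ip dport "$port" 0xffff flowid 1:10
    done
    # VPN destination port.
    run tc filter add dev "$DEV" protocol ip parent 1:0 prio 1 u32 \
        match ip dport 1072 0xffff flowid 1:20
    # HTTP and HTTPS replies leaving our server, one filter each.
    for port in 80 443; do
        run tc filter add dev "$DEV" protocol ip parent 1:0 prio 1 u32 \
            match ip sport "$port" 0xffff flowid 1:30
    done
}

# Remove the root qdisc, and with it every class and filter beneath it.
unshape() {
    run tc qdisc del dev "$DEV" root
}

shape
```

Keeping the commands behind a single run helper makes it easy to check the generated lines before touching a live interface, and unshape gives you a quick way back to the default fifo if something goes wrong.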

To look at what we have configured and get statistics, we can use tc -s qdisc ls dev eth0.

Conclusion

I've shown a very simple example of what is possible using Linux's advanced traffic control features. Classes and qdiscs can be combined and recombined with infinite variety. This short article simply doesn't allow space to examine the subject in more depth, but you should now have the knowledge to begin configuring simple traffic control on your own networks and understand the concepts well enough to better understand the implications of design choices. To study the subject more in depth, check out the documents in the References section then go forth and multiply your classes and qdiscs!

References

HTB homepage -- http://luxik.cdi.cz/~devik/qos/htb/

Linux Advanced Routing & Traffic Control -- http://lartc.org/

Linux: Advanced Networking Overview -- http://qos.ittc.ku.edu/howto/index.html

Linux DiffServ homepage -- http://www.opalsoft.net/qos/DS.htm

Linux Traffic Control HOWTO -- http://linux-ip.net/articles/Traffic-Control-HOWTO/index.html

Greg Bledsoe has spent his career designing, implementing, and troubleshooting networks, tackling whatever technical problems have struck his fancy, and coding up interesting bits now and then. Now he has struck out on his own and writes articles to supplement his income. He and his lovely wife have five home-schooled children, so he has no spare time for hobbies but imagines if he did he'd get in better shape. Greg can be reached at: QoS.article@gmail.com.