Questions and Answers
Jim McKinstry
      procs    memory               page                  faults           cpu
     r  b  w   avm  free  re  at  pi  po  fr  de   sr   in    sy   cs  us  sy  id
     3  0  0  9349  1100   0   0   0   0   0   0    0  463  4507  937  94   6   0
     2  7  0  9500  1046   0   0   0   0   0   0    0  515  4403 1112  92   8   0
     2  2  0  9444  1213   0   0   0   0   0   0    0  447  4280  918  92   8   0

Notice that the r column is greater than 1. In general, this means that you may have a CPU bottleneck. A look at sar -u and sar -q output also reveals a possible CPU bottleneck. Notice the runq-sz (average length of the run queue) and %runocc (percentage of time the run queue was occupied) fields in this sar -q output:
    07:00:01 runq-sz %runocc swpq-sz %swpocc
    07:01:01     3.0     100     0.0       0
    07:02:01     3.0     100     0.0       0
    07:03:01     3.2     100     0.0       0
    07:04:01     3.5     100     0.0       0
    07:05:01     3.6     100     0.0       0
    07:06:01     3.5     100     0.0       0

Compare that with the same report when the system is not under load:
    10:00:00 runq-sz %runocc swpq-sz %swpocc
    10:01:00     1.3      48     0.0       0
    10:02:00     1.7      40     0.0       0
    10:03:00     1.8      42     0.0       0

Notice the %idle (% of time CPU was idle) column in this sar -u output:
    07:00:01    %usr    %sys    %wio   %idle
    08:45:00      98       2       0       0
    08:46:00      98       2       0       0
    08:47:00      98       2       0       0

Compare that with the same report when the system is not under load:
    10:00:00    %usr    %sys    %wio   %idle
    10:01:00      14       1       4      81
    10:02:00      12       1       1      86
    10:03:00      15       1       4      80
    10:04:00      11       1       2      86
    10:05:00      11       0       1      87
    10:06:00      12       1       1      86

To check for memory problems, start with the sar -w command. In this sar -w output, the swpot/s column is consistently 3. This value should not rise above 0:
    10:50:06 swpin/s bswin/s swpot/s bswot/s pswch/s
    10:50:11    1.00     0.0    3.00     0.0     859
    10:50:16    1.00     0.0    3.00     0.0     631
    10:50:21    1.00     0.0    3.00     0.0     665

Now use vmstat to check your paging activity. This vmstat report shows significant paging activity (the po column). It also shows that the free memory pool is very low (the free column):
      procs    memory               page                  faults           cpu
     r  b  w   avm  free  re  at  pi  po  fr  de   sr   in    sy   cs  us  sy  id
     3  0  0  7743   886   0   0   0   6   0   0  185  359 12460  833  78  22   0
     0  2  0  7543   900   0   0   0   6   1   0  196  510  6857  955  81  17   1

To check on the I/O of the system, start with vmstat. The following vmstat output shows a 5-minute period of disk I/O contention. On a healthy system, the b column (processes blocked waiting for I/O) would be 0:
      procs    memory               page                  faults           cpu
     r  b  w   avm  free  re  at  pi  po  fr  de   sr   in    sy   cs  us  sy  id
     2  7  0  9173  2668   7   2   0   0   0   0    0  485  6657  923  89   9   2
     2  6  0  9187  2746   6   1   0   0   0   0    0  431  4067  703  84   6  10
     3  4  0  8773  2696   6   1   0   0   0   0    0  439  4307  800  92   8   0
     1  6  0  8510  3943   8   4   0   0   0   0    0  340  3835  634  91   8   1
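
The vmstat rules of thumb used in this answer (r above 1, b above 0, po above 0) can be checked mechanically. The following is only a minimal sketch, not a tuning tool. It assumes the 18-column vmstat layout shown here (r in field 1, b in field 2, po in field 9); other vmstat implementations order their columns differently. The function name vmstat_flags and the labels it prints are hypothetical.

```sh
#!/bin/sh
# Sketch: flag vmstat samples that break the rules of thumb in the text.
# Assumption: 18-column layout as in the samples above
#   r = field 1, b = field 2, po = field 9
vmstat_flags() {
    awk '
        # Header lines do not start with a number; skip them.
        $1 !~ /^[0-9]+$/ { next }
        {
            flags = ""
            if ($1 > 1) flags = flags " cpu(r=" $1 ")"      # run queue above 1
            if ($2 > 0) flags = flags " io(b=" $2 ")"       # processes blocked on I/O
            if ($9 > 0) flags = flags " paging(po=" $9 ")"  # pages paged out
            if (flags == "") flags = " ok"
            n++
            print "sample " n ":" flags
        }
    '
}
```

One way to use it is to pipe a live run through the function, for example vmstat 5 12 | vmstat_flags, and look for samples that are flagged repeatedly rather than once.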
sar -u shows similar results. The %wio column shows the percentage of time that the system is waiting for I/O to complete. This number should be 0:
    07:00:01    %usr    %sys    %wio   %idle
    07:01:01      67      10      17       6
    07:02:01      66       8      27       0
    07:03:01      83       6      11       0

Now use sar -d to check which disk(s) are causing the problem. The sar -d sample below shows an extreme example of a problem with two of the disks (c0t15d0 and c0t11d0). The important fields to consider are avque (average number of requests outstanding for the device), avwait (average time in milliseconds that transfer requests waited idly on queue for the device), and avserv (average time in milliseconds to service each transfer request for the device). In this sample, avque and avwait are extremely elevated for those two disks, and avserv is also higher than on the other disks:

    device     %busy    avque  r+w/s  blks/s   avwait  avserv
    c0t5d0      0.53     0.71      0       6     6.81   15.88
    c0t15d0     8.45  1127.01     40     634  2261.85   27.71
    c0t11d0    20.32   889.79    111    1776  1624.10   24.46
    c0t9d0      2.93     0.92      3      47     8.28   12.78
    c0t12d0     1.42     0.50      1      18     5.34   12.37

Here's a little script that I've used to analyze network traffic:
    OLD_PACKETS_IN=0
    OLD_PACKETS_OUT=0
    OLD_PACKETS_IN_ERR=0
    OLD_PACKETS_OUT_ERR=0
    OLD_COLLISIONS=0

    echo "The first entry in this file is the statistics accumulated since last boot."
    echo "The other entries are the statistics for the previous minute:"

    while true    # Run until the 6:00 PM check at the bottom of the loop
    do
        # Load the netstat -i stats for lan0 into $1..$9
        set -- `netstat -i | grep lan0`

        # Each NEW_ value is the change since the previous sample
        NEW_PACKETS_IN=`expr "$5" - "$OLD_PACKETS_IN"`
        OLD_PACKETS_IN="$5"
        NEW_PACKETS_OUT=`expr "$7" - "$OLD_PACKETS_OUT"`
        OLD_PACKETS_OUT="$7"
        NEW_PACKETS_IN_ERR=`expr "$6" - "$OLD_PACKETS_IN_ERR"`
        OLD_PACKETS_IN_ERR="$6"
        NEW_PACKETS_OUT_ERR=`expr "$8" - "$OLD_PACKETS_OUT_ERR"`
        OLD_PACKETS_OUT_ERR="$8"
        NEW_COLLISIONS=`expr "$9" - "$OLD_COLLISIONS"`
        OLD_COLLISIONS="$9"

        date
        echo "Packets in: $NEW_PACKETS_IN"
        echo "Packets in Errors: $NEW_PACKETS_IN_ERR"
        echo "Packets out: $NEW_PACKETS_OUT"
        echo "Packets out Errors: $NEW_PACKETS_OUT_ERR"
        echo "Collisions: $NEW_COLLISIONS"
        echo

        sleep 60

        # Stop once the clock reaches 6:00 PM
        HOUR=`date +%H`
        if [ "$HOUR" -eq 18 ]
        then
            exit
        fi
    done

It's crude, but it works. Run this from cron every day during peak hours (for example, I would start it at 7:00 AM Monday through Friday; the script exits on its own at 6:00 PM). Some sample output follows (you can ignore the first entry):

    The first entry in this file is the statistics accumulated since last boot.
    The other entries are the statistics for the previous minute:
    Wed Mar 3 16:18:53 EST 1999
    Packets in: 253764169
    Packets in Errors: 22
    Packets out: 28146014
    Packets out Errors: 264
    Collisions: 1182025

    Wed Mar 3 16:19:54 EST 1999
    Packets in: 3411
    Packets in Errors: 0
    Packets out: 82
    Packets out Errors: 0
    Collisions: 3

    Wed Mar 3 16:20:54 EST 1999
    Packets in: 3097
    Packets in Errors: 0
    Packets out: 47
    Packets out Errors: 0
    Collisions: 0

This output shows no problems. If either error count or the collision count approaches 10% of the total packets in or out, there is probably a network issue. There are other tools out there (top, glance, etc.) that you can use as well.
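
The 10% rule of thumb can be applied to one interval's numbers with a few lines of shell. This is a rough sketch under stated assumptions: the function name check_net and its argument order are hypothetical, input errors are compared against packets in, and output errors and collisions are compared against packets out (the text only says "total in/out packets", so that pairing is my reading of it).

```sh
#!/bin/sh
# Sketch: compare one interval's error and collision counts with a 10% limit.
# Usage (hypothetical): check_net PKTS_IN IN_ERR PKTS_OUT OUT_ERR COLLISIONS
# Returns 0 when all ratios are below the limit, 1 otherwise.
check_net() {
    awk -v in_p="$1" -v in_e="$2" -v out_p="$3" -v out_e="$4" -v coll="$5" '
        # Percentage of n relative to total; 0 when there was no traffic.
        function pct(n, total) { return total > 0 ? 100 * n / total : 0 }
        BEGIN {
            limit = 10
            ie = pct(in_e, in_p)     # input errors vs. packets in
            oe = pct(out_e, out_p)   # output errors vs. packets out
            co = pct(coll, out_p)    # collisions vs. packets out (assumption)
            printf "in errors %.2f%%, out errors %.2f%%, collisions %.2f%%\n", ie, oe, co
            if (ie >= limit || oe >= limit || co >= limit) {
                print "possible network problem"
                exit 1
            }
            print "ok"
        }'
}
```

For the second sample entry above, check_net 3411 0 82 0 3 reports collisions at 3.66% and prints ok. The per-minute values that the monitoring script echoes could be passed in the same way.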
Now you can use the swap command (some systems use the swapon command) to add the swap file to the swap pool. For example, swap -a /test-swap/SWAP_FILE. Some systems support the -d flag for the swap command. The -d flag removes the file from the swap pool on the fly. For the most part, you probably just want to add the file and leave it. Your system won't use it if it doesn't need it. The best solution is to add more RAM and avoid swapping altogether.
Here are the different layers of the Fibre Channel protocol:
FC-0: Physical Interface and Media. Defines the physical characteristics of the interface and media, including the cables (copper, glass), connectors (GBICs), and drivers.

FC-1: Transmission Protocol. Defines the transmission protocol, including serial encoding/decoding rules, special characters, and error control.

FC-2: Framing and Signaling Protocol. Defines the rules for the signaling protocols and describes transfer of the data frame, sequence, and exchanges.

FC-3: Common Services. Defines functions/services that span multiple ports to provide advanced features. Functions that are currently defined include:

    Hunt groups -- A hunt group is a set of associated ports attached to a single node.
    Striping -- Uses multiple ports in parallel to transmit a single piece of information across multiple links.
    Multicast -- Delivers a single transmission to multiple destination end ports in point-to-point links.

FC-4: Upper Layer Protocol Mapping. Defines the application interfaces (audio/video, real-time computing, etc.) that can execute over Fibre Channel.
Check out www.fibrechannel.com for more information on the technology.
About the Author
Jim McKinstry is a Senior Sales Engineer for MTI Technology Corporation (www.mti.com). MTI is a leading international provider of data storage management products and services. He can be reached at: jrmckins@yahoo.com.