Questions and Answers
Bjorn Satdeva
Unfortunately, I cannot point you to an existing structured process. However, if you look at the SAGE System Administration Job Descriptions (www.usenix.org/sage/jobs/sage-jobs.html), you will have a skills model that can be applied to your training.
tar is often a good program for archiving; that is the purpose for which it was originally written (tar = tape archiver). cpio is, at least in my opinion, an anachronism that belongs back in history with other obsolete archiving tools such as volcopy. Both cpio and tar access all files through the UNIX file system. This is slower than reading the raw disk directly, and it also means that they do not understand files with "holes", such as DBM files. If you back up such files with cpio or tar (except GNU tar, which has provisions to avoid this), then when you restore the file, you will get a file without the holes, which most likely will no longer fit on the disk. For backup, I prefer to use dump and restore (ufsdump and ufsrestore on System V machines, including Solaris). It has been my experience that these programs are both faster and more reliable for backup than tar and especially cpio. If you disagree, please read Elizabeth Zwicky's paper "Torture-testing Backup and Archive Programs: Things You Ought to Know But Probably Would Rather Not" from the USENIX LISA V Conference Proceedings. It might change your mind about backup programs.
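To make the "holes" problem concrete, here is a minimal sketch that creates a file with a hole and compares its apparent size with the space actually allocated. It assumes a UNIX-like system where st_blocks is reported in 512-byte units and a file system that supports holes; the function names and the 10 MB hole size are made up for illustration.

```python
import os
import tempfile

BLOCK_UNIT = 512  # assumption: st_blocks counts 512-byte units, as on most UNIX systems


def looks_sparse(apparent_size, allocated_blocks, unit=BLOCK_UNIT):
    """Return True when fewer bytes are allocated on disk than the file claims."""
    return allocated_blocks * unit < apparent_size


def make_file_with_hole(path, hole_bytes=10 * 1024 * 1024):
    """Seek past the end before writing, which leaves a hole of hole_bytes."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)
        f.write(b"end")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "holes.dat")
        make_file_with_hole(path)
        st = os.stat(path)
        print("apparent size  :", st.st_size)
        print("allocated bytes:", st.st_blocks * BLOCK_UNIT)
        print("looks sparse   :", looks_sparse(st.st_size, st.st_blocks))
```

An archiver that simply reads such a file through the file system sees the full apparent size, which is why the restored copy can be much larger than the original.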
There is no facility that allows the system administrator to directly specify that a job should be executed every other Saturday. You either need to get a commercial scheduler, or you can go the traditional system administrator's way and write a wrapper. If you choose to automate the backup process (including stopping and starting your database server), make sure that you have some kind of human review in the loop to catch possible problems. I have all too often seen backup processes automated so much that nobody was getting notified of possible problems. It is better to learn about problems with your backup software before you get to the point where you are unable to restore lost data.
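One way such a wrapper might look is sketched below. The idea is that the scheduler starts the wrapper every Saturday (day-of-week 6 in a crontab entry), and the wrapper itself decides whether this is an "on" week by counting days from a known Saturday. The anchor date and the function name are assumptions chosen for the example, not part of any existing tool.

```python
import datetime
import sys

# Hypothetical anchor: any Saturday on which the job is known to run.
ANCHOR_SATURDAY = datetime.date(1998, 10, 3)


def is_backup_day(today, anchor=ANCHOR_SATURDAY):
    """True only on Saturdays that are a whole multiple of 14 days from the anchor."""
    days = (today - anchor).days
    # weekday() == 5 is Saturday; the check guards against a mistyped anchor.
    return today.weekday() == 5 and days % 14 == 0


if __name__ == "__main__":
    if not is_backup_day(datetime.date.today()):
        sys.exit(0)  # off week: do nothing, quietly
    # On week: start the real backup here, and mail the result to a person.
    print("backup day: starting backup")
```

Whatever the wrapper runs, its output should still reach a human, in keeping with the advice above about review.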
There are, however, other issues surrounding this question. The question comes from Europe, and the legal issues there might be very different from those in the United States. In any case, several countries now have email privacy laws, and I am not sure how deleting email from users' mailboxes would be interpreted under such a law. Thus, this is an area where it would be wise to tread very carefully. It might simply be better to disallow new mail delivery if the system mailbox file grows beyond a certain size. From a system administration perspective, you can probably safely move a large mailbox file to the user's home directory, and then send the user an email announcing the move and explaining how to access the old mail files.
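Before choosing a policy, it helps to know which mailboxes are actually over the limit. The following is a small sketch that only reports; it does not delete or move anything. The spool directory path and the 20 MB limit are assumptions for the example, and the function names are invented here.

```python
import os
import sys


def mailbox_sizes(spool_dir):
    """Return {mailbox name: size in bytes} for regular files in spool_dir."""
    sizes = {}
    for entry in os.scandir(spool_dir):
        if entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes


def over_limit(sizes, limit_bytes):
    """Return (name, size) pairs above limit_bytes, largest first."""
    big = [(name, size) for name, size in sizes.items() if size > limit_bytes]
    return sorted(big, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumed defaults: adjust the spool path and limit for the local system.
    spool = sys.argv[1] if len(sys.argv) > 1 else "/var/mail"
    for name, size in over_limit(mailbox_sizes(spool), 20 * 1024 * 1024):
        print(f"{name}: {size} bytes")
```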
If these books do not meet your needs, you could try finding people who have done UNIX system administration for a long time and talking with them. The problem, however, is that in the past system administration was considered something you did until "you grew up and became a real programmer". Back in 1988, when I started /sys/admin as a UNIX system administration consulting business, most of my friends thought I was committing professional suicide. However, even then there were a few people who found system administration interesting. You will find the names of many of these people in early LISA conference proceedings.
The big advantage of NIS is that it uses the standard UNIX configuration files (/etc/hosts, /etc/passwd, etc.). It is much simpler to write tools that interact with those text files, compared to the data files from NIS+.
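As an illustration of how little code a tool for these text files needs, here is a minimal sketch that parses passwd-format text into named fields. The sample entries in the usage section are invented, and the handling of lines starting with "+" or "-" (NIS compatibility entries) is a simplifying assumption: they are just skipped.

```python
from collections import namedtuple

PasswdEntry = namedtuple("PasswdEntry", "name passwd uid gid gecos home shell")


def parse_passwd(text):
    """Parse passwd-format text, skipping blanks, comments, and malformed lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        # Skip empty lines, comments, and NIS "+"/"-" compatibility entries.
        if not line or line.startswith(("#", "+", "-")):
            continue
        fields = line.split(":")
        if len(fields) != 7 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        name, passwd, uid, gid, gecos, home, shell = fields
        entries.append(PasswdEntry(name, passwd, int(uid), int(gid), gecos, home, shell))
    return entries


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = "root:x:0:0:Super User:/root:/bin/sh\nguest:x:500:100:Guest:/home/guest:/bin/csh\n"
    for entry in parse_passwd(sample):
        print(entry.name, entry.uid, entry.shell)
```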
Security is very important, but we should never lose sight of the simpler solutions. System security is certainly a much bigger problem than in the past, but the majority of the problems we encounter are still caused by simple mistakes.
vmstat is, first of all, a tool to monitor memory usage. The output format differs between the various versions of UNIX, but the following should be true on almost every type of UNIX. The most important column is the one marked "po" (page out). This tells you the number of memory pages that have been paged out to the disk. It is okay if the system is doing a little bit of paging now and then, but if it gets to be too much, performance will suffer. Another interesting group of columns is "procs", the number of processes in various states. "r" is the number of processes in the run queue (i.e., waiting for a CPU slice). "b" is the number of processes that are blocked (typically waiting on I/O), and "w" is the number of processes that are waiting because they have been swapped out. The latter should nearly always be zero on a system that has enough memory. vmstat also gives some CPU statistics. "sy" is the percentage of the CPU time used for system processes, and "us" is the same for processes in user space. The split between those two values depends on the type of load on the system, but the time used in user space is typically a little over twice the time used in system space. "id" is the CPU idle time. If it gets to zero, it means you are using all the available CPU cycles. This might mean that you need a faster CPU. Check the number of waiting processes to see how long the process queue is.
iostat gives some of the same information, but is otherwise used to monitor disk access. You can use this program to see if you have a balanced load between the disks. An even disk load is almost impossible to achieve, but aim for a reasonable distribution. If you have many disks, but all the traffic is going to only one or two, you might consider changing the way you have your file systems allocated.
Both of the above commands will, when called without arguments, attempt to provide data as some kind of average since the system was last booted. This information is almost always useless, and may not be traceable at all.
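The vmstat checks described above can be sketched as a small filter over captured output. Everything specific here is an assumption: the sample text is invented and simplified, real column layouts vary by UNIX version, and the page-out threshold of 50 is an arbitrary placeholder, not a recommended value. The first sample is discarded because it is the since-boot average.

```python
# Invented, simplified sample in the style of vmstat output; real layouts differ.
SAMPLE = """\
 procs     memory     page      cpu
 r b w   swap  free   pi po   us sy id
 0 0 0  81234 10240    0  0    3  2 95
 1 0 0  80112  9120    2  1   40 20 40
 4 1 2  40112   512   30 95   70 30  0
"""


def parse_vmstat(text):
    """Map each numeric row to a dict keyed by the column names in the header."""
    header = None
    samples = []
    for fields in (ln.split() for ln in text.splitlines() if ln.strip()):
        if "r" in fields and "b" in fields and "id" in fields:
            header = fields  # if a name repeats, the rightmost column wins
        elif header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            samples.append(dict(zip(header, (int(f) for f in fields))))
    return samples


def check_sample(sample, po_limit=50):
    """Return notes for one sample; po_limit is an assumed threshold."""
    notes = []
    if sample.get("po", 0) > po_limit:
        notes.append("heavy page-out")
    if sample.get("w", 0) > 0:
        notes.append("swapped-out processes")
    if sample.get("id", 100) == 0:
        notes.append("CPU saturated")
    return notes


if __name__ == "__main__":
    for sample in parse_vmstat(SAMPLE)[1:]:  # skip the since-boot average
        print(sample["r"], sample["po"], sample["id"], check_sample(sample))
```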
So when you use these commands, keep them running and watch the output for problems. A good interval is normally around 5 seconds.
netstat is very different from the two above. Find the option for your version of UNIX that shows you the number of collisions and errors on the network (this is -i on many systems). The number of collisions should be an order of magnitude smaller than the number of packets transmitted, and the number of errors should be very, very low.
About the Author
Bjorn Satdeva is the president of /sys/admin, inc., a consulting firm which specializes in large installation system administration. Bjorn is also co-founder and former president of Bay-LISA, a San Francisco Bay Area user's group for system administrators of large sites. Bjorn can be reached at questions@sysadmin.com.