Article oct2006.tar

Multi-Platform Image Backups with Bootable, Open Source Distros

Bill Pierce, Jon Pomeroy, and Alan Lavitt

You've just made a somewhat touchy configuration change to a black box in your operating system (OS) when it happens -- the system fails to boot. Without image backups, you may face hours of troubleshooting or re-installation. With image backups, however, five minutes of commands and some time for the backup image to transfer will put the OS right back where it was. Image backups provide a safety net along with the freedom to experiment. We run a multi-platform test lab where it is important to install software on top of a fresh system that is in a known, clean state. A system for image backups allows us to bring systems back to that state in a snap.

The Image Backup Process

An image backup is a byte-for-byte copy of an entire disk or partition when the data is guaranteed to be at rest. Once you have the capability of saving and restoring disk images, you have the power to restore configured systems to the state they were in when you shut them down. The technique is the same, regardless of what the primary OS is or how it is configured.

The backup system does not care about the OS, because it is simply copying bytes. You don't need to use OS-dependent backup software, and you don't have to guess which pieces must be backed up and restored and in what order, because you can back up and restore all the pieces at once. You don't have to worry about open or locked files, hard or soft links, or Windows Registry Keys, because again you are simply copying data at rest at a level above (or below, actually) all that complexity.

Our imaging solution involves two components working together over the network: an image server capable of storing multi-gigabyte files that contain the disk images and running network services that make those files network-accessible, and a set of bootable open source OS distributions (distros) that allow you to boot a machine to an alternate OS from floppy or CD and access disk data at a low level while it is at rest.

The backup procedure is as follows:

1. Shut down the system.

2. Boot an alternate OS from the distro CD.

3. Transfer the primary OS image from the local disk to a file on the image server.

4. Remove the distro CD and reboot to the primary OS image.

The restore procedure is the same except the direction of data transfer is reversed. What could be simpler?

The Image Server Running FTP, NFS, and CIFS-Samba

To get started, you'll need a place to store your images and a way to easily get them in and out of the repository. For our setup, we took advantage of today's high-capacity drives and the volume management and server software that Linux offers to build a low-cost image repository. The key components of our image server were platform, storage, and network services.

For platform, we refurbished a retired mail server that had a RAID SCSI backplane for up to five drives. Since performance was not a significant requirement, this was a great way to get some additional value from retired hardware. One downside of this decision was that we had some difficulties getting newer versions of Linux to run on this hardware, so we opted for Red Hat 8 as a base.

For storage, a best practice of storage management -- or any IT project -- is to start with your data's requirements:

  • Enough space to hold roughly 50, 5-GB images
  • Capable of storing multi-gigabyte files
  • Most efficient streaming long streams of data
  • Ability to grow without having to move data
  • Recoverability is not critical
  • Impervious to single-disk failure
  • Inexpensive
And match them to a storage solution:

  • 250GB SCSI drives
  • Hardware RAID5 backplane
  • Logical Volume Managed (LVM) Volume
  • ext3 file system with a 4096-byte block size
  • Not backed up
Large, inexpensive drives combined through RAID and LVM provided lots of space with tolerance for disk failure. The volume manager and file system provided the desired expandability. A larger file system block size should help streaming performance. Choosing a journaling file system cuts down on file system repair time in the event that corruption is suspected, especially on large file systems. Backups are not critical because the data rarely changes, can be recovered in other ways in the event of a failure (reinstall), and there is too much data to back up in any reasonable amount of time or number of tapes anyway.

You'll also need network services to make this repository available to the backup clients. We chose vsftpd as our FTP server for its simplicity and security. We also installed nfsd and exported our big file system as a share. Though it's not necessary for the image backup schemes described here, we also installed Samba so CIFS clients could take advantage of the image server's storage resources when performing more conventional disk-to-disk backups. To limit access, we configured the onboard netfilter/iptables firewall.

A Disk Image Solution for x86: g4u and FTP

For x86-based machines running Linux, Windows, Solaris, or BSD, g4u is an application built for imaging over the network. g4u ("ghost for unix") is a simple, elegant, and well-documented tool that boots a NetBSD kernel on x86 platforms and allows you to back up entire disks or disk partitions to a file on an FTP server. It was developed by Hubert Feyrer and is available under the BSD license from:

http://www.feyrer.de/g4u/ 
            
Hubert says development on g4u is active, and he is looking for sponsors. Unlike bootable Linux distributions, g4u is tailored to the job of performing image backups. We used version 2.0.

Once g4u boots, you are provided with a limited command set to list the disks and partitions it finds, and upload or slurp those partitions to the FTP server. To back up a disk wd0 on host myhost as an anonymous user on image.server.com to the file /pub/backups/wd0_myhost-01_04-29-06.gz, the command is:

%>uploaddisk anonymous@image.server.com pub/backups/ \
  wd0_myhost-01_04-29-06.gz wd0 
To restore that same disk, do:

%>slurpdisk anonymous@image.server.com pub/backups/ \
  wd0_myhost-01_04-29-06.gz wd0 
In creating backup image files, we used a file-naming convention that would always identify the image by the host and partition from which it came, along with the date on which the image was made. Note that g4u uses BSD-style device names, so it can be a bit tricky to identify the particular device you want to image.

A Disk Image Solution for SPARC and Power PC: Bootable Gentoo and NFS

Our test lab also has SPARC-based Sun Ultras running Solaris and PowerPC-based IBM P5s running AIX. g4u 2.0 does not run on these architectures, but bootable Gentoo Linux does.

Gentoo Linux is a full Linux distribution optimized for customizability. It comes on bootable Install CDs from:

http://www.gentoo.org 
and it supports a variety of architectures, including alpha, amd64, ppc, sparc and x86. For the purposes of performing image backups, we found the MinimalCD/InstallCD to be adequate. We used the 64-bit version 2005.1.

Simply boot the Gentoo Linux Install CD to get to a root prompt. That takes care of the alternate boot OS. For image backup and restore, we used NFS and the dd and gzip utilities to work with raw devices. Once the OS boots, open a root console and mount the NFS share of the backup file system on the image server with a command such as:

%>mount -t nfs -o soft image.server.com: /export/data/pub/ \
  backups /mnt/gentoo &
Use the fdisk utility to list disks and partitions. These will use Linux device naming conventions. Now, to back up disk /dev/sda on host myhost to a file sda_myhost-04-29-06.gz, we combine the gzip and dd commands as follows:

%>gzip < /dev/sda | dd of=/mnt/gentoo/sda_myhost-04-29-06.gz 
    
The restore command is just as straightforward:

%>gunzip < /mnt/gentoo/sda_myhost-04-29-06.gz | dd of=/dev/sda 
So many of the joys of Unix all in one line.

Image backups are limited to the backup of entire disks or partitions. How you configure your primary OS on your disks can affect the ease and efficiency with which image backup is implemented. To optimize for image backup, you want to keep the number of pieces you have to restore to a minimum -- preferably one. Remember, it is easy to keep one thing self-consistent and harder to keep multiple things consistent with each other.

Although it flies in the face of conventional Unix wisdom, it is best here to keep all the parts of the OS file system (e.g., /etc, /var, /usr) on a single disk or partition that you can restore in one command. The other consideration is the size of the OS partition. The techniques described copy all the bytes on the disk or partition regardless of whether those bytes contain OS data. Putting the OS on a partition that is just large enough to hold it will help to reduce your storage space and transfer times.

Another consideration involves updates and patches. Unless you are able to create a new image every time you apply updates, your images will become out of date, and the images you restore will need to have updates applied. If these limitations in dealing with a growing or changing base install do not meet your requirements, then image backup may not be the right solution for you. Or, you may need to combine it with other techniques. Other pitfalls include hardware that may not be supported by the distro CD. In this case, try another distro. These techniques should work with other bootable CDs such as Knoppix.

When Image Backups Saved the Day

About a year ago, Bill was building a Windows cluster that used a SAN for shared storage between the nodes. Installing device drivers on Windows can be dicey. This time, it led to a "Blue Screen of Death" that prevented the system from booting. With an image of the system as a safety net, he was able to experiment with different installation techniques until he found one that worked. Another time, one of the engineers made some configuration changes we could not correctly reverse. We restored the system from image and were back in business in less than an hour.

A few weeks before writing this article, our Linux build machine started acting strangely. It had a relatively old IDE drive that was starting to exhibit I/O errors. We had not made a recent backup of the file systems because, again, the data was recoverable, mainly just OS and compiler and source control software. Using an image backup we had made for convenience, we were able to replace the failing drive, create partitions on the new drive, and restore the OS partition from the image. The only rub was that the new drive would not boot because the master boot record was not restored along with the partition. With the OS files restored to the partition, however, it was possible to boot the system from the primary OS installation CD in repair mode, run grub and restore the boot record. We were back in business with a new boot disk in about two hours.

Not Quite Bare-metal Recovery, but Close

Bare-metal recovery is the art of bringing a system to a previous operational state starting with backups and unformatted disks. As just described, image backup can be an important component of a bare-metal recovery scheme, but it is not practical as the only backup method, because it requires bringing down the system and all the services it provides every time you make a backup. By separating the systems software data that does not change from the data that does change, you can combine image backup and file system backup to form a bare-metal recovery scheme. To do this, restore the OS and software from image to bring the system back to a working state and then restore your dynamic data using conventional online backup-restore techniques.

Conclusion

When developing any backup and restore technique, be sure to test your procedure on non-critical systems in your environment before you rely on it in production. Modern, high-capacity drives and open source software make it possible to build an inexpensive image backup and recovery solution that services multiple platforms. An easy-to-use solution like this provides an important safety net in any test or production environment and makes it possible to restore systems to known configurations with a few commands.

Bill, Jon, and Alan developed these techniques at TeraCloud Corp., an Enterprise storage analytics software company dedicated to bringing mainframe storage management best practices to the distributed systems world. You can download their free, open source SAN troubleshooting and I/O performance measurement utilities, fcping and io_profile, from: http://www.teracloud.com/utilities.html. Bill, Jon, and Alan can be reached at: systems_r_up@yahoo.com, jpomeroy@tcloud.com, and alan@alanlavitt.com, respectively.