Multi-Platform Image Backups with Bootable, Open Source Distros
Bill Pierce, Jon Pomeroy, and Alan Lavitt
You've just made a somewhat touchy configuration
change to a black box in your operating system (OS) when it happens --
the system fails to boot. Without image backups, you may face hours of
troubleshooting or re-installation. With image backups, however, five
minutes of commands and some time for the backup image to transfer will put
the OS right back where it was. Image backups provide a safety net along
with the freedom to experiment. We run a multi-platform test lab where it
is important to install software on top of a fresh system that is in a
known, clean state. A system for image backups
allows us to bring systems back to that state
in a snap.
The Image Backup Process
An image backup is a byte-for-byte copy of an entire
disk or partition, made while the data is guaranteed to be at rest. Once you have
the capability of saving and restoring disk images, you have the power to
restore configured systems to the state they were in when you shut them
down. The technique is the same, regardless of what the primary OS is or
how it is configured.
The backup system does not care about the OS, because
it is simply copying bytes. You don't need to use OS-dependent backup
software, and you don't have to guess which pieces must be backed up
and restored and in what order, because you can back up and restore all the
pieces at once. You don't have to worry about open or locked files,
hard or soft links, or Windows Registry keys, because again you are simply
copying data at rest, at a level below all that complexity.
Our imaging solution involves two components working
together over the network. The first is an image server capable of storing
multi-gigabyte files that contain the disk images and running network
services that make those files network-accessible. The second is a set of
bootable open source OS distributions (distros) that allow you to boot a
machine to an alternate OS from floppy or CD and access disk data at a low
level while it is at rest.
The backup procedure is as follows:
1. Shut down the system.
2. Boot an alternate OS from the distro CD.
3. Transfer the primary OS image from the local disk
to a file on the image server.
4. Remove the distro CD and reboot to the primary OS
image.
The restore procedure is the same except the direction
of data transfer is reversed. What could be simpler?
The Image Server Running FTP, NFS, and CIFS-Samba
To get started, you'll need a place to store
your images and a way to easily get them in and out of the repository. For
our setup, we took advantage of today's high-capacity drives and the
volume management and server software that Linux offers to build a low-cost
image repository. The key components of our image server were platform,
storage, and network services.
For platform, we refurbished a retired mail server
that had a RAID SCSI backplane for up to five drives. Since performance was
not a significant requirement, this was a great way to get some additional
value from retired hardware. One downside of this decision was that we had
some difficulties getting newer versions of Linux to run on this hardware,
so we opted for Red Hat 8 as a base.
For storage, we followed a best practice of storage
management -- or of any IT project -- and started with our data's
requirements:
- Enough space to hold roughly 50 images of 5 GB each
- Capable of storing multi-gigabyte files
- Efficient at streaming long, sequential runs of data
- Ability to grow without having to move data
- Recoverability is not critical
- Impervious to single-disk failure
- Inexpensive
We then matched them to a storage solution:
- 250-GB SCSI drives
- Hardware RAID 5 backplane
- Logical Volume Manager (LVM) volume
- ext3 file system with a 4096-byte block size
- Not backed up
Large, inexpensive drives combined through RAID and
LVM provided lots of space with tolerance for
disk failure. The volume manager and file system provided the desired
expandability. A larger file system block size
should help streaming performance. Choosing a journaling file system cuts
down on file system repair time in the event that corruption is suspected,
especially on large file systems. Backups are not critical because the data
rarely changes, can be recovered in other ways in the event of a failure
(reinstall), and there is too much data to back up in any reasonable amount
of time or number of tapes anyway.
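The sizing behind these choices is simple arithmetic. As a rough sketch
(the drive count of five is an assumption taken from the backplane's
maximum; the text does not say how many drives were actually installed),
RAID 5 gives up one drive's worth of space to parity, and the remainder
can be compared with the space the images need:

```shell
#!/bin/sh
# Usable RAID 5 capacity: one drive's worth of space goes to parity.
# Arguments: number of drives, size of each drive in GB.
raid5_usable_gb() {
    drives=$1
    size_gb=$2
    echo $(( (drives - 1) * size_gb ))
}

# Space needed for a given number of images of a given size in GB.
images_needed_gb() {
    count=$1
    image_gb=$2
    echo $(( count * image_gb ))
}

# Assumed example: five 250-GB drives, fifty 5-GB images.
usable=$(raid5_usable_gb 5 250)
needed=$(images_needed_gb 50 5)
echo "usable: ${usable} GB, needed: ${needed} GB"
```

Under these assumptions the array leaves generous headroom, which LVM
lets you hand out to the file system as the image collection grows.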
You'll also need network services to make this
repository available to the backup clients. We chose vsftpd as our FTP
server for its simplicity and security. We also installed nfsd and exported
our big file system as a share. Though it's not necessary for the
image backup schemes described here, we also installed Samba so CIFS
clients could take advantage of the image server's storage resources
when performing more conventional disk-to-disk backups. To limit access, we
configured the onboard netfilter/iptables firewall.
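For the NFS piece, the export comes down to one line in /etc/exports.
The sketch below only prints such a line. The directory is taken from
the mount command used later in this article; the client subnet and the
choice of options are assumptions for illustration, not our recorded
configuration:

```shell
#!/bin/sh
# Print an /etc/exports entry for the image directory.
# Arguments: exported directory, client network allowed to mount it.
exports_entry() {
    dir=$1
    clients=$2
    # rw: clients may write images; sync: commit writes before replying;
    # no_root_squash: root on the booted CD keeps root rights on the share.
    printf '%s %s(rw,sync,no_root_squash)\n' "$dir" "$clients"
}

# Hypothetical lab subnet; substitute your own.
exports_entry /export/data/pub/backups 192.168.1.0/24
```

After appending the printed line to /etc/exports, running exportfs -ra
on the server makes the share available. Because no_root_squash trusts
root on every client, restrict the client list to the lab network.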
A Disk Image Solution for x86: g4u and FTP
For x86-based machines running Linux, Windows,
Solaris, or BSD, g4u is an application built
for imaging over the network. g4u ("ghost for unix") is a
simple, elegant, and well-documented tool that
boots a NetBSD kernel on x86 platforms and allows you to back up entire
disks or disk partitions to a file on an FTP server. It was developed by
Hubert Feyrer and is available under the BSD license from:
http://www.feyrer.de/g4u/
Hubert says development on g4u is active, and he is
looking for sponsors. Unlike bootable Linux distributions, g4u is tailored
to the job of performing image backups. We used version 2.0.
Once g4u boots, you are provided with a limited
command set to list the disks and partitions it finds, upload them to the
FTP server, and slurp (download) them back. To back up a disk wd0 on
host myhost as an anonymous user on image.server.com to the file
/pub/backups/wd0_myhost-01_04-29-06.gz, the command is:
%>uploaddisk anonymous@image.server.com \
pub/backups/wd0_myhost-01_04-29-06.gz wd0
To restore that same disk, do:
%>slurpdisk anonymous@image.server.com \
pub/backups/wd0_myhost-01_04-29-06.gz wd0
In creating backup image files, we used a file-naming
convention that would always identify the image by the host and partition
from which it came, along with the date on which the image was made. Note
that g4u uses BSD-style device names, so it can be a bit tricky to identify
the particular device you want to image.
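A naming convention like this is easy to generate rather than type by
hand. The helper below is a minimal sketch, not part of g4u; the
function name image_name and the exact field order are assumptions
modeled on the file name used above. It would be run on the image
server or an administrative workstation:

```shell
#!/bin/sh
# Build an image file name of the form <device>_<host>_<MM-DD-YY>.gz.
# Arguments: device name (e.g. wd0), host name, optional date string.
image_name() {
    device=$1
    host=$2
    # Default to today's date in the MM-DD-YY style used in this article.
    stamp=${3:-$(date +%m-%d-%y)}
    printf '%s_%s_%s.gz\n' "$device" "$host" "$stamp"
}

# Reproduces the example name from the uploaddisk command.
image_name wd0 myhost-01 04-29-06
```

Leaving off the third argument stamps the name with the current date.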
A Disk Image Solution for SPARC and PowerPC: Bootable Gentoo and NFS
Our test lab also has SPARC-based Sun Ultras running
Solaris and PowerPC-based IBM P5s running AIX. g4u 2.0 does not run on
these architectures, but bootable Gentoo Linux does.
Gentoo Linux is a full Linux distribution optimized
for customizability. It comes on bootable Install CDs from:
http://www.gentoo.org
and it supports a variety of architectures, including
alpha, amd64, ppc, sparc and x86. For the purposes of performing image
backups, we found the MinimalCD/InstallCD to be adequate. We used the
64-bit version 2005.1.
Simply boot the Gentoo Linux Install CD to get to a
root prompt. That takes care of the alternate boot OS. For image backup and
restore, we used NFS and the dd and gzip utilities to work with raw
devices. Once the OS boots, open a root console and mount the NFS share of
the backup file system on the image server with a command such as:
%>mount -t nfs -o soft \
image.server.com:/export/data/pub/backups /mnt/gentoo &
Use the fdisk utility to list disks and partitions.
These will use Linux device naming conventions. Now, to back up disk
/dev/sda on host myhost to a file sda_myhost-04-29-06.gz, we combine the gzip and dd commands as follows:
%>gzip < /dev/sda | dd of=/mnt/gentoo/sda_myhost-04-29-06.gz
The restore command is just as straightforward:
%>gunzip < /mnt/gentoo/sda_myhost-04-29-06.gz | dd of=/dev/sda
So many of the joys of Unix all in one line.
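Because a damaged image is only discovered at restore time, it can be
worth verifying the image right after writing it. The following is a
sketch of one way to do that, not the procedure we used; the function
name backup_and_verify is hypothetical. It compresses the source, tests
the gzip stream, and compares checksums of the source and the
decompressed image. The demonstration uses a scratch file in place of a
real device:

```shell
#!/bin/sh
# Back up a source (raw device or file) to a gzip image, then verify it.
# Arguments: source path, destination image path.
backup_and_verify() {
    src=$1
    dest=$2
    # Compress the raw bytes into the image file.
    gzip -c < "$src" > "$dest" || return 1
    # Check the integrity of the compressed stream.
    gzip -t "$dest" || return 1
    # Compare checksums of the source and the decompressed image.
    src_sum=$(cksum < "$src")
    img_sum=$(gzip -dc "$dest" | cksum)
    [ "$src_sum" = "$img_sum" ]
}

# Demonstration on a scratch file standing in for a device such as /dev/sda.
scratch=$(mktemp)
image=$(mktemp)
printf 'pretend this is a disk\n' > "$scratch"
if backup_and_verify "$scratch" "$image"; then
    echo "image verified"
else
    echo "image mismatch"
fi
rm -f "$scratch" "$image"
```

The checksum comparison reads the disk a second time, so it roughly
doubles the backup time. The gzip -t check alone is a cheaper
compromise.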
Image backups are limited to the backup of entire
disks or partitions. How you configure your primary OS on your disks can
affect the ease and efficiency with which image backup is implemented. To
optimize for image backup, you want to keep the number of pieces you have
to restore to a minimum -- preferably one. Remember, it is easy to
keep one thing self-consistent and harder to keep multiple things
consistent with each other.
Although it flies in the face of conventional Unix
wisdom, it is best here to keep all the parts of the OS file system (e.g.,
/etc, /var, /usr) on a single disk or partition that you can restore in one
command. The other consideration is the size of the OS partition. The
techniques described copy all the bytes on the disk or partition regardless
of whether those bytes contain OS data. Putting the OS on a partition that
is just large enough to hold it will help to reduce your storage space and
transfer times.
Another consideration involves updates and patches.
Unless you are able to create a new image every time you apply updates,
your images will become out of date, and the images you restore will need
to have updates applied. If these limitations in dealing with a growing or
changing base install are unacceptable in your environment, then image
backup may not be the right solution for you, or you may need to combine it
with other techniques. Another pitfall is hardware that is not supported
by the distro CD. In that case, try another distro. These techniques should
work with other bootable CDs such as Knoppix.
When Image Backups Saved the Day
About a year ago, Bill was building a Windows cluster
that used a SAN for shared storage between the nodes. Installing device
drivers on Windows can be dicey. This time, it led to a "Blue Screen
of Death" that prevented the system from booting. With an image of
the system as a safety net, he was able to experiment with different
installation techniques until he found one that worked. Another time, one
of the engineers made some configuration changes we could not correctly
reverse. We restored the system from image and were back in business in
less than an hour.
A few weeks before writing this article, our Linux
build machine started acting strangely. It had a relatively old IDE drive
that was starting to exhibit I/O errors. We had not made a recent backup of
the file systems because, again, the data was recoverable, mainly just OS
and compiler and source control software. Using an image backup we had made
for convenience, we were able to replace the failing drive, create
partitions on the new drive, and restore the OS partition from the image.
The only rub was that the new drive would not boot because the master boot
record was not restored along with the partition. With the OS files
restored to the partition, however, it was possible to boot the system from
the primary OS installation CD in repair mode, run grub and restore the
boot record. We were back in business with a new boot disk in about two
hours.
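One way to avoid this rub on PC-style disks is to save the first
512-byte sector, which holds the boot code and the partition table,
alongside the partition image. This is a sketch of that idea rather
than something we did at the time; the function name save_mbr is
hypothetical, and the demonstration uses a scratch file in place of a
real disk:

```shell
#!/bin/sh
# Save the first 512-byte sector (boot code plus partition table) of a disk.
# Arguments: disk device or file, destination file for the saved sector.
save_mbr() {
    disk=$1
    out=$2
    dd if="$disk" of="$out" bs=512 count=1 2>/dev/null
}

# Demonstration on a scratch file standing in for a disk such as /dev/hda.
disk=$(mktemp)
mbr=$(mktemp)
dd if=/dev/zero of="$disk" bs=512 count=4 2>/dev/null
save_mbr "$disk" "$mbr"
wc -c < "$mbr"
rm -f "$disk" "$mbr"
```

Writing the saved sector back with dd also rewrites the partition
table, so restore it only onto a disk that is meant to have the same
layout as the original.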
Not Quite Bare-metal Recovery, but Close
Bare-metal recovery is the art of bringing a system to
a previous operational state starting with backups and unformatted disks.
As just described, image backup can be an important component of a
bare-metal recovery scheme, but it is not practical as the only backup
method, because it requires bringing down the system and all the services
it provides every time you make a backup. By separating the system
software, which rarely changes, from the data that does change, you can
combine image backup and file system backup to form a bare-metal recovery
scheme. To do this, restore the OS and software from image to bring the
system back to a working state and then restore your dynamic data using
conventional online backup-restore techniques.
Conclusion
When developing any backup and restore technique, be
sure to test your procedure on non-critical systems in your environment
before you rely on it in production. Modern, high-capacity drives and open
source software make it possible to build an inexpensive image backup and
recovery solution that services multiple platforms. An easy-to-use solution
like this provides an important safety net in any test or production
environment and makes it possible to restore systems to known
configurations with a few commands.
Bill, Jon, and Alan developed these techniques at
TeraCloud Corp., an Enterprise storage analytics software company dedicated
to bringing mainframe storage management best practices to the distributed
systems world. You can download their free, open source SAN troubleshooting
and I/O performance measurement utilities, fcping and io_profile, from: http://www.teracloud.com/utilities.html.
Bill, Jon, and Alan can be reached at: systems_r_up@yahoo.com, jpomeroy@tcloud.com, and alan@alanlavitt.com, respectively.