Shadow and Snap: A Simple Online Recovery System for
Solaris™ Systems
Kelley Shaw
When topics such as disaster recovery, business
continuity, and disk-to-disk backups are discussed, the conversations
usually revolve around expensive products and complicated solutions. While
these solutions are appropriate for some environments, they are overkill
for others. There are situations where some degree of data protection and
online recovery are desired, but a simple and inexpensive (or free)
solution is more realistic. For example, you may administer a small number
of Solaris servers used for internal services such as JumpStart, or Sun Ray
services for temporary employees or guests, or a Solaris desktop system.
While these services are important, your business is not dead in the water
if there is an interruption in service. If a desktop system fails, for
example, only one user is affected. While you hope that one user is not
your boss, typically a short amount of downtime for a single user is not a
huge issue.
I have constructed a simple online recovery system
for Solaris servers or desktops with internal boot disks using only
standard Solaris tools. It provides online recovery from disk failure and
data corruption, including user error. The only hardware required is a
dedicated secondary disk that is used to "shadow" (not mirror)
the boot (primary) disk. The only software required is already included in
Solaris. It is important to note that my system is designed to augment, not
replace, standard tape backup or offsite replication procedures. In the case
of total system loss (e.g., fire), offsite tapes would be required to
restore the system.
Better than a Mirror
One standard disaster recovery solution utilized by
many Solaris administrators is disk mirroring. They simply mirror their
primary disks to secondary disks. This is extremely easy to do using the
Solaris Volume Manager (previously named Solstice
DiskSuite), which is included in Solaris. If the primary disk were to fail, the system would panic and reboot off of
the secondary disk. In the time that it takes your system to reboot, you
are back in business! The problem with disk mirroring is that it only
protects against primary disk failure. It does not address data corruption or user error. For example, if you
discover that the mega patch cluster you just applied has caused problems on your system, the mirrored disk
will not help; the patch cluster was applied to it at the exact same
instant it was applied to the primary disk. Or, if you accidentally
delete a file or directory, that action is immediately mirrored on the
secondary disk.
My solution comprises two parts: shadow and snap.
First, I maintain a "delayed mirror," or shadow disk (a known
good bootable copy of the primary disk). In the case of primary disk
failure or major data corruption (e.g., unsuccessful patch cluster or
application upgrade), recovery is accomplished by booting off of the
secondary disk. Second, I provide a way to perform quick restores for minor
data corruption (e.g., accidental file/directory deletion or unwanted file
modification). This is provided using online, hourly
"snapshots" of the primary disk filesystems. Files and
directories can be restored by simply copying them from the appropriate
snapshot directory. Keep in mind that you don't have to implement
both the shadow and snap solutions; each solution is independent of the
other.
The "shadow-and-snap" solutions use
standard Solaris tools: fssnap, ufsdump, ufsrestore, and cron. Most administrators are familiar with ufsdump, ufsrestore, and cron. However, fssnap is probably one of the most
underutilized tools available in Solaris. Many seasoned Solaris
administrators I know have admitted to me that they hadn't even heard
of it. To their surprise, it has been included in Solaris since Solaris 8,
Update 4 (Release 4/01). Both the "shadow" and
"snap" portions of my solution rely heavily on fssnap, which allows the
creation of lightweight, stable, point-in-time "copies" of UFS filesystems. This article will
describe the procedures for creating and maintaining a shadow disk, as well
as managing scheduled snapshots of important filesystems. Several recovery
scenarios are described with step-by-step examples.
Shadow: Protecting Against Disk Failure and Major Data Corruption
Initial Setup
This section is a step-by-step guide for the process
of initially creating a shadow disk from the command line. Once you have
created and tested the shadow disk, you will want an automated way to
update and maintain the shadow. The section Automating the "Shadow" Process
covers this automation.
Prepare the Secondary Disk
Listing 1 is the prtvtoc results of the primary disk on my system. For the most
part, this is a standard Solaris layout. But notice that I allocated a
separate partition to be mounted on /extra. Snapshots created by fssnap must reside on a
different filesystem than the filesystem being snapped. Because I will
create snapshots of all my standard Solaris partitions, it is convenient to
have a dedicated partition for snapshots. The size of the snapshot
partition does not need to be large; fssnap is extremely efficient. Snapshots created by fssnap are not copies of
filesystems; they only require space to keep track of changes to the
original filesystem. fssnap is actually saving original filesystem data before it is
changed on the source filesystem, which means that the more activity there
is on the source filesystem, the more file space the snapshot requires.
My system is a Sun Blade 150, so the second disk is
the master device on the secondary IDE channel, which translates to a
device file name of c0t2d0s2. To copy the disk layout from the primary disk
to the secondary disk, I use the fmthard command:
# prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t2d0s2
The above example will only work if the secondary disk
is the same size and geometry as the primary disk. Otherwise, use the format command to lay out
the secondary disk to match the primary disk as closely as possible. The
partition sizes do not have to match exactly; just make sure that each
partition on your secondary disk is large enough to accommodate the actual
used space on its corresponding filesystem on the primary disk.
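As a rough way to check that rule, the sketch below compares the used space reported by df -k (in kilobytes) with a slice size reported by prtvtoc (in sectors). It assumes 512-byte sectors, as shown in the prtvtoc header, and a POSIX shell such as ksh. The helper name slice_fits is hypothetical. It ignores filesystem overhead, so leave some headroom.

```shell
# Hypothetical helper: succeed if a slice of SECTORS 512-byte sectors
# can hold USED_KB kilobytes of data.
#   USED_KB - "used" column of df -k for the primary filesystem
#   SECTORS - "Sector Count" column of prtvtoc for the secondary slice
slice_fits() {
    used_kb=$1
    sectors=$2
    slice_kb=$((sectors / 2))      # two 512-byte sectors per kilobyte
    [ "$slice_kb" -ge "$used_kb" ]
}
```

For example, slice_fits 1500000 4194304 succeeds, because a 4194304-sector slice holds 2097152 KB.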
Before we can copy files from the primary disk to the
secondary disk, we must ensure that each new partition on the secondary
disk has a newly created filesystem (except the swap partition):
# newfs /dev/rdsk/c0t2d0s0
# newfs /dev/rdsk/c0t2d0s3
# newfs /dev/rdsk/c0t2d0s4
# newfs /dev/rdsk/c0t2d0s6
# newfs /dev/rdsk/c0t2d0s7
Copy Data from Primary to Secondary Disk
We will use ufsdump and ufsrestore to copy data from the primary disk to the secondary disk.
It is a common suggestion in most Solaris documentation that you quiesce a
filesystem and unmount it before using the ufsdump command. This is inconvenient for critical filesystems such
as /, /usr, and /var, which is where fssnap can help. We can use a point-in-time snapshot of the
filesystem as the source of the ufsdump command rather than the filesystem itself. Remember, fssnap needs a place to
store information about the source filesystem. I create a directory on the
dedicated partition mounted on /extra to hold snapshot information for all
of the snapshots:
# mkdir /extra/snapshots
fssnap can only work
on a single filesystem, so we need to create a separate snapshot of each
filesystem to be copied:
# fssnap -o bs=/extra/snapshots /
/dev/fssnap/0
# fssnap -o bs=/extra/snapshots /usr
/dev/fssnap/1
# fssnap -o bs=/extra/snapshots /var
/dev/fssnap/2
# fssnap -o bs=/extra/snapshots /export/home
/dev/fssnap/3
Notice that the output of each fssnap command is the name of a
device file. This is a "virtual" device that you can use as you
would any other disk device file. We will use each returned device file
name as the source of the ufsdump command. The -o bs=/extra/snapshots
option tells fssnap where to keep the
"backing store" file used to keep required original filesystem
data.
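The four commands above can also be written as a loop that records which virtual device belongs to which filesystem. This is a minimal sketch, assuming the /extra/snapshots backing-store directory used in this article; the names BS_DIR and snapshot_all are hypothetical.

```shell
# Assumed backing-store directory (created earlier with mkdir).
BS_DIR=${BS_DIR:-/extra/snapshots}

# Snapshot every filesystem given as an argument and print one
# "filesystem device" pair per line. Stops at the first failure.
snapshot_all() {
    for fs in "$@"; do
        # fssnap prints the virtual device name on stdout
        dev=`fssnap -o bs="$BS_DIR" "$fs"` || return 1
        echo "$fs $dev"
    done
}
```

Called as snapshot_all / /usr /var /export/home, it prints pairs such as "/ /dev/fssnap/0" that can be fed to the copy step.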
If fssnap fails for the root filesystem or for the /usr filesystem
with the error "fssnap: ioctl: error 22: Invalid argument", the
most probable cause is that you are running the ntp daemon on your system. Simply
disable ntp before executing the fssnap command, and then enable ntp when the fssnap command completes.
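One way to script that workaround is sketched below. It assumes the NTP daemon is controlled by /etc/init.d/xntpd, which is the usual control script on Solaris 8 and 9; check your release and adjust NTP_CTL if it differs. The function name snap_without_ntp is hypothetical.

```shell
# Assumption: NTP is stopped and started through this init script.
NTP_CTL=${NTP_CTL:-/etc/init.d/xntpd}

# Stop NTP, snapshot one filesystem, and restart NTP whether or not
# the snapshot succeeded. Prints the snapshot device on success.
snap_without_ntp() {
    fs=$1
    rc=0
    "$NTP_CTL" stop
    dev=`fssnap -o bs=/extra/snapshots "$fs"` || rc=$?
    "$NTP_CTL" start
    if [ "$rc" -eq 0 ]; then
        echo "$dev"
    fi
    return "$rc"
}
```

Restarting the daemon unconditionally keeps a failed snapshot from leaving the system without time synchronization.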
Finally, we need to create mountpoints on the primary
disk to mount the secondary disk filesystems. These mountpoints will be the
targets of the ufsrestore commands:
# mkdir /extra/root /extra/usr /extra/var /extra/home
# mount /dev/dsk/c0t2d0s0 /extra/root
# mount /dev/dsk/c0t2d0s3 /extra/usr
# mount /dev/dsk/c0t2d0s4 /extra/var
# mount /dev/dsk/c0t2d0s6 /extra/home
Everything is in place, so we can now copy each
filesystem from the primary disk to the corresponding filesystem on the
secondary disk. This is accomplished by piping the results of the ufsdump command to ufsrestore:
# ufsdump 0f - /dev/fssnap/0 | (cd /extra/root && ufsrestore rf -)
# ufsdump 0f - /dev/fssnap/1 | (cd /extra/usr && ufsrestore rf -)
# ufsdump 0f - /dev/fssnap/2 | (cd /extra/var && ufsrestore rf -)
# ufsdump 0f - /dev/fssnap/3 | (cd /extra/home && ufsrestore rf -)
Because the ufsdump commands are using snapshots as the source of the backup, you
can safely continue to use your system as the ufsdump and ufsrestore commands operate.
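The same pipeline can be wrapped in a small function and driven from a list of snapshot devices and targets. This is a minimal sketch; copy_fs and copy_all are hypothetical names, and the target directories are assumed to be mounted already.

```shell
# Copy one snapshot device into an already-mounted target directory.
copy_fs() {
    snapdev=$1
    target=$2
    ufsdump 0f - "$snapdev" | (cd "$target" && ufsrestore rf -)
}

# Read "snapshot-device target" pairs from stdin, one per line,
# and copy each in turn. Stops at the first failure.
copy_all() {
    while read snapdev target; do
        # </dev/null keeps ufsdump from consuming the list on stdin
        copy_fs "$snapdev" "$target" </dev/null || return 1
    done
}
```

For example: printf '%s\n' "/dev/fssnap/0 /extra/root" "/dev/fssnap/1 /extra/usr" | copy_all. Note that ufsrestore in r mode leaves a restoresymtable file in each target directory; it can be removed once the copy is complete.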
Special Considerations for the Root Filesystem
A couple of extra steps are required for the root
filesystem on the secondary disk. Remember, we want the secondary disk to
be bootable. To accomplish this, we must install the boot block on the
secondary disk:
# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t2d0s0
We also need to think about the configuration files
in the root filesystem that would be incorrect if the system were to boot
from the secondary disk. For example, the /etc/vfstab file specifies
"hard coded" device file names for local filesystems:
.
.
.
/dev/dsk/c0t0d0s1 - - swap - no -
/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 / ufs 1 no -
/dev/dsk/c0t0d0s3 /dev/rdsk/c0t0d0s3 /usr ufs 1 no -
/dev/dsk/c0t0d0s4 /dev/rdsk/c0t0d0s4 /var ufs 1 no -
/dev/dsk/c0t0d0s6 /dev/rdsk/c0t0d0s6 /export/home ufs 2 yes -
/dev/dsk/c0t0d0s7 /dev/rdsk/c0t0d0s7 /extra ufs 2 yes -
.
.
.
Another example of a file that uses hard-coded device file names is /etc/dumpadm.conf:
#
# dumpadm.conf
#
# Configuration parameters for system crash dump.
# Do NOT edit this file by hand -- use dumpadm(1m) instead.
#
DUMPADM_DEVICE=/dev/dsk/c0t0d0s1
DUMPADM_SAVDIR=/var/crash/cayenne
DUMPADM_CONTENT=kernel
DUMPADM_ENABLE=yes
The following commands will ensure that these files on
the secondary disk are correct:
# cat /etc/vfstab | sed s/c0t0d0/c0t2d0/g >/extra/root/etc/vfstab
# cat /etc/dumpadm.conf | sed s/c0t0d0/c0t2d0/g \
>/extra/root/etc/dumpadm.conf
You will need to think about additional configuration
files on your system that require similar modifications. On my system, the
above fixes were all that were necessary.
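To help find other files that need the same treatment, the sketch below lists files in a directory that mention the primary disk name and applies the same sed substitution to any stream. The function names find_hardcoded and retarget are hypothetical, and only files directly under the given directory are checked.

```shell
# List regular files directly under DIR that mention DISK (e.g. c0t0d0).
find_hardcoded() {
    dir=$1
    disk=$2
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            grep -l "$disk" "$f" || :
        fi
    done
}

# Replace every occurrence of disk name $1 with $2 on stdin.
retarget() {
    sed "s/$1/$2/g"
}
```

For example, find_hardcoded /etc c0t0d0 shows the candidates, and retarget c0t0d0 c0t2d0 </etc/vfstab >/extra/root/etc/vfstab rewrites one of them. Review each file before overwriting the copy on the secondary disk.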
Remove Snapshots and Cleanup
Remember, the longer a snapshot exists, the more
space it requires. The snapshots we created previously are still active,
even though the filesystem copies are complete. Since we no longer need
these snapshots, we can delete them using the "-d" option of
the fssnap command:
# fssnap -d /
# fssnap -d /usr
# fssnap -d /var
# fssnap -d /export/home
The fssnap -d command does not delete the backing store files, so we
need to manually delete them as well:
# rm /extra/snapshots/*
We can also safely unmount the filesystems on the
secondary disk:
# umount /extra/root /extra/usr /extra/var /extra/home
At this point, you have a bootable copy of your primary disk.
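The cleanup steps can be collected into one function. This is a minimal sketch, assuming the backing-store directory created earlier (/extra/snapshots) and the mountpoints under /extra; BS_DIR, MNT_BASE, and cleanup_shadow are hypothetical names. As a precaution it removes the backing-store files only if every snapshot was deleted successfully.

```shell
BS_DIR=${BS_DIR:-/extra/snapshots}   # assumed backing-store directory
MNT_BASE=${MNT_BASE:-/extra}         # assumed parent of the mountpoints

cleanup_shadow() {
    failed=0
    for fs in / /usr /var /export/home; do
        fssnap -d "$fs" || failed=1
    done
    # Remove backing-store files only when no snapshot still needs them
    if [ "$failed" -eq 0 ]; then
        rm -f "$BS_DIR"/*
    fi
    for m in root usr var home; do
        umount "$MNT_BASE/$m"
    done
    return "$failed"
}
```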
Configure the OBP
If the situation arises that the primary disk fails,
we want the system to boot from the secondary disk automatically, without
any interaction from us. To prepare for that situation, we need to configure
the OpenBoot PROM (OBP). We must do two things:
1. Create a device alias in the OBP for the secondary
disk boot partition (or use an existing alias if a valid one exists).
2. Modify the boot-device OBP parameter so that the system will try to boot from
primary disk first and, failing that, boot from the secondary disk.
Note that the specific device paths will vary on each
system. You will need to understand the device paths associated with your
system to configure the OBP correctly. To see the current value of the
OBP boot-device parameter,
enter the following command from the Solaris prompt:
# eeprom boot-device
boot-device=disk net
In this example, the system will attempt to boot from disk first. Failing that, it
will attempt to boot from net. Both disk and net are device aliases. To list the
OBP device aliases, go to the OBP (ok prompt) by entering the command init 0.
When you see the ok prompt, enter the command devalias. You will see output similar to:
screen /pci@1f,0/pci@5/SUNW,XVR-100@1
mouse /pci@1f,0/usb@c,3/mouse@2
.
.
net /pci@1f,0/network@c,1
.
.
disk /pci@1f,0/ide@d/disk@0,0
disk3 /pci@1f,0/ide@d/disk@3,0
disk2 /pci@1f,0/ide@d/disk@2,0
disk1 /pci@1f,0/ide@d/disk@1,0
disk0 /pci@1f,0/ide@d/disk@0,0
.
.
.
These are all predefined device aliases for this
system. Notice that there are already aliases defined for all possible disk
locations. On my system, the secondary disk is the master device on the
secondary IDE channel, so it is referenced correctly by the disk2 alias. Because there is
already an alias defined that will correctly reference slice 0 on the
secondary disk, I do not need to create a new device alias. I need to
modify boot-device so that the system will attempt to boot first from the
primary disk, then from the secondary disk, and finally, from the network:
ok setenv boot-device disk disk2 net
Test Booting off of Secondary Disk
While we're at the OBP, we can go ahead and
test the secondary disk to make sure we can
boot correctly from it:
ok boot disk2
If the system comes up and you are able to log on
successfully and run your applications, you know that the backup disk is
indeed a reliable disaster recovery disk. Once you have completed testing
the secondary disk to your satisfaction, reboot so that you are using the
primary disk once again.
Automating the "Shadow" Process
The process described in the previous section does the
job of creating a bootable copy of the primary disk. Obviously, we want to
update the backup disk on a regular basis, and we don't want to have
to go through this tedious process every time. How often should we refresh
the secondary disk? It depends. You will have to decide what makes sense
for you. You may only want to back up the primary disk before you perform
patching or application upgrades. That way, if things go terribly wrong,
you know you have a clean, bootable system disk to take you right back to
where you were. You may want to back up the primary disk a little more
often so that you don't lose changes that you make as a result of
everyday systems administration.
To make the process of refreshing the secondary disk
easier, I use a simple Unix shell script. The script that I use to perform
the backup from the primary disk to the secondary disk is named
"shadow". See Listing 2.
For simplicity and clarity, the filesystems that are
being shadowed are hard-coded in the script. You could easily modify the
script to "discover" the locally mounted partitions by parsing
/etc/vfstab, or the output of df -F ufs. shadow can be executed from the command line or via cron. Keep in mind that because we
are using fssnap snapshots of the filesystems as the source of the backups, this script can
be executed when the system is in use. However, I recommend that you
execute the script during low-usage times to minimize the performance
impact to current users. As discussed previously, how often you execute the
script is completely up to you. I typically use cron to execute shadow once a week, on Saturday
morning:
# crontab -l
...(stuff left out)
0 8 * * 6 /usr/local/sbin/shadow >> /usr/local/log/shadow 2>&1
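The "discover" idea mentioned above could look something like the following sketch, which prints the mount point of every UFS entry in a vfstab file. The function name ufs_mountpoints is hypothetical, and the field positions follow the standard vfstab layout shown earlier.

```shell
# Print the mount point of every UFS filesystem listed in a vfstab file.
# Fields: device-to-mount, device-to-fsck, mount-point, fstype, ...
# Defaults to /etc/vfstab when no file is given.
ufs_mountpoints() {
    awk '$1 !~ /^#/ && $4 == "ufs" { print $3 }' "${1:-/etc/vfstab}"
}
```

The list will include the snapshot partition itself (/extra on my layout), which has to be filtered out because a backing store cannot live on the filesystem being snapped, for example with ufs_mountpoints | grep -v '^/extra$'.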
You will notice that, for the most part, shadow simply contains the
commands we executed manually to initially create the shadow disk.
Lines 3-14 require additional explanation. We must
consider the possibility that the system booted from the secondary disk
rather than the primary disk. (See "Scenario 2: Primary Disk
Corruption" below.) In this case, the secondary disk has now become
the primary disk! In the shadow script, we check to see which disk is the current boot
disk (Line 7), and then set the secondary disk to be the
"other" one. Based on which disk is the current boot disk, the
OBP parameter "boot-device" is set accordingly (Lines 11 and
13).
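The boot-disk check can be pictured with the sketch below. It is not the code from Listing 2, only a hypothetical illustration of the idea, using the disk names and OBP aliases from this article. The function names and the boot order chosen when running from the secondary disk (current disk first, then the other, then net) are assumptions.

```shell
# Print the device currently mounted on / (second line of df -k output).
current_root_device() {
    df -k / | awk 'NR == 2 { print $1 }'
}

# Given the root device, print "source-disk target-disk boot-device-value".
plan_shadow() {
    rootdev=$1
    case "$rootdev" in
        */c0t0d0s0) echo "c0t0d0 c0t2d0 disk disk2 net" ;;
        */c0t2d0s0) echo "c0t2d0 c0t0d0 disk2 disk net" ;;
        *) echo "unrecognized root device: $rootdev" >&2
           return 1 ;;
    esac
}
```

A script could then run plan_shadow with the output of current_root_device, copy from the first disk to the second, and set the OBP parameter from Solaris with a command such as eeprom "boot-device=disk2 disk net".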
Shadow Disk Recovery Scenarios
This section steps through two scenarios in which
your shadow disk would be used to recover from
unplanned "disasters."
Scenario 1: Primary Disk Failure
If the boot disk on your system fails, your system
will panic and
attempt to reboot. Because you have already configured the OBP to boot off
of the secondary disk in the case of primary disk failure, no further
interaction is required by you. The system will simply boot from the
secondary disk. However, you will want to prevent the shadow script from running until you
have replaced the failed disk. If you previously scheduled shadow to run via cron, delete it from your
crontab. Once you have replaced the failed disk, partition it as described
in "Prepare the Secondary Disk", but this time, copy the label
information from the secondary disk to the primary disk:
# prtvtoc /dev/rdsk/c0t2d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2
Run shadow manually to copy the filesystems from the secondary
disk to the new disk:
# /usr/local/sbin/shadow
Remember, shadow is smart enough to know that your secondary disk is acting
as the primary disk and will copy in the correct direction. After shadow completes, test the new
disk by booting off it manually. You can specify disk on the reboot command,
which will be passed to the boot program upon restart:
# reboot -- disk
After the system reboots, test the new disk to ensure
that you can log in and run your applications. If all is well, you can
restore the crontab to execute shadow on a regular basis.
Scenario 2: Primary Disk Corruption
You want to install the latest patch cluster available
for your system. As a precaution, you execute shadow manually to ensure your shadow disk is a clean backup
of your system before you apply the patch:
# /usr/local/sbin/shadow
# cd /var/tmp/9_Recommended
# ./install_cluster
After installing the cluster and rebooting your
system, you notice peculiar behavior in some of your applications. You
decide that you want to back out of the patch cluster. Because you ran shadow before applying the
patch cluster, this is as easy as rebooting. You can specify disk2 on the reboot
command, which will be passed to the boot program upon restart:
# reboot -- disk2
After the system reboots, the secondary disk is now
the primary disk! You are back in business. Because the shadow script checks to see which
disk is the boot disk and makes adjustments accordingly, you can leave shadow in the crontab and
no further action is required.
Snap: Quick Restores for Minor Data Corruption
The "shadow" solution outlined in the
previous section protects you in the case of primary disk failure or major
data corruption. In the case of minor data corruption or accidental file
deletion, a much simpler solution can be implemented; in fact, it's a
"snap". This solution also utilizes the fssnap command. For the snap solution, we will create
fresh snapshots of important filesystems on a regular basis. (One
restriction of fssnap is that you can only create one snapshot of a given filesystem
at a time. So when you create a new snapshot of a filesystem, you must
delete any existing snapshot of that filesystem first.) The snap script is shown in
Listing 3.
Lines 4-12 are the commands to unmount and delete the
existing snapshots of our important filesystems. The commands on lines
15-18 re-create the snapshots and mount them
for easy access.
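For a single filesystem, one refresh cycle might look like the sketch below. It is not the code from Listing 3; refresh_snapshot and BS_DIR are hypothetical names, and the paths follow the layout used in this article. The snapshot is mounted read-only so that it cannot be modified by accident.

```shell
BS_DIR=${BS_DIR:-/extra/snapshots}   # assumed backing-store directory

# Replace the snapshot of filesystem $1 and mount the new one on $2.
refresh_snapshot() {
    fs=$1        # filesystem to snapshot, e.g. /usr
    mnt=$2       # where to mount the snapshot, e.g. /extra/usr
    # Errors are ignored here because nothing exists yet on the first run
    umount "$mnt" 2>/dev/null || :
    fssnap -d "$fs" >/dev/null 2>&1 || :
    dev=`fssnap -o bs="$BS_DIR" "$fs"` || return 1
    mount -F ufs -o ro "$dev" "$mnt"
}
```

Old backing-store files are not removed by this sketch. The fssnap_ufs man page describes an unlink option for the backing store that may be worth a look if they accumulate.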
This is the crontab that I use for creating snapshots:
# crontab -l
...(stuff left out)
0 7-17 * * 1-5 /usr/local/sbin/snap >> /usr/local/log/snap 2>&1
Notice that I create hourly snapshots during regular business hours.
To view existing snapshots, use the 'fssnap -i' command:
# fssnap -i
0 /export/home
1 /var
2 /usr
3 /
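The short listing can also be used to look up the virtual device for a given filesystem. This is a minimal sketch, assuming the two-column "number mount-point" output shown above and that snapshot number N corresponds to /dev/fssnap/N, as in the earlier examples; the function name snapshot_device is hypothetical.

```shell
# Print the snapshot device for filesystem $1, or nothing if the
# filesystem has no active snapshot.
snapshot_device() {
    wanted=$1
    fssnap -i | while read num mnt; do
        if [ "$mnt" = "$wanted" ]; then
            echo "/dev/fssnap/$num"
        fi
    done
}
```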
To get even more detail about the current snapshots,
use the /usr/lib/fs/ufs/fssnap -i command shown in Listing 4. You can see from Listing 4
that there are four active snapshots in this example, all created at
approximately 3 p.m. You can also see from this listing how much disk space
each snapshot is occupying if you look at the value for "Backing
store size". This number will be very small in comparison to the
actual filesystem, for reasons already discussed.
Snap Recovery Scenario: Accidental File Deletion
With the active snapshots mounted, you can easily
recover a file if it is accidentally deleted. For example, let's say
you accidentally deleted the /etc/passwd file.
# rm /etc/passwd
Assuming you realize your mistake right away, you can
get the file back using the snapshot of the root filesystem, which is mounted on /extra/root:
# cp -p /extra/root/etc/passwd /etc/passwd
The "-p" option will restore ownership,
permissions, and modification/access times of the file being copied. If you
don't realize until a couple of hours later that you deleted
/etc/passwd, you will not be able to recover it from the snapshot. This is
because the crontab would have kicked in to create another snapshot that
would be a snapshot of the root filesystem after you deleted the
/etc/passwd file. However, all is not lost if you are also using the
"shadow" solution described previously. In this situation, you
can temporarily mount the backup root filesystem on the secondary disk and
recover the file. For example:
# mount /dev/dsk/c0t2d0s0 /mnt
# cp -p /mnt/etc/passwd /etc/passwd
# umount /mnt
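Both restores follow the same pattern: copy a path from a mounted tree back to its original location. The sketch below wraps that in a function that first checks whether the file exists in the source tree. The name restore_file is hypothetical, and the optional third argument exists only so the function can be tried out safely against a scratch directory.

```shell
# Copy PATH back from a mounted snapshot or shadow filesystem.
#   $1 - absolute path of the file to restore, e.g. /etc/passwd
#   $2 - root of the source tree, e.g. /extra/root or /mnt
#   $3 - optional destination root (empty restores into the live system)
restore_file() {
    path=$1
    src_root=$2
    dest_root=${3:-}
    if [ ! -f "$src_root$path" ]; then
        echo "not found in $src_root: $path" >&2
        return 1
    fi
    # -p preserves ownership, permissions, and timestamps
    cp -p "$src_root$path" "$dest_root$path"
}
```

If restore_file /etc/passwd /extra/root fails because the latest snapshot was taken after the deletion, mount the shadow root filesystem and try restore_file /etc/passwd /mnt instead.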
Summary
The shadow and snap solutions outlined in this
article are two ways in which you can provide simple, online data recovery
using system tools available to you as a Solaris administrator. With these
solutions in place, you can quickly recover from disk failure, data
corruption, or user error. These solutions rely heavily on the
underutilized fssnap command. For more information about fssnap, see the fssnap and fssnap_ufs man pages.
Kelley Shaw is a Systems Engineer at Commercial Data
Systems, a specialized system integrator with unique solutions in the areas
of multi-level security and diskless boot. Kelley has 20 years of hands-on
Unix administration experience and also teaches Solaris systems
administration classes. Kelley can be reached at: kshaw@cdsinc.com.