
Shadow and Snap: A Simple Online Recovery System for Solaris Systems

Kelley Shaw

When topics such as disaster recovery, business continuity, and disk-to-disk backups are discussed, the conversations usually revolve around expensive products and complicated solutions. While these solutions are appropriate for some environments, they are overkill for others. There are situations where some degree of data protection and online recovery are desired, but a simple and inexpensive (or free) solution is more realistic. For example, you may administer a small number of Solaris servers used for internal services such as Jumpstart, or SunRay services for temporary employees or guests, or a Solaris desktop system. While these services are important, your business is not dead in the water if there is an interruption in service. If a desktop system fails, for example, only one user is affected. While you hope that one user is not your boss, typically a short amount of downtime for a single user is not a huge issue.

I have constructed a simple online recovery system for Solaris servers or desktops with internal boot disks using only standard Solaris tools. It provides online recovery from disk failure and data corruption, including user error. The only hardware required is a dedicated secondary disk that is used to "shadow" (not mirror) the boot (primary) disk. The only software required is already included in Solaris. It is important to note that my system is designed to augment, not replace, standard tape backup or offsite replication procedures. In the case of total system loss (e.g., fire), offsite tapes would still be required to restore the system.

Better than a Mirror

One standard disaster recovery solution utilized by many Solaris administrators is disk mirroring. They simply mirror their primary disks to secondary disks. This is extremely easy to do using the Solaris Volume Manager (previously named Solstice DiskSuite), which is included in Solaris. If the primary disk were to fail, the system would panic and reboot off of the secondary disk. In the time that it takes your system to reboot, you are back in business! The problem with disk mirroring is that it only protects against primary disk failure. It does not address data corruption or user error. For example, if you discover that the mega patch cluster you just applied has caused problems on your system, the mirrored disk will not help; the patch cluster was applied to it at the exact same instant it was applied to the primary disk. Or, if you accidentally delete a file or directory, that deletion is immediately mirrored on the secondary disk.

My solution comprises two parts: shadow and snap. First, I maintain a "delayed mirror," or shadow disk (a known good bootable copy of the primary disk). In the case of primary disk failure or major data corruption (e.g., unsuccessful patch cluster or application upgrade), recovery is accomplished by booting off of the secondary disk. Second, I provide a way to perform quick restores for minor data corruption (e.g., accidental file/directory deletion or unwanted file modification). This is provided using online, hourly "snapshots" of the primary disk filesystems. Files and directories can be restored by simply copying them from the appropriate snapshot directory. Keep in mind that you don't have to implement both the shadow and snap solutions; each solution is independent of the other.

The "shadow-and-snap" solutions use standard Solaris tools: fssnap, ufsdump, ufsrestore, and cron. Most administrators are familiar with ufsdump, ufsrestore, and cron. However, fssnap is probably one of the most underutilized tools available in Solaris. Many seasoned Solaris administrators I know have admitted to me that they hadn't even heard of it. To their surprise, it has been included in Solaris since Solaris 8, Update 4 (Release 4/01). Both the "shadow" and "snap" portions of my solution rely heavily on fssnap, which allows the creation of lightweight, stable, point-in-time "copies" of UFS filesystems. This article will describe the procedures for creating and maintaining a shadow disk, as well as managing scheduled snapshots of important filesystems. Several recovery scenarios are described with step-by-step examples.

Shadow: Protecting Against Disk Failure and Major Data Corruption

Initial Setup

This section is a step-by-step guide to the process of initially creating a shadow disk from the command line. Once you have created and tested the shadow disk, you will want an automated way to update and maintain it; the "Automating the "Shadow" Process" section below covers this.

Prepare the Secondary Disk

Listing 1 is the prtvtoc results of the primary disk on my system. For the most part, this is a standard Solaris layout. But notice that I allocated a separate partition to be mounted on /extra. Snapshots created by fssnap must reside on a different filesystem than the filesystem being snapped. Because I will create snapshots of all my standard Solaris partitions, it is convenient to have a dedicated partition for snapshots. The size of the snapshot partition does not need to be large; fssnap is extremely efficient. Snapshots created by fssnap are not copies of filesystems; they only require space to keep track of changes to the original filesystem. fssnap is actually saving original filesystem data before it is changed on the source filesystem, which means that the more activity there is on the source filesystem, the more file space the snapshot requires.

My system is a Sun Blade 150, so the second disk is the master device on the secondary IDE channel, which translates to a device file name of c0t2d0s2. To copy the disk layout from the primary disk to the secondary disk, I use the fmthard command:

# prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t2d0s2 
            
The above example will only work if the secondary disk is the same size and geometry as the primary disk. Otherwise, use the format command to lay out the secondary disk to match the primary disk as closely as possible. The partition sizes do not have to match exactly, just make sure that each partition on your secondary disk is large enough to accommodate the actual used space on its corresponding filesystem on the primary disk.

Before we can copy files from the primary disk to the secondary disk, we must ensure that each new partition on the secondary disk has a newly created filesystem (except the swap partition):

# newfs /dev/rdsk/c0t2d0s0 
# newfs /dev/rdsk/c0t2d0s3 
# newfs /dev/rdsk/c0t2d0s4 
# newfs /dev/rdsk/c0t2d0s6 
# newfs /dev/rdsk/c0t2d0s7 
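Purely as a convenience, the five newfs commands can be generated with a short loop and, once you have reviewed the output, piped to sh. A sketch (the slice list is the one from Listing 1; the loop only prints the commands, it does not run them):

```shell
# Emit a newfs command for each data slice of the secondary disk.
# Slice 1 is swap and slice 2 is the whole disk, so both are skipped.
for s in 0 3 4 6 7; do
    echo "newfs /dev/rdsk/c0t2d0s${s}"
done
```

Review the generated commands, then pipe them to sh (or prefix newfs with "echo y |" to answer its confirmation prompt non-interactively).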
    
Copy Data from Primary to Secondary Disk

We will use ufsdump and ufsrestore to copy data from the primary disk to the secondary disk. It is a common suggestion in most Solaris documentation that you quiesce a filesystem and unmount it before using the ufsdump command. This is inconvenient for critical filesystems such as /, /usr, and /var, which is where fssnap can help. We can use a point-in-time snapshot of the filesystem as the source of the ufsdump command rather than the filesystem itself. Remember, fssnap needs a place to store information about the source filesystem. I create a directory on the dedicated partition mounted on /extra to hold snapshot information for all of the snapshots:

# mkdir /extra/snapshots 
    
fssnap can only work on a single filesystem, so we need to create a separate snapshot of each filesystem to be copied:

# fssnap -o bs=/extra/snapshots / 
/dev/fssnap/0 
# fssnap -o bs=/extra/snapshots /usr 
/dev/fssnap/1 
# fssnap -o bs=/extra/snapshots /var 
/dev/fssnap/2 
# fssnap -o bs=/extra/snapshots /export/home 
/dev/fssnap/3 
    
Notice that the output of each fssnap command is the name of a device file. This is a "virtual" device that you can use as you would any other disk device file. We will use each returned device file name as the source of the ufsdump command. The -o bs=/extra/snapshots option tells fssnap where to keep the "backing store" file used to keep required original filesystem data.

If fssnap fails for the root filesystem or for the /usr filesystem with the error "fssnap: ioctl: error 22: Invalid argument", the most probable cause is that you are running the ntp daemon on your system. Simply disable ntp before executing the fssnap command, and then enable ntp when the fssnap command completes.
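A minimal sketch of that workaround, wrapping fssnap so ntp is down only for the duration of the snapshot. The xntpd init script path assumes Solaris 8 or 9; on Solaris 10 you would use svcadm disable ntp and svcadm enable ntp instead. snap_with_ntp_paused is a hypothetical helper, not from the article:

```shell
#!/bin/sh
# Pause ntp only while the snapshot is taken; works around
# "fssnap: ioctl: error 22: Invalid argument" on / or /usr.
# NTPCTL assumes the Solaris 8/9 xntpd init script.
NTPCTL=${NTPCTL:-/etc/init.d/xntpd}

snap_with_ntp_paused() {
    $NTPCTL stop
    fssnap -o bs=/extra/snapshots "$1"    # prints the snapshot device name
    $NTPCTL start
}
```

Usage would be, for example, snap_with_ntp_paused / in place of the bare fssnap command above.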

Finally, we need to create mountpoints on the primary disk to mount the secondary disk filesystems. These mountpoints will be the targets of the ufsrestore commands:

# mkdir /extra/root /extra/usr /extra/var /extra/home  
# mount /dev/dsk/c0t2d0s0 /extra/root 
# mount /dev/dsk/c0t2d0s3 /extra/usr 
# mount /dev/dsk/c0t2d0s4 /extra/var 
# mount /dev/dsk/c0t2d0s6 /extra/home 
    
Everything is in place, so we can now copy each filesystem from the primary disk to the corresponding filesystem on the secondary disk. This is accomplished by piping the results of the ufsdump command to ufsrestore:

# ufsdump 0f - /dev/fssnap/0 | (cd /extra/root && ufsrestore rf -) 
# ufsdump 0f - /dev/fssnap/1 | (cd /extra/usr && ufsrestore rf -) 
# ufsdump 0f - /dev/fssnap/2 | (cd /extra/var && ufsrestore rf -) 
# ufsdump 0f - /dev/fssnap/3 | (cd /extra/home && ufsrestore rf -) 
    
Because the ufsdump commands are using snapshots as the source of backup, you can safely continue to use your system as the ufsdump and ufsrestore commands operate.
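Since all four filesystems follow the same snapshot, dump, restore, and delete pattern, the sequence can be wrapped in a small helper function. This is a sketch under this article's layout, not the shadow script from Listing 2; shadow_fs is a hypothetical name:

```shell
#!/bin/sh
# Snapshot a filesystem, stream it to a target directory on the
# secondary disk, then drop the snapshot.
shadow_fs() {
    # $1 = source filesystem, $2 = target mountpoint on the secondary disk
    snapdev=`fssnap -o bs=/extra/snapshots "$1"` || return 1
    ufsdump 0f - "$snapdev" | (cd "$2" && ufsrestore rf -)
    fssnap -d "$1"
}

# shadow_fs /            /extra/root
# shadow_fs /usr         /extra/usr
# shadow_fs /var         /extra/var
# shadow_fs /export/home /extra/home
```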

Special Considerations for the Root Filesystem

A couple of extra steps are required for the root filesystem on the secondary disk. Remember, we want the secondary disk to be bootable. To accomplish this, we must install the boot block on the secondary disk:

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t2d0s0 
    
We also need to think about the configuration files in the root filesystem that would be incorrect if the system were to boot from the secondary disk. For example, the /etc/vfstab file specifies "hard coded" device file names for local filesystems:

. 
. 
. 
/dev/dsk/c0t0d0s1  -       -       swap  -                         no   - 
/dev/dsk/c0t0d0s0  /dev/rdsk/c0t0d0s0    /              ufs   1    no   - 
/dev/dsk/c0t0d0s3  /dev/rdsk/c0t0d0s3    /usr           ufs   1    no   - 
/dev/dsk/c0t0d0s4  /dev/rdsk/c0t0d0s4    /var           ufs   1    no   - 
/dev/dsk/c0t0d0s6  /dev/rdsk/c0t0d0s6    /export/home   ufs   2    yes  - 
/dev/dsk/c0t0d0s7  /dev/rdsk/c0t0d0s7    /extra         ufs   2    yes  - 
. 
. 
. 
    
Another example of a file that uses hard-coded device file names is /etc/dumpadm.conf:

# 
# dumpadm.conf 
# 
# Configuration parameters for system crash dump. 
# Do NOT edit this file by hand -- use dumpadm(1m) instead. 
# 
DUMPADM_DEVICE=/dev/dsk/c0t0d0s1 
DUMPADM_SAVDIR=/var/crash/cayenne 
DUMPADM_CONTENT=kernel 
DUMPADM_ENABLE=yes 

The following commands will ensure that these files on the secondary disk are correct:

# sed s/c0t0d0/c0t2d0/g /etc/vfstab >/extra/root/etc/vfstab 
# sed s/c0t0d0/c0t2d0/g /etc/dumpadm.conf >/extra/root/etc/dumpadm.conf 

You will need to think about additional configuration files on your system that require similar modifications. On my system, the above fixes were all that were necessary.
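The same substitution generalizes to any file that embeds the primary disk's device name. A sketch (fix_devnames and the file list are illustrative assumptions; audit your own /etc for candidates):

```shell
#!/bin/sh
# Copy a config file to the secondary disk's root, rewriting the primary
# disk's device name along the way.  fix_devnames is a hypothetical helper.
fix_devnames() {
    # $1 = file, $2 = old disk name, $3 = new disk name, $4 = destination root
    sed "s/$2/$3/g" "$1" > "$4$1"
}

# fix_devnames /etc/vfstab       c0t0d0 c0t2d0 /extra/root
# fix_devnames /etc/dumpadm.conf c0t0d0 c0t2d0 /extra/root
```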

Remove Snapshots and Cleanup

Remember, the longer a snapshot exists, the more space it requires. The snapshots we created previously are still active, even though the filesystem copies are complete. Since we no longer need these snapshots, we can delete them using the "-d" option of the fssnap command:

# fssnap -d / 
# fssnap -d /usr 
# fssnap -d /var 
# fssnap -d /export/home 

The fssnap -d command does not delete the backing store files, so we need to delete them manually as well:

# rm /extra/snapshots/* 

We can also safely unmount the filesystems on the secondary disk:

# umount /extra/root /extra/usr /extra/var /extra/home 

At this point, you have a bootable copy of your primary disk.

Configure the OBP

If the primary disk fails, we want the system to boot from the secondary disk automatically, without any interaction from us. To prepare for that situation, we need to configure the OpenBoot PROM (OBP). We must do two things:

1. Create a device alias in the OBP for the secondary disk boot partition (or use an existing alias if a valid one exists).

2. Modify the boot-device OBP parameter so that the system will try to boot from primary disk first and, failing that, boot from the secondary disk.

Note that the specific device paths will vary on each system. You will need to understand the device paths associated with your system to configure the OBP correctly. To see the current value of the OBP boot-device parameter, enter the following command from the Solaris prompt:

 
# eeprom boot-device 
boot-device=disk net 

In this example, the system will attempt to boot from disk first. Failing that, it will attempt to boot from net. disk and net refer to device aliases. To list OBP device aliases, you will need to go to the OBP (ok prompt) by entering the command init 0. When you see the ok prompt, enter the command devalias. You will see output similar to:
 
screen             /pci@1f,0/pci@5/SUNW,XVR-100@1 
mouse              /pci@1f,0/usb@c,3/mouse@2 
. 
. 
net                /pci@1f,0/network@c,1 
. 
. 
disk               /pci@1f,0/ide@d/disk@0,0 
disk3              /pci@1f,0/ide@d/disk@3,0 
disk2              /pci@1f,0/ide@d/disk@2,0 
disk1              /pci@1f,0/ide@d/disk@1,0 
disk0              /pci@1f,0/ide@d/disk@0,0 
. 
. 
. 
    
These are all predefined device aliases for this system. Notice that there are already aliases defined for all possible disk locations. On my system, the secondary disk is the master device on the secondary IDE channel, so it is referenced correctly by the disk2 alias. Because there is already an alias defined that will correctly reference slice 0 on the secondary disk, I do not need to create a new device alias. I need to modify boot-device so that the system will attempt to boot first from the primary disk, then from the secondary disk, and finally, from the network:

ok setenv boot-device disk disk2 net 
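If you would rather not take the system down to the ok prompt, the same parameter can be set from a running Solaris system with the eeprom command (shown here as a root-prompt transcript, matching the earlier eeprom read example):

```shell
# eeprom boot-device="disk disk2 net"
```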
    
Test Booting off of Secondary Disk

While we're at the OBP, we can go ahead and test the secondary disk to make sure we can boot correctly from it:

ok boot disk2 
    
If the system comes up and you are able to log on successfully and run your applications, you know that the backup disk is indeed a reliable disaster recovery disk. Once you have completed testing the secondary disk to your satisfaction, reboot so that you are using the primary disk once again.

Automating the "Shadow" Process

The process described in the previous section does the job of creating a bootable copy of the primary disk. Obviously, we want to update the backup disk on a regular basis, and we don't want to have to go through this tedious process every time. How often should we refresh the secondary disk? It depends. You will have to decide what makes sense for you. You may only want to back up the primary disk before you perform patching or application upgrades. That way, if things go terribly wrong, you know you have a clean, bootable system disk to take you right back to where you were. You may want to back up the primary disk a little more often so that you don't lose changes that you make as a result of everyday systems administration.

To make the process of refreshing the secondary disk easier, I use a simple Unix shell script. The script that I use to perform the backup from the primary disk to the secondary disk is named "shadow". See Listing 2.

For simplicity and clarity, the filesystems that are being shadowed are hard-coded in the script. You could easily modify the script to "discover" the locally mounted partitions by parsing /etc/vfstab, or the output of df -F ufs. shadow can be executed from the command line or via cron. Keep in mind that because we are using fssnap snapshots of the filesystems as the source of the backups, this script can be executed when the system is in use. However, I recommend that you execute the script during low-usage times to minimize the performance impact to current users. As discussed previously, how often you execute the script is completely up to you. I typically use cron to execute shadow once a week, on Saturday morning:

 
# crontab -l 
...(stuff left out) 
0 8 * * 6 /usr/local/sbin/shadow 2>&1 >> /usr/local/log/shadow 

You will notice that, for the most part, shadow simply contains the commands we executed manually to initially create the shadow disk.

Lines 3-14 require additional explanation. We must consider the possibility that the system booted from the secondary disk rather than the primary disk. (See "Scenario 2: Primary Disk Corruption" below.) In this case, the secondary disk has now become the primary disk! In the shadow script, we check to see which disk is the current boot disk (Line 7), and then set the secondary disk to be the "other" one. Based on which disk is the current boot disk, the OBP parameter "boot-device" is set accordingly (Lines 11 and 13).
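Listing 2 contains the authoritative implementation; purely as an illustration, the boot-disk check and boot-device adjustment described above might be sketched like this. The df parsing, device names, and boot order here are my assumptions, not the article's code:

```shell
#!/bin/sh
# Determine which disk / is mounted from, then shadow to the "other"
# disk and keep boot-device pointed at the current boot disk first.
root_device() {
    # Solaris df prints "/ (/dev/dsk/c0t0d0s0 ): ..." for the root filesystem
    df / 2>/dev/null | awk -F'[()]' '{print $2}'
}

case `root_device` in
    *c0t2d0*) SRC=c0t2d0; DST=c0t0d0; ORDER="disk2 disk net" ;;
    *)        SRC=c0t0d0; DST=c0t2d0; ORDER="disk disk2 net" ;;
esac
# eeprom boot-device="$ORDER"
# ...snapshot each filesystem on $SRC and ufsdump | ufsrestore it to $DST...
```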

Shadow Disk Recovery Scenarios

This section steps through two scenarios in which your shadow disk would be used to recover from unplanned "disasters."

Scenario 1: Primary Disk Failure

If the boot disk on your system fails, your system will panic and attempt to reboot. Because you have already configured the OBP to boot off of the secondary disk in the case of primary disk failure, no further interaction is required by you. The system will simply boot from the secondary disk. However, you will want to prevent the shadow script from running until you have replaced the failed disk. If you previously scheduled shadow to run via cron, delete it from your crontab. Once you have replaced the failed disk, partition it as described in "Prepare the Secondary Disk", but this time, copy the label information from the secondary disk to the primary disk:

 
# prtvtoc /dev/rdsk/c0t2d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2 

Run shadow manually to copy the filesystems from the secondary disk to the new disk:
 
# /usr/local/sbin/shadow 

Remember, shadow is smart enough to know that your secondary disk is acting as the primary disk and will copy in the correct direction. After shadow completes, test the new disk by booting off of it manually. You can specify disk on the reboot command, which will be passed to the boot program upon restart:
 
# reboot -- disk 
    
After the system reboots, test the new disk to ensure that you can log in and run your applications. If all is well, you can restore the crontab to execute shadow on a regular basis.

Scenario 2: Primary Disk Corruption

You want to install the latest patch cluster available for your system. As a precaution, you execute shadow manually to ensure your shadow disk is a clean backup of your system before you apply the patch:

 
# /usr/local/sbin/shadow 
# cd /var/tmp/9_Recommended 
# ./install_cluster 
    
After installing the cluster and rebooting your system, you notice peculiar behavior in some of your applications. You decide that you want to back out of the patch cluster. Because you ran shadow before applying the patch cluster, this is as easy as rebooting. You can specify disk2 on the reboot command, which will be passed to the boot program upon restart:

# reboot -- disk2 
    
After the system reboots, the secondary disk is now the primary disk! You are back in business. Because the shadow script checks to see which disk is the boot disk and makes adjustments accordingly, you can leave shadow in the crontab and no further action is required.

Snap: Quick Restores for Minor Data Corruption

The "shadow" solution outlined in the previous section protects you in the case of primary disk failure or major data corruption. In the case of minor data corruption or accidental file deletion, a much simpler solution can be implemented; in fact, it's a "snap". This solution also utilizes the fssnap command. For the snap solution, we will create fresh snapshots of important filesystems on a regular basis. (One restriction of fssnap is that you can only create one snapshot of a given filesystem at a time. So when you create a new snapshot of a filesystem, you must delete any existing snapshot of that filesystem first.) The snap script is shown in Listing 3.

Lines 4-12 are the commands to unmount and delete the existing snapshots of our important filesystems. The commands on lines 15-18 re-create the snapshots and mount them for easy access.
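As an illustration of that cycle (Listing 3 is the authoritative script), each filesystem's teardown and re-snapshot could be wrapped in a helper such as this hypothetical resnap function; the backing-store path and mountpoints follow the layout used earlier:

```shell
#!/bin/sh
# Tear down any existing snapshot of a filesystem, then take a fresh
# one and mount it read-only for easy file recovery.
resnap() {
    # $1 = filesystem to snapshot, $2 = mountpoint for the snapshot
    umount "$2" 2>/dev/null      # ignore failure if nothing is mounted
    fssnap -d "$1" 2>/dev/null   # ignore failure if no snapshot exists
    snapdev=`fssnap -o bs=/extra/snapshots "$1"` || return 1
    mount -F ufs -o ro "$snapdev" "$2"
}

# resnap /            /extra/root
# resnap /usr         /extra/usr
# resnap /var         /extra/var
# resnap /export/home /extra/home
```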

This is the crontab that I use for creating snapshots:

 
# crontab -l 
...(stuff left out) 
0 7-17 * * 1-5 /usr/local/sbin/snap 2>&1 >> /usr/local/log/snap 
    
Notice that I create hourly snapshots during regular business hours.

To view existing snapshots, use the 'fssnap -i' command:

 
# fssnap -i 
   0    /export/home 
   1    /var 
   2    /usr 
   3    / 

To get even more detail about the current snapshots, use the /usr/lib/fs/ufs/fssnap -i command shown in Listing 4. You can see from Listing 4 that there are four active snapshots in this example, all created at approximately 3 p.m. You can also see from this listing how much disk space each snapshot is occupying if you look at the value for "Backing store size". This number will be very small in comparison to the actual filesystem, for reasons already discussed.

Snap Recovery Scenario: Accidental File Deletion

With the active snapshots mounted, you can easily recover a file if it is accidentally deleted. For example, let's say you accidentally deleted the /etc/passwd file.

 
# rm /etc/passwd 
    
Assuming you realize your mistake, you can immediately get it back using the snapshot of the root filesystem, which is mounted on /extra/root:
 
# cp -p /extra/root/etc/passwd /etc/passwd 

The "-p" option will restore ownership, permissions, and modification/access times of the file being copied. If you don't realize until a couple of hours later that you deleted /etc/passwd, you will not be able to recover it from the snapshot. This is because the crontab would have kicked in and replaced it with a new snapshot of the root filesystem, taken after you deleted the /etc/passwd file. However, all is not lost if you are also using the "shadow" solution described previously. In this situation, you can temporarily mount the backup root filesystem on the secondary disk and recover the file. For example:

# mount /dev/dsk/c0t2d0s0 /mnt 
# cp -p /mnt/etc/passwd /etc/passwd
# umount /mnt 
    
Summary

The shadow and snap solutions outlined in this article are two ways in which you can provide simple, online data recovery using system tools available to you as a Solaris administrator. With these solutions in place, you can quickly recover from disk failure, data corruption, or user error. These solutions rely heavily on the underutilized fssnap command. For more information about fssnap, see the fssnap and fssnap_ufs man pages.

Kelley Shaw is a Systems Engineer at Commercial Data Systems, a specialized system integrator with unique solutions in the areas of multi-level security and diskless boot. Kelley has 20 years of hands-on Unix administration experience and also teaches Solaris systems administration classes. Kelley can be reached at: kshaw@cdsinc.com.