RAIDing FreeBSD with GEOM
Stephen Corbesero
I recently set up a new file server at my home. I have always been a fan of FreeBSD,
and my current file server is a FreeBSD 4.11 system that uses the vinum system to mirror two ATA133
40-GB drives for part of the system area and all of the data storage. The motherboard on my new system
has support for the typical four ATA-100 devices, and it also has support for two SATA-300 drives.
I had been eager to check out these new SATA drives, so this was a perfect opportunity.
Over the years and through many versions, FreeBSD has provided support
for different hardware and software RAID systems. I generally prefer software RAID solutions
because they are often more flexible. Besides hardware RAID on specific disk controllers, there
are several software RAID solutions on FreeBSD. Remember that RAID stands for Redundant Array
of Independent Disks, and the most common levels are RAID-0 (striping), RAID-1 (mirroring), RAID-5
(distributed data plus parity), and RAID-10 (striping plus mirroring). I tend to prefer RAID-1
on systems with fewer than four drives and RAID-10 with four or more drives.
Important safety note: Having your system and data areas protected
by RAID is no reason to avoid backups. RAID can only protect your system from a routine drive failure.
A catastrophic event like a nearby lightning strike or flood will cheerfully destroy all your mirrors
simultaneously. Maintain the discipline of a regular backup strategy including off-site archiving!
- ccd: This is the concatenated disk driver, and it has
been in FreeBSD since the mid-90s [1]. It is a standard pseudo-device driver that can be added to
the kernel to provide simple disk striping and mirroring. I used the ccd device to create
a large, fast news spool for an ISP. It did not have many frills, but it worked great!
- GEOM: GEOM is the latest abstraction layer for the FreeBSD
disk system, which first appeared in FreeBSD 5 [2]. GEOM is dubbed a "Modular Disk Transformation
Framework." It provides a set of stackable drivers that can act as provider or consumer classes
for storage objects. With this new and flexible framework, "geom classes" can be linked
together to provide various services. Besides the RAID classes like striping and mirroring, there
are also classes to support disk label management and encrypted file systems. However, it is important
to realize that the fundamental GEOM framework does not provide a complete set of logical volume
management (LVM) services. For that level of abstraction, you should consider vinum.
- vinum and gvinum: vinum is a very powerful
logical volume manager originally written by Greg Lehey and was first available in FreeBSD 3 [3].
Somewhat patterned after the Veritas volume manager, this software is very configurable and flexible.
As with ccd, it does disk striping and mirroring, and also supports RAID-5. My current file
server was built using vinum mirroring. gvinum is a rewrite of vinum that
uses the GEOM framework, but it is not well-documented -- there is not even a manual page!
I chose GEOM mirroring for my new file server instead of staying with vinum for several reasons. First, I wanted to get some experience with the new GEOM classes.
Second, vinum predates the new GEOM standard for the FreeBSD disk subsystem, and there
are compatibility issues. With FreeBSD 5.X and 6.X, the new gvinum should be used, but the
lack of documentation was more than enough reason for me to stick with raw GEOM classes. Unfortunately, as mentioned
previously, not using gvinum means not having a true LVM, so many operations must be done
manually.
Components
My system is composed of an ASUS Pentium motherboard with a 2.9GHz Celeron
processor, 512MB of RAM, one 120GB ATA-100 drive, two 160GB SATA-300 drives, a CD-ROM drive, case,
power supply, etc.
Let's assume that the FreeBSD drive device assignments for this
configuration are ad0 for the ATA disk drive and ad4 and ad6 for the SATA disk drives. The goal of the
following procedures will be to create a mirrored system volume across ad0, ad4, and ad6.
Planning
Before I undertook this project, I started by looking for references and
documentation that might give me a good direction along with a set of instructions. The FreeBSD
Handbook [4] has an entire chapter dedicated to the new GEOM framework [5]. Although this chapter
contains a great deal of useful information, reading it did not make me feel confident enough to
try it. After some Web searching, I found two very good articles online. The first one was titled
"FreeBSD Disk Mirroring" [6]. This was a good reference, but the article approached
the problem with a shell script-like solution.
The other article was titled "Using Software RAID-1 with FreeBSD"
[7]. This was also a good article, but it only presented the whole-disk-mirror approach, which
I find to be too restrictive. However, this article did suggest an interesting but undocumented
direct technique to convert an existing and populated slice into a GEOM gmirror. All the GEOM classes
require a block of bookkeeping information that is placed at the end of the object, which could theoretically
be a single partition, an entire slice, or even the whole drive.
I had concerns about how the file system would react to a block being taken
away after the file system was created. Also, the article applied this technique after partitioning
the entire drive as a single FreeBSD slice. This affects the initial offset of the disk label in the
slice. I experimented with this approach for multiple slices, but it did not work well. I had problems
with partition offsets within my slices, and I had to abandon it. However, if the configuration
of a single slice is acceptable for your application, I would still recommend trying it, because
it saves a considerable amount of data juggling. Since neither of the previous approaches worked
out of the box for me, I developed the following procedure as an amalgam of the above with a dash of
my own philosophies on disk administration.
When setting up any new system, it is always best to spend time planning
before installing. Some very important choices must be made that will affect how the actual disk
space will be allocated and configured. One of the most important choices is that of using single
or multiple slices for the disks. The GEOM classes make it possible to mirror entire disks or individual
slices of disks. Although mirroring the entire drive seems appealing for its apparent simplicity,
it is not very flexible. For example, I was concerned about replacing a drive if one should fail in
the future. Would I need to find an exact replacement? If I were forced to replace a failed drive with
a larger one, would I lose the extra space? By applying GEOM to individual slices, I would be free
to add slices that might not need to be mirrored.
Questions like these led me to the decision that I would have at least
two slices that would each be under GEOM control. One slice would be just for the operating system,
and the other would be for a data area such as home directories. There are many advantages to multiple-
rather than single-slice configurations. The single most significant advantage is flexibility.
By having a separate data slice, operating system upgrades and backups can be simpler. Also, there
is a much smaller chance of running out of partition letters.
Another advantage is the possibility of having multiple versions of
FreeBSD on separate slices, so you can experiment with different releases on their own slices without
having to overwrite or potentially damage an installation. Such experimental slices might not
even need to be mirrored, because they are likely to be transient.
With this layout in mind, the most important task would be to get the system
sized, installed, and mirrored. Unfortunately, the FreeBSD install program, sysinstall, still
does not provide an option for mirroring at install time. To accomplish our goal, the procedure
will involve a three-phase process of installing a base system, creating a single-drive "mirror"
of it, and finally destroying the original system to make the system fully mirrored. At this point,
additional system or data slices, mirrored, striped, or not, can easily be added.
The next step is to decide on a size for the system volume. Table 1 shows
the layout I used. I generally follow very standard allocation rules and try to leave sufficient
space for expansion.
Procedures
From this point on, it is assumed that you are comfortable with BSD commands
and procedures in general as well as with using sysinstall to install a new system or modify
an existing one.
Phase 1: Base Installation
Perform a standard installation of FreeBSD on the first slice of ad0. If using
my layout, size this slice at 30 GB and make note of the number of sectors. Be sure to install the standard
FreeBSD boot manager.
Do not get very attached to the result of this install, because there
is always a chance you may have to do it again. I went through the complete installation but did not
bother to load the ports collection, configure X, or create any user accounts. After the install
was finished, I must confess that I did install emacs and bash from the installation CD-ROM, but
those are two packages that I just cannot live without. After the install finishes, remove the CD-ROM,
and reboot from the hard drive. You should have a minimal but fully functional system.
Phase 2: Create the First System Mirror
As root, use sysinstall to put a slice on the second drive, ad4, that
is the same size as the system slice on the first drive. Be sure to let sysinstall put
the boot manager there as well. If sysinstall can't write the slice information,
run the sysctl command (below) to enable GEOM writes and then run sysinstall again.
GEOM uses the end of a slice or disk to store its configuration metadata, so you will actually lose
a small amount of storage. Unless you tried to size your system volume down to an exact number of sectors,
this should not be an issue. Let us assume that this slice becomes ad4s1. Use the following commands
to start the process of creating the first mirror component.
gmirror load
sysctl kern.geom.debugflags=16
gmirror label -v -b round-robin sys0 /dev/ad4s1
The first command loads the GEOM mirror module into memory. The second command
is equally, if not more, important. It disables a kernel safety feature that normally prevents
altering GEOM data on a disk that is currently mounted for writing. This may not be needed just yet,
but it definitely will be needed once we start creating file systems. Whenever you're working
with low-level disk tools like fdisk and bsdlabel, if you get an error message
that the operation was not permitted or a write failed, you probably need to set this kernel variable
back to 16 for the command(s) to succeed. This variable is initialized to 0 at system startup, so do
not worry about resetting it.
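If you would rather not leave the safety feature switched off for the whole session, the setting can be wrapped around individual commands. The following is only a minimal sketch and was not part of my procedure: the helper name with_geom_writes is hypothetical, and it assumes nothing beyond the sysctl variable described above.

```shell
#!/bin/sh
# with_geom_writes CMD [ARGS...]
# Hypothetical helper: set kern.geom.debugflags to 16, run one command,
# then put the previous value back and return the command's exit status.
with_geom_writes() {
    old_flags=$(sysctl -n kern.geom.debugflags) || return 1
    sysctl kern.geom.debugflags=16 >/dev/null || return 1
    "$@"
    cmd_status=$?
    sysctl kern.geom.debugflags="$old_flags" >/dev/null
    return "$cmd_status"
}

# Example (as root on the FreeBSD system described here):
#   with_geom_writes bsdlabel -e /dev/mirror/sys0
```

Restoring the old value rather than forcing 0 keeps the helper harmless if the flag had already been set by hand.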
There is one additional caveat. If you happen to create a slice that goes
all the way to the end of the disk drive, GEOM will get confused and think that the entire disk is under
the control of a single GEOM class, as opposed to each slice being controlled by independent GEOM
classes. To avoid this, make sure the last slice does not go all the way to the end. Leaving the last
sector of the drive out of the slice is enough for GEOM to get it right.
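The arithmetic behind that caveat is simple enough to sketch. The function name and the drive geometry below are hypothetical, chosen only to illustrate the rule of leaving the final sector unused; fdisk or sysinstall may round slice sizes further on their own.

```shell
#!/bin/sh
# max_slice_size TOTAL_SECTORS START_SECTOR
# Largest slice size (in sectors) that begins at START_SECTOR and still
# leaves the final sector of the drive outside the slice.
max_slice_size() {
    echo $(( $1 - $2 - 1 ))
}

# Hypothetical drive: 312581808 sectors, slice starting at sector 63.
max_slice_size 312581808 63    # prints 312581744
```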
The third command creates the mirror, sets the I/O scheduling to be round
robin, and names the new "device" /dev/mirror/sys0. I chose the name "sys"
so that it would be descriptive and added the trailing 0 to make it somewhat similar to most disk naming
conventions. It is very important from this point on to refer to all disk operations by the logical mirror name, /dev/mirror/sys0, and not to use the physical /dev/ad4s1 name. Any operations
on /dev/ad4s1 could seriously hurt the configuration, and you would likely need to start over.
You now need to create the file systems inside the mirror. Unfortunately, sysinstall does not know about mirrored drives, so you have to put a bootstrap on the slice
and edit the disk label manually. The command bsdlabel -wB /dev/mirror/sys0 will initialize
the mirror with a bootstrap and a minimal disk label. Next use the command bsdlabel -e /dev/mirror/sys0 to make the file system partitions. This may take a few tries, so I set the EDITOR environment variable
to be emacs for my comfort; otherwise, it would default to vi.
Luckily, the bsdlabel command has become more user-friendly
over the years. When editing the slice with bsdlabel, it is now possible to specify partition
sizes in units like mega- and gigabytes instead of just sectors. Also, the * can be used as a placeholder
for bsdlabel to compute the partition offsets automatically. I urge you to read the
man page for bsdlabel. The following commands will write a bootstrap for the mirror and then let you
edit (create) a label:
bsdlabel -wB /dev/mirror/sys0
bsdlabel -e /dev/mirror/sys0
According to the man page, you should be able to set the label to what is
shown in Figure 1. Do not change the c partition! Theoretically, when you leave the editor, bsdlabel will correctly determine the values of the asterisks for the offsets. However, I had a problem with
the last partition: bsdlabel could not figure out the size of the last partition to replace
that asterisk. I worked around this by doing the label in two steps. In the first edit, I created the last partition at
a small size. In the second edit, I could see the various sector sizes, and I could subtract
the sector offset of the last partition from the total number of sectors in the entire slice (from
partition c) to determine the real (and maximum) size (in sectors) for the f partition.
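That subtraction can be left to the shell. This is a small sketch with made-up numbers; the function name is hypothetical, and the two values must be read from your own label, not copied from here.

```shell
#!/bin/sh
# last_partition_size C_SIZE LAST_OFFSET
# Sectors available to the final partition: the size of partition c
# minus the offset at which the final partition starts.
last_partition_size() {
    echo $(( $1 - $2 ))
}

# Hypothetical values read from a second "bsdlabel -e" session:
# partition c is 62914559 sectors and partition f starts at 52428816.
last_partition_size 62914559 52428816    # prints 10485743
```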
If the labeling went well, you should now be able to make all the file systems
using newfs commands and then check them with fsck:
for part in a d e f; do
    newfs -U /dev/mirror/sys0$part
    fsck /dev/mirror/sys0$part
done
Copy the data from your boot disk file systems to the sys0 mirror. You can use
dump and restore for this, but I always use cpio in pass mode. For example, to copy the root system,
use the following commands:
cd /
mount /dev/mirror/sys0a /mnt
find -x . -print | cpio -pmud /mnt
umount /mnt
Then, repeat the above sequence using the other file system names and partitions:
/var to sys0d, /usr to sys0e, and (if you did install ports) /usr/ports to sys0f. Note that it is possible
to mount all the mirror file systems under /mnt and do one mass copy, but I prefer doing one at a time
just to be safe.
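For readers who would rather script the one-at-a-time copies, here is a hedged sketch of the same mount, cpio, umount sequence in a loop. The function name copy_fs and the DRY_RUN switch are my own illustrative additions. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch: repeat the mount/copy/umount sequence for each file system.
# DRY_RUN defaults to 1, so the commands are only printed; set DRY_RUN=0
# (as root) to execute them.
DRY_RUN=${DRY_RUN:-1}
MIRROR=/dev/mirror/sys0

# copy_fs SOURCE_DIR PARTITION_LETTER
copy_fs() {
    src=$1
    dev=$MIRROR$2
    if [ "$DRY_RUN" = 1 ]; then
        echo "mount $dev /mnt"
        echo "(cd $src && find -x . -print | cpio -pmud /mnt)"
        echo "umount /mnt"
        return 0
    fi
    mount "$dev" /mnt || return 1
    (cd "$src" && find -x . -print | cpio -pmud /mnt)
    copy_status=$?
    umount /mnt
    return "$copy_status"
}

# Pairs taken from the text; drop the last line if ports were not installed.
copy_fs /          a
copy_fs /var       d
copy_fs /usr       e
copy_fs /usr/ports f
```

Reviewing the printed commands before setting DRY_RUN=0 keeps the spirit of doing one file system at a time.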
There are some minor configuration files that must now be edited to make
the mirror the actual boot drive. The /etc/fstab on the mirror must refer to mirrored file systems
and not the ad0 file systems. Remount /dev/mirror/sys0a on /mnt and then edit /mnt/etc/fstab.
Change all occurrences of /dev/ad0s1 to /dev/mirror/sys0, keeping the partition letters the same (for example, /dev/ad0s1a becomes /dev/mirror/sys0a).
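The edit can also be done with sed instead of an editor. The sketch below assumes the installed system's entries have the form /dev/ad0s1a, /dev/ad0s1d, and so on; the function name and the sample fstab are hypothetical and are not my actual file. It prints the rewritten file, leaving the original untouched, so the result can be inspected before it replaces /mnt/etc/fstab.

```shell
#!/bin/sh
# rewrite_fstab FILE
# Print FILE with each leading /dev/ad0s1X replaced by /dev/mirror/sys0X.
rewrite_fstab() {
    sed 's|^/dev/ad0s1\([a-h]\)|/dev/mirror/sys0\1|' "$1"
}

# Demonstration on a made-up fstab (hypothetical entries).
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/ad0s1b  none    swap    sw         0  0
/dev/ad0s1a  /       ufs     rw         1  1
/dev/ad0s1d  /var    ufs     rw         2  2
/dev/acd0    /cdrom  cd9660  ro,noauto  0  0
EOF
rewrite_fstab "$sample"
rm -f "$sample"
```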
Also, make sure that the GEOM mirroring module will be loaded at boot
time and that the loader will correctly read the kernel from the mirrored slice. The following commands
will correctly set the necessary configuration on both the mirrored and unmirrored slices:
# echo 'geom_mirror_load="YES"' > /boot/loader.conf
# echo '1:ad(4,a)/boot/loader' > /boot.config
# echo 'geom_mirror_load="YES"' > /mnt/boot/loader.conf
# echo '1:ad(4,a)/boot/loader' > /mnt/boot.config
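Note that a plain > redirection overwrites the target file, which is harmless on a fresh install but would discard any settings already in loader.conf. If that matters on your system, a small helper along these lines may be safer. The name ensure_line is hypothetical, and this is a sketch, not something my procedure required.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Append LINE to FILE only if an identical line is not already there,
# so existing loader.conf settings are preserved.
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example (as root):
#   ensure_line /boot/loader.conf 'geom_mirror_load="YES"'
#   ensure_line /mnt/boot/loader.conf 'geom_mirror_load="YES"'
```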
At this point, your mirror should be fully functional and ready to be the system
boot device. You can unmount the mirrored root, do a couple of syncs, and then reboot.
Watch the boot messages; you should see messages about the GEOM mirror
module being loaded and the root going to /dev/mirror/sys0a. If not, you are still in pretty good
shape. You can boot from the system on ad0a manually by telling the loader. At that point, check the
/etc/fstab on the mirrored volume and the boot files to make sure they are correct. You may wish to fsck the mirrored file systems to make sure they are intact.
Phase 3: Complete the System Mirror
If your system has booted from the mirror, you can now finish the job by creating
additional mirrors of the system volume. This is actually the easy part, but you do have some choices.
Again, become root. Do a df to make sure that nothing is mounted
from ad0. If nothing is, double-check that ad0s1 and ad4s1 are exactly the same number of sectors in size.
You can use fdisk to read these values. If they are not the same size, use sysinstall to erase and recreate ad0s1 to be exactly the same size in sectors as ad4s1! Once this is the case,
add that slice to the mirror with the following command:
# gmirror insert sys0 /dev/ad0s1
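The size comparison that precedes the insert can be scripted as well. I read the values with fdisk; the sketch below swaps in diskinfo(8) instead because its one-line output is easier to parse. It assumes the fourth field of that output is the media size in sectors, which you should confirm on your release. The function names are hypothetical.

```shell
#!/bin/sh
# sectors_of DEVICE
# Assumption: the one-line output of diskinfo(8) lists the media size in
# sectors as its fourth field; verify this before relying on it.
sectors_of() {
    diskinfo "$1" | awk '{ print $4 }'
}

# same_size DEV_A DEV_B
# Succeed only if both devices report the same, non-empty sector count.
same_size() {
    a=$(sectors_of "$1")
    b=$(sectors_of "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example (as root):
#   same_size /dev/ad0s1 /dev/ad4s1 && gmirror insert sys0 /dev/ad0s1
```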
At this point, the mirroring module will automatically begin copying data
from the source on ad4s1 to the new mirror component until the two are fully synchronized. This can
take quite a while if you have chosen a large slice size. You can monitor the progress with the gmirror status and gmirror list commands. The output from the status command will include a notation
of DEGRADED to indicate that synchronization is in progress along with a percentage completed
measure. In my installation, I had a third drive, so I created a third mirror for the system volume
by making the ad6s1 slice and then adding it with another gmirror insert command.
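Rather than rerunning gmirror status by hand, a short polling loop can wait for the synchronization to finish. This is a minimal sketch: the function name wait_for_sync and the POLL_SECONDS variable are hypothetical, and the loop relies only on the DEGRADED notation described above.

```shell
#!/bin/sh
# wait_for_sync MIRROR_NAME
# Poll "gmirror status" until the mirror no longer reports DEGRADED.
# Caveat: a mirror with a failed or missing component also reports DEGRADED,
# in which case this loop would not end by itself.
wait_for_sync() {
    while gmirror status "$1" | grep -q DEGRADED; do
        sleep "${POLL_SECONDS:-60}"
    done
}

# Example (as root):
#   gmirror insert sys0 /dev/ad6s1 && wait_for_sync sys0
```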
Congratulations. At this point, your system is mirrored and ready to
go. You may now choose to create one or more mirrored data slices. Data slices are much simpler. For
example, I created a slice to hold file systems for my home and data directories using the following
sequence. I also chose to mirror these only on the SATA drives and not the ad0 drive:
1. Use sysinstall or fdisk to create an s2 slice on drives
ad4 and ad6. You will undoubtedly need to issue the sysctl command to allow writing on the
GEOM provider.
2. Use gmirror label -v -b round-robin data0 /dev/ad4s2 to build
the first component of the mirror.
3. Use the bsdlabel -w /dev/mirror/data0 and bsdlabel -e /dev/mirror/data0 commands to create an initial label and to lay out the partitions for the file systems.
4. Use the newfs and fsck commands to make and check the
new file systems.
5. Use gmirror insert data0 /dev/ad6s2 to add the other drive's
slice to the mirror.
6. Add the file systems you created in the mirror to /etc/fstab, remembering
to use their /dev/mirror/data0x partition names. Don't forget to create their mount points.
My final /etc/fstab is shown in Figure 2.
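Steps 2 through 5 of the list above can be collected into one script. The following is a sketch under stated assumptions: the run wrapper, the DRY_RUN switch, and the partition letters d and e are hypothetical illustrations, while the gmirror, bsdlabel, newfs, and fsck invocations are the ones named in the steps. Creating the slices (step 1) and editing /etc/fstab (step 6) are left as manual work.

```shell
#!/bin/sh
# Sketch of steps 2 through 5 for the data mirror. DRY_RUN defaults to 1,
# which only prints each command; set DRY_RUN=0 (as root) to execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_data_mirror PARTITION_LETTER...
# The partition letters depend on the label you create in the editor.
build_data_mirror() {
    run gmirror label -v -b round-robin data0 /dev/ad4s2 || return 1
    run bsdlabel -w /dev/mirror/data0 || return 1
    run bsdlabel -e /dev/mirror/data0 || return 1
    for part in "$@"; do
        run newfs -U "/dev/mirror/data0$part" || return 1
        run fsck "/dev/mirror/data0$part" || return 1
    done
    run gmirror insert data0 /dev/ad6s2
}

# Hypothetical layout with two partitions, d and e:
build_data_mirror d e
```

The second drive's slice is inserted last so that the file systems are created once and then copied to ad6s2 by the normal synchronization.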
After the mirrors synchronize, the system should be fully operational.
Enjoy!
Conclusions
This procedure shows a step-by-step account of how I set up a small file server
for my home with basic RAID capabilities. In retrospect, although working with the GEOM mirroring
features was instructional, I would have to say that using gvinum is still probably the
best way to go to get the flexibility of true logical volume management. Since my system is stable
and completely functional, I doubt that I will change it. The only thing holding me back from using gvinum in a system is the lack of documentation; however, it seems like a good project to
try on a spare system.
References
1. ccd(4): Concatenated disk driver. FreeBSD 6.0 manpage, August 1995. University of Utah.
2. Kamp, P.-H. Geom(4): Modular disk i/o request transformation framework. FreeBSD 6.0 manpage,
March 2002.
3. Lehey, G. Vinum(4): Logical volume manager. FreeBSD 6.0 manpage, May 2002.
4. Various Authors. 1995-2006. The FreeBSD Handbook. http://www.freebsd.org.
5. Rhodes, T. 1995-2006. The FreeBSD Handbook, ch. GEOM: Modular Disk Transformation Framework. http://www.freebsd.org.
6. Engelschall, R. S. 2005 (February). FreeBSD system disk mirroring. Daemonnews. http://www.daemonnews.org.
7. Lavigne, D. 2005 (November). Using software RAID-1 with FreeBSD. O'Reilly ONLAMP.com. http://www.onlamp.com.
Stephen Corbesero is an associate professor of Computer Science at Moravian
College in Bethlehem, Pennsylvania. His teaching and research interests include operating systems,
systems administration, and networking. He also lives in Bethlehem with his dogs, Cursor and Chip,
and his cat, Pixel, who always provide interesting input to the software development process.