Xen Master is Yum
Faye Gibbins
Xen is an open source, GPL’d hypervisor for Linux
systems developed at the University of Cambridge [1]. It
currently runs on all modern 32- and 64-bit x86 chipsets with SMP, including
dual and quad core or uniprocessor CPUs. Using patches to the kernel (either
versions 2.4.x or 2.6.x), it’s possible to turn a single box into a set of
virtual machines (VMs) each independently running its own Linux distribution.
I’ve been investigating using Xen to securely implement
VMs on our boxes so that we can make more efficient use of our production
systems. This setup allows us to test our clustering software with less
hardware expenditure and also allows us to give developers, academics, and
senior students safe access to root.
In this article, I’ll describe how to set up a
self-managed Xen cluster, which may or may not be a part of a larger network.
Critically, the Xen cluster will minimize any security holes and be a
well-behaved netizen. Security, authentication, and authorization will be
maintained by Kerberos, ssh, and LDAP, while ebtables will stop any possible
pollution of the external network.
Many Linux distributions now support Xen; among the most
recent are Fedora Core 5, Red Hat Enterprise Linux 5 [2], and Novell’s SUSE Linux. FC5 makes it particularly
easy to set up Xen, and I will focus on that distribution. However, all the
configurations are generic.
Overview
Xen uses the principle of domains in its architecture.
Domain0 is the environment that the physical hardware boots into; the domain1
nodes start up inside virtual environments running on top of it. FC5 includes precompiled kernels to
turn a standard install into a Xen domain0 environment. However, patching the
kernel and building the tools from the Xen download site [3] can be fun and
instructive when learning to use Xen.
A suitable machine for a Xen cluster box would typically
include multiple CPUs and 2GB or more of fast RAM (although Xen will run on
less). In a perfect world, the disk space allocated to each VM instance would
be provided by a separate RAID1/5 array attached to its own dedicated
controller to avoid IO bottlenecks in one instance from becoming a problem for
the other Xen instances running on the system. If disk hardware and controllers
are in short supply, though, one large volume group can be built and logical
volumes alone can be used to separate disk allocations. This will require more
monitoring of the disk IO to ensure each VM is not hogging the bus.
Regardless of whether you plan to use a single storage
array or multiple channels, configure a separate LVM2 volume group for each Xen
VM instance. Aside from making it easier to add storage capacity in
the future, this will let you back up the Xen environments using
LVM snapshots and, later, clone VMs from those snapshots, speeding up
installation of clusters.
"vg0" is the name I’ve given to the volume group for
domain0’s filesystem (/boot is held on a physical partition on the bootable
RAID set).
"vg_yum" is the name I’ve given to the volume group used
to hold the distribution’s repository; the name is arbitrary and was just
chosen as an aide-mémoire.
Volume groups vg_vm1...vg_vmN are the LVMs for each new
Xen instance.
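The layout above can be sketched with the LVM2 tools; a minimal example, assuming /dev/sdb and /dev/sdc are spare devices (the device names are hypothetical -- substitute your own RAID sets or disks):

```shell
# Hypothetical device names; substitute your own arrays or disks.
pvcreate /dev/sdb /dev/sdc
vgcreate vg_yum /dev/sdb   # volume group for the yum repository
vgcreate vg_vm1 /dev/sdc   # one volume group per Xen VM instance
vgs                        # confirm the new volume groups exist
```

Repeat the vgcreate step for vg_vm2...vg_vmN as you add storage for further VMs.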
Creating an optimal flexible layout (OFL) for a Xen
cluster is more difficult than setting up a standalone box with just one
environment running on it. This is especially true if you’re planning to let
inexperienced users play on the boxes with escalated privileges, or if you are
building your cluster to cope with disparate uses on the VMs. Databases, Web
servers, and compute boxes can be difficult to manage in their own separate
systems, and they’re much more difficult to manage in a Xen cluster if the sys
admin of each VM is unaware of the load others are putting upon the
architecture. The admins may not have sight of the other VMs’ hardware usage
patterns and could encounter difficulty investigating bottlenecks and stoppages
inside their own VMs caused by contention on the hypervisor’s other VMs. What
is gained by using Xen in terms of flexibility and efficient use of hardware
could easily be lost to complexity if initial plans of usage are poorly
developed or inadequately tested.
The box used for this article was built with a small FC5
install and the following example used the 2.6.17-1.2187_FC5xen0 kernel from
the FC5 updates repository.
Domain0 Setup
Setting up domain0’s kernel and OS securely on FC5
requires several steps:
1. Remove any access to the box apart from root on
the console, turn off ssh and any other unnecessary daemons that may be started
at boot time. This is usually best done on a fresh build of FC5; having a
Tripwire-like product installed and configured at this point is also a good
idea. Domain0 needs to be protected at all times.
2. Create an ext3 file-system on a logical volume
(LV) on vg_yum then copy over the yum archive from a trusted site. I use
rsync://ftp.belnet.be; however, using a site close to you that you trust and
for which you’ve previously verified the checksums of the RPMs is a prudent
step, viz:
lvcreate -L50G -n yum_archive vg_yum
mke2fs -cc -j -m 1 /dev/vg_yum/yum_archive
mkdir /var/local_yum
mount -t ext3 /dev/vg_yum/yum_archive /var/local_yum
mkdir -p /var/local_yum/i386/fc5/{updates,core,extras}
for i in updates core extras; do
cd /var/local_yum/i386/fc5/$i;
rsync -va --delete --exclude=/debug \
rsync://ftp.belnet.be/packages/fedora/linux/$i/5/i386/ .
done
The -cc option on mke2fs gives the disk’s controller the
chance to remap any bad blocks; it’s very slow and not required if you’ve
previously verified the disk’s media.
At this point the box need not be on the network. If you
wish to proceed offline, configure at least one Ethernet card to be assigned a
static IP at boot. In this example the domain0 eth0 device is given the IP
address of 192.168.42.18/24 with the default route set to 192.168.42.1. Reboot
the box using either GRUB or LILO to boot the Xen0 kernel.
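On FC5, the static address can be set with an interface configuration file; a sketch using the addresses from the example above:

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 -- static IP for domain0
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.42.18
NETMASK=255.255.255.0
GATEWAY=192.168.42.1
ONBOOT=yes
```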
An http server on domain0 should be able to serve the yum
archive to the VM. DHCP needs to be ready to serve the IP address of the VM, and
skeletal forward and reverse config files to map the IP address to DNS must be
present, too. As long as the various servers do the basics, it doesn’t matter
which product is used. In this example, the functionality was provided by:
Apache 2.2.2
ISC DHCP Server 3.0.3
BIND 9.3.2
For the short and simple service needed, these are
probably overkill, but they are well known and their setups are well
documented. DHCP is the most important config and is included in Listing 1.
The Build
Kickstart can be used to automate the builds and an
example kickstart, which will build the first domain, is included in Listing 2.
This allows for the Xen environments to be easily built and rebuilt. This
building strategy requires apache and dhcpd to temporarily run on domain0.
Dhcpd provides the first domain0 environment with an IP address and information
on DNS location, and Apache serves the contents of the yum archive. I call this
method "Xen Master is Yum". After the first domain1 environment is built, the
apache and dhcp servers can be shut down on domain0, and the LV holding the yum
archive can be unmounted from domain0 and exported to the first or other VMs.
A bug in the Xen setup causes corruption on the tx
checksums of UDP packets. So, before the first domain1 VM can be built, the tx
checksumming must be turned off on the physical Ethernet device, the virtual
Ethernet device in domain0, and the virtual endpoint of the virtual Ethernet
device, which connects to the bridge:
ethtool -K peth0 tx off
ethtool -K eth0 tx off
ethtool -K vif0.0 tx off
The dhcpd must then be restarted to take advantage of this.
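Because the ethtool settings do not survive a reboot of domain0, it may be worth re-applying them at boot; a sketch, with rc.local as my assumed placement (any boot-time hook after the Xen network script will do):

```shell
# Re-apply the tx-checksum workaround at every boot (hypothetical placement).
cat >> /etc/rc.d/rc.local <<'EOF'
ethtool -K peth0 tx off
ethtool -K eth0 tx off
ethtool -K vif0.0 tx off
EOF
service dhcpd restart   # restart dhcpd so it uses the fixed interfaces
```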
Xen stores its configuration files in /etc/xen. The name
of the file corresponds to the human-readable name of the VM but not
necessarily the DNS name of the environment. It makes sense, however, to
maintain naming consistency. In this case, the first domain1 will have the A
record krb5.example.com and CNAME records yum and ldap, both in example.com’s domain
in accordance with RFC 2606. Having an A record of krb5.example.com on the KDC
means that, after the build, the /etc/hosts file must include the line:
192.168.42.19 krb5.example.com krb5 ldap yum kdc
Otherwise, Kerberos will mistake the domain name during
authentication against the KDC. See Listing 3 for the Xen configuration file
used for building the domain.
The Xen configuration file /etc/xen/krb5 also holds the
MAC address for the virtual Ethernet device to be used by Xen for the krb5 VM.
Xen has been granted its own MAC space, 00:16:3E:xx:xx:xx, which XenSource Inc
permits Xen admins to use [4]. However, if you’re confident of your network MAC
address space, many experts also recommend using a MAC address that starts with ‘aa’
[5].
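A small helper can generate addresses inside the XenSource prefix; this is a sketch (the function name is mine, and it assumes bash -- any scheme that avoids collisions on your LAN will do):

```shell
# Generate a random MAC inside XenSource's assigned 00:16:3E prefix.
# Hypothetical helper; check the result against existing leases before use.
gen_xen_mac() {
    printf '00:16:3E:%02X:%02X:%02X\n' \
        $((RANDOM % 256)) $((RANDOM % 256)) $((RANDOM % 256))
}
gen_xen_mac
```

Paste the result into both /etc/xen/krb5 and /etc/dhcpd.conf so the two stay in step.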
The MAC address used must match the address in
/etc/dhcpd.conf, or the krb5 VM will not be assigned an IP address and the
installation will fail.
Other parts of /etc/xen/krb5 outline the kernel and
initrd to use to boot up the VM. Fedora conveniently provides these as part of
its core install. They can be found on the first CD or in the repository
outlined above. However, they don’t contain many patches added later, so you
might want to unpack a Xen0 RPM from the updates yum repository and use those
instead [6].
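rpm2cpio [6] lets you unpack such an RPM without installing it; a sketch, assuming a hypothetical package name and version from the local archive:

```shell
# Hypothetical RPM name/version; substitute the current kernel-xen0 package.
mkdir -p /tmp/xenkernel && cd /tmp/xenkernel
rpm2cpio /var/local_yum/i386/fc5/updates/kernel-xen0-2.6.17-1.2187_FC5.i686.rpm \
    | cpio -idmv './boot/vmlinuz*'
ls boot/   # point the kernel line in /etc/xen/krb5 at the extracted file
```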
Also inside /etc/xen/krb5 is the setting for memory and
CPU usage. The uuid must be unique; Xen uses it to name the domain internally.
The disk line provides a way to export files, devices, and LVs for use in the
VM. Using the settings in Listing 3, Xen will export the LV as a device that
will be interpreted as an unformatted block device by the installation scripts.
Therefore, if looked at raw, it will contain a partition table in the first few
K. This makes it awkward to find grub and the kernel installed inside the VM.
FC5 neatly sidesteps this issue by providing a small Python script that seeks
to the right location on the LV to find grub. That particular script is not
required, though; any script that can seek into the LV and return the boot
loader can be used.
This leaves the kernel management to the sys admin of the
VM and reduces the exposure of domain0. It may leave open the possibility that
a malicious admin in a domain1 could try to force domain0 to boot a carefully
crafted kernel, which would either give them root in domain0 or let them
perform a DOS on the cluster. Of the two, gaining root is much the harder
attack; a DOS is the likelier outcome.
The "extra" line passes the location of the Kickstart
configuration file. The kickstart file used for the krb5 domain is necessarily
the most complex in a basic cluster install. Apache must be able to serve this
file, but it’s not required to be placed in the yum archive. However, after
domain1 is built, it should no longer be served from domain0 as this represents
another avenue of attack.
If you’re sure your platform is secure, you can be
reasonably confident that the kickstart file will not be intercepted by a man
in the middle attack. The krb5 domain’s kickstart configuration file draws its
RPMs from the archive in domain0 and then builds the KDC and LDAP services.
Finally, it draws a set of updates from domain0’s yum archive, then it shuts
down. It is very difficult to debug this install stage because other terminals
usually available are not present in the Xen system. It’s possible to get some
idea of what’s going on by redirecting any command output from the %post
section to /dev/tty1.
It is essential that Xen interrupt the reboot event in
the VM at the end of the install. A reboot would be normal during an
"on iron" installation, but if the VM is not properly shut down and the Xen
config file not changed, the domain will spring back up with the same ‘ks’
option present, creating an infinite install loop.
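One way to enforce this is through the domain lifecycle options in the Xen config file; a sketch that appends them to /etc/xen/krb5 (the settings inside the heredoc use Xen’s Python-style config syntax):

```shell
# 'destroy' tears the domain down instead of restarting it, breaking the
# kickstart loop; change these back to 'restart' once the install is done.
cat >> /etc/xen/krb5 <<'EOF'
on_reboot = 'destroy'
on_crash  = 'destroy'
EOF
```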
With the domain krb5 down, turn off apache and dhcp in
domain0. Then unmount the yum archive’s LV and use the Xen configuration file
in Listing 4 to export it to krb5. Xen does not allow file systems to be shared
between domain0 and VMs. NFS-ing the yum archive from domain0 or exporting it
via apache from the hypervisor just isn’t secure enough, because it may be a
long time between reboots of domain0, and it’ll likely suffer from bit rot. The
fewer things that might force a kernel panic or eat up memory or CPU cycles in
domain0 the better.
With the new Xen config file, domain krb5 may be brought
up. It can run SElinux at this point and should be configured as the KDC and
LDAP server for the cluster. You could keep it providing yum to the rest of the
cluster or you could create a new domain1 VM to hold it. Now is the time,
however, to decide whether you want domain krb5 to control access to domain0.
There is an argument that domain0 should be configured by
use of sudo or similar technology. Certainly if it’s locked down via SElinux
with specific users having specific roles in the management of the Xen cluster
(one for the firewall, one for fixing VM issues, etc.), then these should be
properly managed. The KDC and LDAP in domain krb5 are the proper places for
this, and a supplemental LDAP or KDC should not be installed in domain0 for the
same reason as above.
Because the VM depends on its own LDAP, slapd must be
started before named. In Fedora Core 5, named has a dependency upon LDAP
because of the configuration of /etc/nsswitch.conf; booting into run level 3
will hang unless slapd is brought up before named tries to start.
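A quick way to check the SysV start order (a sketch; on FC5 the slapd service is named "ldap"):

```shell
# Lower S numbers start first in run level 3.
ls /etc/rc3.d/ | grep -E 'S[0-9]+(ldap|named)'
# If named sorts before ldap, lower the start number in the
# "# chkconfig:" header of /etc/init.d/ldap, then re-register it:
chkconfig ldap off && chkconfig ldap on
```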
Once krb5 is running, we can edit /etc/fstab, adding this
line:
/dev/hda1 /var/local_yum ext3 defaults,acl 0 0
This mounts our yum archive inside the krb5 domain, allowing us
to build more VMs on the cluster using a domain1 server rather than the more
valuable domain0.
Even though we have a running cluster at this point,
there is still work to do to secure it. Iptables must be properly configured,
and other security measures required by your site still need to be implemented.
An example script that does all this can be found in Listing 5.
DHCP Requests and Bridging
If you’re building the Xen environment on a network, it
can, depending upon how well your routers are managed, leak DHCP and DNS
information. Therefore, it is necessary to stop DNS and DHCP messages from
getting out. This is especially important in a production system: if you’re
rebuilding a Xen cluster, you don’t want your DHCP servers to accidentally
serve, or worse, reject, DHCP requests from other machines.
In domain0, there are three network devices present that
can help you manage the flow of network traffic:
- peth0-- This is the physical Ethernet
device on the box. Traffic regulation here will affect all traffic physically
entering and leaving the box.
- eth0-- This is a virtual Ethernet
device used to monitor and control access to domain0. Access restrictions here
will not affect traffic flowing from the bridge (see below) and the physical
Ethernet device. Each domain in the Xen cluster has a similar eth0.
- xenbr0-- This is the bridge device
responsible for forwarding the Ethernet packets between the virtual Ethernet
devices belonging to the domains. It may not be called xenbr0 in your setup, as
you can define its name and have more than one on a system. The network line in
the Xen configuration file tells the Xen system which bridge to connect the
virtual Ethernet device to. You can investigate the bridging subsystem with the
brctl command.
You may also see a device vif0.0; this is just the other
end of the virtual Ethernet device, and playing with that is outside the scope
of this article.
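For example, inspecting the bridging subsystem from domain0 might look like this:

```shell
brctl show             # bridges on the system and their attached interfaces
brctl showmacs xenbr0  # MAC addresses the bridge has learned, per port
```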
To stop any DHCP traffic entering or leaving the box you
can issue these commands:
ebtables -A FORWARD --out-interface peth0 --protocol ipv4 \
    --ip-protocol udp --ip-destination-port 67:68 -j DROP
ebtables -A FORWARD --in-interface peth0 --protocol ipv4 \
    --ip-protocol udp --ip-destination-port 67:68 -j DROP
This has two effects-- the first line stops the
DHCP server on domain0 sending out any DHCP information through the physical
NIC. Crucially, it still allows DHCP traffic between the virtual bridged
network devices in the Xen VMs. It will also stop the Xen VMs from accidentally
sending out DHCP traffic. This means you can confidently test DHCP servers on
the Xen cluster without irking your network admins. Or, perhaps more
importantly from a time management point of view, it removes the need to
closely monitor what sys admins of the VMs are doing; you can let an
inexperienced sys admin play with DHCP knowing they can’t damage the network.
The second line stops your DHCP server on domain0 from
receiving DHCP requests from clients outside the cluster. This is more of a
convenience and stops your domain0 log files from filling with attempts of the
DHCP server to answer queries. It also makes the DHCP server invisible on the
wider network, thus reducing the chances of unwanted attention from network
admins.
There may be other broadcast/discovery protocols on the
network, like IPP or SMB, that you may also want to restrict on the Xen
cluster. The ebtables command is very powerful and can fulfill most
requirements.
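For instance, SMB name service and IPP browsing could be kept inside the cluster with rules of the same shape (the port choices here are my assumption):

```shell
# Keep SMB name-service (UDP 137:138) and IPP browsing (UDP 631) traffic
# off the physical wire; VMs can still use them across the internal bridge.
ebtables -A FORWARD --out-interface peth0 --protocol ipv4 \
    --ip-protocol udp --ip-destination-port 137:138 -j DROP
ebtables -A FORWARD --out-interface peth0 --protocol ipv4 \
    --ip-protocol udp --ip-destination-port 631 -j DROP
```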
If you’re experimenting with Xen, it’s a good idea to
restrict all external access via the network to the VM and leave access to
domain0 via SSH open until you’re confident your network admins are happy with
the setup. It’s possible for a badly set up bridge to bring down even large,
well-maintained corporate networks. If, for example, you set up spanning tree
protocol badly, the entire network might end up thinking yours is the best
route to traverse. Network engineers hate this.
Monitoring
When Xen starts a domain, it’ll bind the domain’s console to your
terminal. To detach a domain1 VM from the terminal, issue CTRL-]. You can
always reattach later using "xm console krb5". It’s possible to attach each
Xen console to a different terminal in a "screen" session, thus allowing for
very flexible continuous monitoring of the environment.
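A sketch of such a screen setup (the domain names are hypothetical):

```shell
screen -dmS xen                # detached session named "xen"
for dom in krb5 vm2 vm3; do    # hypothetical domain names
    screen -S xen -X screen -t "$dom" xm console "$dom"
done
screen -r xen                  # attach; switch windows with C-a n / C-a p
```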
Other tools like "xentop" give you a top-like display of
the physical resources used by the cluster, including detailed information
about the network cards (press "N") and VCPU usage (press "V"). But it’s still
a good idea to run more in-depth monitoring tools on domain0, such as sa, sar,
and vmstat, so
that bottlenecks in IO can be identified. ebtables can also be set to log
traffic, but the increase in monitoring in domain0 must be offset by the need
to keep as little as possible running on it.
Conclusion
Xen is still an emerging technology that has yet to find
widespread use as a hypervisor in the enterprise. However, it is possible to
successfully build stable, scalable, secure Xen clusters with current
distributions of Linux. These builds can be on or off the network and contain
independent authorization, authentication, and auditing-- a basic
requirement of any enterprise cluster.
Running a Xen cluster still requires a lot of in-depth
knowledge, but the tools are available to play with it safely in the corporate
and academic environments, and surely this will lead to more sys admins and
technical architects investigating the technology. Xen’s price and versatility
not only make it very attractive but will also force commercial vendors to
improve their VM products, which can only be good for the consumer. The age of
cheap and easy virtualization on commodity hardware has arrived.
References
1. http://www.cl.cam.ac.uk/Research/SRG/netos/xen/
2. http://www.theregister.co.uk/2006/09/15/redhat_xen_beta/
3. http://www.cl.cam.ac.uk/Research/SRG/netos/xen/downloads.html
4. http://www.cl.cam.ac.uk/Research/SRG/netos/xen/readmes/user/user.html#SECTION02221000000000000000
5. http://wiki.xensource.com/xenwiki/XenNetworking
6. http://www.rpm.org/max-rpm/s1-rpm-miscellania-rpm2cpio.html
Faye Gibbins works with the GeoSciences IT Support Team @
Edinburgh University; she’s also Secretary of Edinburgh’s Linux User Group. She
has 8 years sys admin experience and an Honors Degree in Astrophysics. In her
spare time, she brews mead, grows chilis, and manufactures corset bones and
busks. She can be contacted at: fgibbins@staffmail.ed.ac.uk.