MySQL 5 Cluster with SolarisTM 10 Zones/ZFS/Resource Control
Derek Crudgington
Solaris 10 introduced several pieces of technology
that can benefit systems administrators, particularly when used together.
Zones can provide application isolation and assist in server consolidation
efforts. Combining the Fair Share Scheduler (FSS) with zones gives the
administrator control over the amount of system resources allocated to each
zone. Running zones on top of the ZFS file system allows each zone to be
set up in a matter of minutes and can also provide other benefits such as
compression, snapshots, and self healing.
On this configuration, I use Solaris Express Community
Release build 45, which can be downloaded from:
http://www.opensolaris.org
Solaris Express Community Release is the post Solaris
10 code base called Nevada, which will eventually become Solaris 11.
I'll combine these features with MySQL 5 to show how admins can
benefit from data replication, redundancy, and isolating applications down
to each zone with only resources allowed.
All zones will be installed in /zones. Setup will be as follows:
mysqlmanage zone - 192.168.0.10
mysqldata1 zone - 192.168.0.11
mysqldata2 zone - 192.168.0.12
First, let's set up our interface in the global zone:
# ifconfig bcme0 plumb
# ifconfig bcme0 192.168.0.2 up
Zone Configuration
We will begin with creating and setting up the zones.
We will loopback mount the /opt directory (using add inherit-pkg-dir) on all zones so we
aren't copying a load of packages over to each zone. This also allows
us to install MySQL in the global zone in /opt and access it from all zones
so we don't have to install MySQL three times. To begin, create the
zone that will be working as the MySQL management node; it will be called
mysqlmanage:
# zonecfg -z mysqlmanage
mysqlmanage: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:mysqlmanage> create
zonecfg:mysqlmanage> set zonepath=/zones/mysqlmanage
zonecfg:mysqlmanage> add inherit-pkg-dir
zonecfg:mysqlmanage:inherit-pkg-dir> set dir=/opt
zonecfg:mysqlmanage:inherit-pkg-dir> end
zonecfg:mysqlmanage> add net
zonecfg:mysqlmanage:net> set address=192.168.0.10
zonecfg:mysqlmanage:net> set physical=bcme0
zonecfg:mysqlmanage:net> end
zonecfg:mysqlmanage> info
zonecfg:mysqlmanage> verify
zonecfg:mysqlmanage> commit
zonecfg:mysqlmanage> exit
Create the ZFS zone file system and set the mount
point to be /zones. (Note: ZFS is the ZettaByte file system, which is an
extremely powerful storage file system that provides features like
unlimited snapshots, RAIDZ, compression, self healing, and more without the
traditional storage complexity. It was released in Solaris 10 Update 2.):
# zfs create mypool/zones
# zfs set mountpoint=/zones mypool/zones
Install the mysqlmanage zone:
# zoneadm -z mysqlmanage install
A ZFS file system has been created for this zone.
Preparing to install zone <mysqlmanage>.
Now we will create the mysqldata1 and mysqldata2
zones. Since they are going to be configured the same, we will clone the
mysqlmanage zone for these, which is a lot quicker than installing these
zones separately. Cloning is a feature introduced in Solaris Express
Community Release b33 or higher and copies all
configurations, packages, etc. from the source zone to the target zone. The source zone must be halted when doing this
procedure. To show how fast cloning works, we will time it.
Configure mysqldata1 and mysqldata2 nodes:
# zonecfg -z mysqldata1
mysqldata1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:mysqldata1> create
zonecfg:mysqldata1> set zonepath=/zones/mysqldata1
zonecfg:mysqldata1> add inherit-pkg-dir
zonecfg:mysqldata1:inherit-pkg-dir> set dir=/opt
zonecfg:mysqldata1:inherit-pkg-dir> end
zonecfg:mysqldata1> add net
zonecfg:mysqldata1:net> set physical=bcme0
zonecfg:mysqldata1:net> set address=192.168.0.11
zonecfg:mysqldata1:net> end
zonecfg:mysqldata1> verify
zonecfg:mysqldata1> commit
zonecfg:mysqldata1> exit
# zonecfg -z mysqldata2
mysqldata2: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:mysqldata2> create
zonecfg:mysqldata2> set zonepath=/zones/mysqldata2
zonecfg:mysqldata2> add inherit-pkg-dir
zonecfg:mysqldata2:inherit-pkg-dir> set dir=/opt
zonecfg:mysqldata2:inherit-pkg-dir> end
zonecfg:mysqldata2> add net
zonecfg:mysqldata2:net> set physical=bcme0
zonecfg:mysqldata2:net> set address=192.168.0.12
zonecfg:mysqldata2:net> end
zonecfg:mysqldata2> verify
zonecfg:mysqldata2> commit
zonecfg:mysqldata2> exit
Clone mysqlmanage to be mysqldata1:
# time zoneadm -z mysqldata1 clone -m copy mysqlmanage
A ZFS file system has been created for this zone.
Copying /zones/mysqlmanage...
real 0m19.813s
user 0m0.179s
sys 0m2.198s
And the same for mysqldata2:
# time zoneadm -z mysqldata2 clone -m copy mysqlmanage
A ZFS file system has been created for this zone.
Copying /zones/mysqlmanage...
real 0m18.691s
user 0m0.176s
sys 0m2.162s
Now /zones/mysqlmanage, /zones/mysqldata1, and /zones/mysqldata2 are set up and ready to go.
Resource Configuration
The next step is to configure the resource limitations
for each zone. There are a few different ways you can divide up resources:
fixed CPU, varying CPU, and FSS. We will use the FSS (Fair Share Scheduler)
here because we only have one CPU and we can assign a number of shares, or
parts of the CPU, to each zone.
In the global zone, enable the pools and create the
default configuration file:
# pooladm -e && pooladm -s
To make sure it has been enabled, check for the poold
process or pools:default in SMF:
# svcs -a | grep pool
disabled 11:53:54 svc:/system/pools/dynamic:default
online 13:22:56 svc:/system/pools:default
Create the resource pools, pool1 for management node, pool2 and pool3 for data nodes. Pool4 will be the global zone:
# poolcfg -c 'create pool pool1 ( string pool.scheduler = "FSS" )'
# poolcfg -c 'create pool pool2 ( string pool.scheduler = "FSS" )'
# poolcfg -c 'create pool pool3 ( string pool.scheduler = "FSS" )'
# poolcfg -c 'create pool pool4'
When you type "pooladm", the pools
won't show up in the configuration; type "pooladm -c" to
create the configuration file /etc/pooladm.conf:
# pooladm -c
Now we can type "pooladm" and see the configuration in place:
# pooladm
[output stripped out..]
This output shows the default pool configuration. You
don't need to understand it -- just read over it. We then need
to tell the system default scheduling to use FSS, and move all processes to
use FSS:
# dispadmin -d FSS
# priocntl -s -c FSS -i all
When you run ps -efc, you should see all processes listed as "FSS" class.
Next we will configure each zone to use the pools we
created earlier, and give these pools a certain number of shares of the
CPU. Here's how we will divide up the CPU:
mysqlmanage - pool1 - 10 shares (20% CPU)
mysqldata1 - pool2 - 20 shares (40% CPU)
mysqldata2 - pool3 - 20 shares (40% CPU)
It is important to understand here that zone resource
limitations will only take effect if the zones are all consuming to
capacity. For instance, if mysqlmanage, mysqldata1, and mysqldata2 are all
resource hungry, they will only go to their limits, which are 20%, 40%, and
40%. If mysqldata1 is the only node that needs resources and the others are
idle, it will consume 100% of the CPU.
Starting with mysqlmanage zone, because this process
few resources, we will only give this zone 10 shares, which is equal to 20%
of the CPU:
# zonecfg -z mysqlmanage
zonecfg:mysqlmanage> set pool=pool1
zonecfg:mysqlmanage> add rctl
zonecfg:mysqlmanage:rctl> set name=zone.cpu-shares
zonecfg:mysqlmanage:rctl> add value (priv=privileged,limit=10,action=none)
zonecfg:mysqlmanage:rctl> end
zonecfg:mysqlmanage> verify
zonecfg:mysqlmanage> exit
mysqldata1 and mysqldata2 zones will each be allocated 40% of the CPU:
# zonecfg -z mysqldata1
zonecfg:mysqldata1> set pool=pool2
zonecfg:mysqldata1> add rctl
zonecfg:mysqldata1:rctl> set name=zone.cpu-shares
zonecfg:mysqldata1:rctl> add value (priv=privileged,limit=20,action=none)
zonecfg:mysqldata1:rctl> end
zonecfg:mysqldata1> verify
zonecfg:mysqldata1> exit
# zonecfg -z mysqldata2
zonecfg:mysqldata2> set pool=pool3
zonecfg:mysqldata2> add rctl
zonecfg:mysqldata2:rctl> set name=zone.cpu-shares
zonecfg:mysqldata2:rctl> add value (priv=privileged,limit=20,action=none)
zonecfg:mysqldata2:rctl> end
zonecfg:mysqldata2> verify
zonecfg:mysqldata2> exit
<test>
Testing Resources
Boot the zones one at a time and set them up with proper configurations:
# zoneadm -z mysqlmanage boot
# zlogin -C mysqlmanage
[Connected to zone 'mysqlmanage' console]
Each one will be prompted to configure through the
initial set up. Repeat these steps for each zone, setting up proper local
and TCP/IP settings: mysqlmanage 192.168.0.10, mysqldata1 192.168.0.11,
mysqldata2 192.168.0.12.
Once these zones are booted and running, it's a
good idea to go into each of them and shut down services that are not
needed. Use svcadm and check /etc/rcX.d directories. Also make sure you can ping each IP from a zone.
Before we start setting up the MySQL cluster, we must
verify that the resource limitations are working. We will test the resource
limitations by using cpuhog.pl in Listing 1.
Put the script in /opt so that all of the zones can
read it. Remember /opt is loopback mounted read-only in each zone. Use zlogin and log in to each
zone and execute /opt/cpuhog.pl.
In the global zone, run prstat -Z:
ZONEID NPROC SIZE RSS MEMORY TIME CPU ZONE
2 34 100M 51M 4.9% 0:00:37 39% mysqldata1
3 34 100M 51M 5.0% 0:00:22 38% mysqldata2
1 34 100M 51M 4.9% 0:00:23 17% mysqlmanage
0 78 1017M 330M 32% 0:00:58 1.6% global
The CPU column shows that zones are each hovering
where they should be -- 40% for both data zones, and 20% for the
management zone. If we killed off the mysqlmanage cpuhog.pl, we would see
mysqldata1 and mysqldata2 both go to 50% to share the remaining available
resources of the CPU.
However, the global zone doesn't have any
resources defined to it. By default, the global zone only gets one share;
we will give it more so that if something bizarre happens the global zone
will still have resources to use.
Let's view the shares the global zone has now:
# prctl -n zone.cpu-shares -i zone global
zone: 0: global
NAME PRIVILEGE VALUE FLAG ACTION RECIPIENT
zone.cpu-shares
privileged 1 - none -
system 65.5K max none -
Only 1 shown here, we will give it 10 (20% of the CPU):
# prctl -n zone.cpu-shares -v 10 -r -i zone global
# prctl -n zone.cpu-shares -i zone global
zone: 0: global
NAME PRIVILEGE VALUE FLAG ACTION RECIPIENT
zone.cpu-shares
privileged 10 - none -
system 65.5K max none -
The global zone has enough resources now and we are
ready to set up MySQL cluster. To verify, we test with cpuhog.pl running in
all zones. mysqldata1 and 2 should have around 33%; mysqlmanage and global
should have around 16%:
ZONEID NPROC SIZE RSS MEMORY TIME CPU ZONE
6 35 126M 62M 3.1% 0:00:51 29% mysqldata2
4 35 126M 60M 3.0% 0:00:51 29% mysqldata1
2 35 126M 57M 2.8% 0:00:46 19% mysqlmanage
0 98 1426M 558M 28% 1:34:16 18% global
MySQL Cluster Configuration
Download the latest production release MySQL from
mysql.com and get the MAX package because it includes the clustering that
the standard package does not.
Add the mysql group and user to the global zone. We
won't run mysql in the global zone but it must be added because the
mysql package requires it before installing:
# groupadd mysql && useradd -g mysql mysql
Add the MySQL package; -G is for global zone only because we don't want it pkgadding the package to every
zone, which will take a long time and /opt is loopback mounted in each zone
so each zone can use the package:
# gzip -d mysql-max-5.0.22-solaris10-x86_64.pkg.gz
# pkgadd -G -d mysql-max-5.0.22-solaris10-x86_64.pkg
# chgrp -R mysql/mysql/*
To have a true redundant MySQL cluster set up, you
must have at least three nodes. If you try to use two nodes, you will run
into the split-brain scenario where if there is only one node remaining up,
the cluster portion (ndbd) will shut itself down. The mysqld will still be
running but any data added will not be replicated. A common scenario is to
run two data nodes and one management node and this is what we do here. The
data nodes run the ndbd process and hold all of the database information
that is replicated between other data nodes. The management server runs the
ndb_mgmd process, which provides the management interface and keeps the
cluster in sync.
To set up the mysqlmanage zone:
# zlogin mysqlmanage
Add the mysql group and user:
# groupadd mysql && useradd -g mysql mysql
The mysqlmanagement node only needs to use the ndb_mgm
(client) and ndb_mgmd (server). We'll start with the configuration
file for the ndb_mgmd and data nodes.
Create a directory to use for the config.ini:
# mkdir /var/lib/mysql-cluster
Insert the following into /var/lib/mysql-cluster/config.ini, which should look like Listing 2. The
comments in Listing 2 explain what each configuration does.
If you work with larger databases, I recommend upping
the values of MaxNoOfOrderedIndexes, MaxNoOfAttributes, MaxNoOfUniqueHashIndexes, DataMemory, and IndexMemory in the mysql management config.ini.
This may save time if you are running into errors importing or using larger databases.
Start the MySQL management ndb_mgmd daemon. It already
knows to look in /var/lib/mysql-cluster for the config.ini. If you have the
config.ini in a different directory, use ndb_mgmd -f <location to config.ini>:
# /opt/mysql/mysql/bin/ndb_mgmd
# ps -ef | grep mgmd
root 3595 3320 0 12:42:31 pts/9 0:00 grep mgmd
root 3593 1215 0 12:42:29 ? 0:00 /opt/mysql/ \
mysql/bin/ndb_mgmd
Log in with the MySQL management client ndb_mgm and
make sure everything looks okay:
# /opt/mysql/mysql/bin/ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 (not connected, accepting connect from 192.168.0.11)
id=3 (not connected, accepting connect from 192.168.0.12)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @192.168.0.10 (Version: 5.0.22)
[mysqld(API)] 2 node(s)
id=4 (not connected, accepting connect from any host)
id=5 (not connected, accepting connect from any host)
--------------------------------------
We can see here it is waiting for connections from
mysqldata1 and mysqldata2. Let's set up the data nodes and repeat the
following on mysqldata1 and mysqldata2:
# groupadd mysql && useradd -g mysql mysql
# cd /opt/mysql/mysql/
# scripts/mysql_install_db --user=mysql
In each data node's /etc/my.cnf put:
[mysqld]
ndbcluster
# IP address of mysqlmanagement server
ndb-connectstring=192.168.0.10
[mysql_cluster]
# IP address of mysqlmanagement server
ndb-connectstring=192.168.0.10
This tells our data nodes who the management server is.
Next, we will make the data directory and start the
data node. You must use --initial when starting the data node for the first time or when
you've made configuration changes:
# mkdir /var/lib/mysql-cluster
# cd /var/lib/mysql-cluster/
# /opt/mysql/mysql/bin/ndbd --initial
# /opt/mysql/mysql/support-files/mysql.server start
After performing this on both mysqldata1 and
mysqldata2, go back to mysqlmanage with the MySQL management client running
and type "show":
ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 @192.168.0.11 (Version: 5.0.22, Nodegroup: 0, Master)
id=3 @192.168.0.12 (Version: 5.0.22, Nodegroup: 0)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @192.168.0.10 (Version: 5.0.22)
[mysqld(API)] 2 node(s)
id=4 @192.168.0.11 (Version: 5.0.22)
id=5 @192.168.0.12 (Version: 5.0.22)
Here they have fully joined. If they still show
"starting", then you must wait until they fully join the
cluster before adding data to the cluster.
Test the Cluster
We need to test the cluster to make sure it's
working properly. On mysqldata1 and mysqldata2, log into mysql and create
the cluster database on each:
# /opt/mysql/mysql/bin/mysql -u root
mysql> create database cluster;
Query OK, 1 row affected (0.00 sec)
On mysqldata2, create test1 table with the NDBCLUSTER engine:
mysql> create table test1 ( id int(11) primary key not null, \
name varchar(300) not null ) ENGINE=NDBCLUSTER;
Query OK, 0 rows affected (1.27 sec)
Go back to mysqldata1 and see if the table showed up:
mysql> use cluster;
Database changed
mysql> show tables;
+-------------------+
| Tables_in_cluster |
+-------------------+
| test1 |
+-------------------+
1 row in set (0.01 sec)
Now we can test the failover on the cluster. Kill off mysqldata1's ndbd:
# pkill ndbd
Go to mysqldata2 and try a select on the cluster table:
mysql> select * from cluster;
Empty set (0.13 sec)
Here we can see it is still returning results on
queries, which means it is working. If you do a show on the MySQL
management it will show the node down.
Conclusion
While this type of configuration will save you from a
database failure, it will not save you from a hardware failure because of
it being located on one physical machine. This could also easily be
implemented for any other clustering solutions such as Tomcat or Weblogic, and you could even go further by combining this with the
Service Management Facility. With the features
of Solaris 10, the possibilities are almost endless. For more information
on zones and resource management, see:
http://www.sun.com/blueprints/browsesubject.html#resource
Derek Crudgington is a contractor for The InSource
Group and a Solaris addict. You can check out
his blog at http://hell.jedicoder.net or email him at dacrud@gmail.com.
|