MySQL 5.0 Cluster: Architecture, Implementation, and Management

Norm Collins

Today's networks and applications are concentrating more on high availability and redundancy. Corporations are selling services to customers with a "guarantee", or "service-level agreement" (SLA), that provides an overall percentage of uptime along with detailed instructions on the rebate structure if certain conditions are not met. One of the key components in the delivery of information to customers is the availability of data from a robust and effective database management system (DBMS).

MySQL is one of the main database management systems that corporations rely on to store their data. The release of MySQL 5.0 has brought many rich features that make it one of the leading database management systems available. Some of the new features include stored procedures, triggers, and views. Other features that have been previously available are replication and clustering. All of these features make MySQL what it is today. I will focus on one feature in particular -- the MySQL clustering technology.

In this article, I will discuss the benefits of using a MySQL cluster to corporations that require highly available data, the implementation of basic MySQL cluster using the minimum recommended requirements, and how to enhance the initial cluster to allow for more availability. I will also describe some cluster configuration options and some typical MySQL cluster management client commands for managing a cluster.

Benefits

Implementing a MySQL cluster provides many benefits for critical applications:

1. High Availability -- A MySQL cluster provides a higher level of availability because of the architecture of the system. The MySQL cluster is a layered implementation in which servers are specifically allocated as SQL nodes where applications connect and perform queries. Other servers are allocated as data nodes where the data is stored across all the nodes in their memory.

2. Low Cost -- MySQL clusters can be a low-cost solution as they will run on older, commodity hardware instead of requiring the latest and greatest SunFire V440s. (Remember that four systems are required in order to implement a basic cluster.)

3. In-Memory Database -- The biggest requirement for implementing a MySQL cluster is the need for memory. A MySQL cluster runs as an in-memory database where all the data is stored across all the data nodes memory. The result for storing the data in memory is to provide greater speed when accessing the data from the application layer.

MySQL clusters are suitable for any organization needing to implement critical applications requiring high availability at low cost. The fact that MySQL is an open source product with commercial support that can be purchased from MySQL AB is also a great benefit for implementing MySQL in any corporation.

Implementing a Cluster

Before I get into the details of implementing a MySQL cluster, I will explain a few concepts so you will understand the terminology of the implementation.

NDB -- This is the acronym for Network Database.
Node -- Each part of the cluster is referred to as a node. In context, nodes typically denote a physical computer; however, a node could also mean a process.
Management Node -- This node is used to manage all other nodes in a cluster. Some of the typical management functions that are performed from this node include backups, and starting and stopping services.
SQL Node -- The role of this node is to access the cluster data.
Data Node -- This type of node is used to store cluster data.

To implement a basic cluster, we will use four nodes, where each node is a separate computer system. Table 1 provides the details for each node that we are configuring.

Figure 1 shows the basic implementation. When implementing a cluster, each of the hosts should be on the same subnet for security and efficiency. The cluster also needs to be configured on a Fast Ethernet, or Gigabit Ethernet network for support.

To take advantage of the clustering technology, the MySQL-max binary must be installed on each SQL and data node in the cluster. The MySQL-max binary is not required on the management node, but you do have to install the management server daemon and client binaries ndb_mgmd and ndb_mgm.

Storage and SQL Node Installation

On a Red Hat system, the commands to install MySQL-max are as follows:

groupadd mysql
useradd -g mysql mysql
cd /var/tmp
tar -xzvf -C /usr/local/bin mysql-max-5.0.16-pc-linux-gnu-i686.tar.gz
ln -s /usr/local/bin/mysql-max-5.0.16-pc-linux-gnu-i686 mysql
cd mysql
scripts/mysql_install_db --user=mysql
chown -R root .
chown -R mysql data
chgrp -R mysql .
cp support-files/mysql.server /etc/rc.d/init.d/
chmod +x /etc/rc.d/init.d/mysql.server
chkconfig --add mysql.server

Management Node Installation

MySQL server is not required to be installed on the management node, but you need to extract the ndb_mgm and ndb_mgmd files and place them in the /usr/local/bin directory as follows:

cd /var/tmp
tar -zxvf mysql-max-5.0.16-pc-linux-gnu-i686.tar.gz /usr/local/bin
'*/bin/ndb_mgm*'

Make the files executable:

cd /usr/local/bin
chmod +x ndb_mgm*
Initial Management Host Startup

The following command must be executed to initially start up the Management Host:

ndb_mgmd -f /var/lib/mysql-cluster/config.ini

Initial Data Node Startup

The command required to execute is as follows:

ndbd --initial

Initial SQL Node Startup

To start the SQL nodes, just execute the mysql.server startup script in the /etc/init.d/ directory. The one change that is required in the startup script is to change the MySQL binary to the MySQL-max binary.

Configuration Options

To configure each of the data and SQL nodes, a my.cnf must be configured on each of the three systems. On each of the systems, you need to edit the /etc/my.cnf file so that each of the files includes the following configuration:

# Options for mysqld process:
[MYSQLD]                        
ndbcluster                     # run NDB engine
ndb-connectstring=10.22.1.230  # location of MGM node

# Options for ndbd process:
[MYSQL_CLUSTER]                 
ndb-connectstring=10.22.1.230  # location of MGM node

To configure the management node, begin by creating a directory where the config.ini file is to be located:

mkdir /var/lib/mysql-cluster
cd /var/lib/mysql-cluster/config.ini
vi config.ini

For this particular setup, the config.ini should contain the following information:

# Options affecting ndbd processes on all data nodes:
[NDBD DEFAULT]    
NoOfReplicas=1    # Number of replicas
DataMemory=80M    # How much memory to allocate for data storage
IndexMemory=18M   # How much memory to allocate for index storage
                  # For DataMemory and IndexMemory, we have used 
                  # the default values. 

# TCP/IP options:
[TCP DEFAULT]     
portnumber=2202   # This the default; however, you can use any
                  # port that is free for all the hosts in cluster
                  

# Management process options:
[NDB_MGMD]                      
hostname=10.22.1.230           # Hostname or IP address of MGM node
datadir=/var/lib/mysql-cluster # Directory for MGM node logfiles

# Options for data node "A":
[NDBD]                          
                               # (one [NDBD] section per data node)
hostname=10.22.1.220           # Hostname or IP address
datadir=/usr/local/mysql/data  # Directory for this data node's
datafiles

# Options for data node "B":
[NDBD]                          
hostname=10.22.1.221           # Hostname or IP address
datadir=/usr/local/mysql/data  # Directory for this data node's
datafiles

# SQL node options:
[MYSQLD]                        
hostname=10.22.1.210           # Hostname or IP address

Increasing the Size of a MySQL Cluster Implementation

When increasing the size of a cluster, you must consider the database size that will be stored on the cluster, the amount of memory required on the data nodes, the number of application layer systems requiring access to the cluster, and the number of fragments and replicas required in the cluster before knowing the requirements of the cluster. Once you know the size of the database, you can calculate the amount of memory needed based on the following calculation outlined in the MySQL Reference Manual:

(SizeofDatabase * NumberOfReplicas * 1.1 ) / NumberOfDataNodes

where;
SizeofDatabase = the database size
NumberOfReplicas = the number of replicas that are required in the
cluster.  Only 1 replica is required in the majority of MySQL cluster
environments
NumberOfDataNodes = the number of data nodes being planned for in a
cluster. One data is also known as one fragment in a cluster.

To find out how much memory is required, we can calculate this formula based on some initial data. Theoretically, if the size of the database is 1 GB, the number of replicas is 1, and the number of data nodes is 4, then the amount of memory required is:

Amount of memory required = (1GB * 1 * 1.1) / 4
Amount of memory required = 275MB per data node

As the number of application servers and the location of those servers increase, there will be a requirement to increase the number of SQL nodes. The increase in SQL nodes can be grouped by application or by geographical location.

The data nodes will need to be upgraded as the database size increases or the processing of the requests will need to be done more quickly than can be done from the current system. In this case, we would replace the current hardware with more robust hardware.

In Figure 2, you can see a more detailed network diagram that outlines a larger MySQL Cluster and how it would be implemented within a corporate network.

MySQL Cluster Management Client Commands

The commands outlined below need to be executed from the management node in the cluster.

Safe shutdown and restart:

Shutdown -- ndb_mgm -e shutdown
Restart Management Node -- ndb_mgmd -f /var/lib/mysql-cluster/config.ini
Restart Data Node -- ndbd
Restart SQL Node -- mysqld &

Once all of the cluster nodes have been started, the following commands can be used:

SHOW ENGINES\G -- This will show the supported engine currently set up on the server.
Ndb_mgm -- This binary is the management client binary and, once it is invoked, other commands can be executed to verify the status of the cluster.
SHOW -- The execution of this command will provide a report of the cluster status from the management station.
Node_id STOP -- Stops the data node specified by the node_id.
Node_id RESTART -- Restarts the data node identified by the node_id.
Node_id STATUS -- Displays status information on a specific node.
ENTER SINGLE USER MODE node_id -- This command enters the specified node_id into single-user mode.
Exit SINGLE User Mode -- Exits single user mode and allows all applications access.
QUIT -- Shuts down the management client.
SHUTDOWN -- Shuts down all cluster nodes except for the SQL nodes.

Conclusion

In today's networks, customers demand high-availability applications and data storage solutions as a basic requirement. Implementing MySQL clustering technology will bring your corporation one step closer to having a fully redundant architecture for critical applications.

Norm Collins has a B.Sc. in Computer Information Systems and is a technical Specialist at MTS Allstream Inc., which is a technical solutions provider for small businesses to large enterprises. He is responsible for developing network-based technology services utilizing Solaris and Cisco platforms. In his spare time, he also develops solutions on BSD and Linux platforms. Norm can be reached at: normc@aci.on.ca.