An Online Backup Solution Using Advanced Features on IBM DS8000
Khurram Shiraz
Design and implementation of a fool-proof backup strategy has been an important topic for companies over the years. With the growth of data in recent years, many organizations now require backup solutions that allow them to have their services online and available to users with minimal performance impact during the backup window. Historically, database administrators have relied on online backup tools and techniques provided by their databases. For example, Oracle database has supported online or hot backup strategies using traditional "begin backup" and "end backup" statements for many years. Now RMAN is also available, and it can be integrated with any backup software, like TSM or NetBackup, to provide online backup solutions for Oracle databases.
A problem arises, however, when the database is very large (multiple terabytes). In that case, the time the database must spend in online backup mode becomes a problem. While tablespaces are in online backup mode, the database generates additional redo and carries extra overhead that end users can notice. So, for most organizations, it is desirable to make this time period as small as possible. This is where the latest snapshot technologies come in. These snapshot technologies are becoming popular for data protection, data mining, and data replication purposes.
Nearly all hardware and software vendors are now providing snapshot tools. The use of snapshot technology for data protection purposes offers important business values that include zero impact backup with minimal or no application downtime, frequent backups (for example, hourly) to reduce recovery time, instant recovery from snapshot, and most importantly efficient backup solutions for organizations with terabytes of data volumes.
We can categorize snapshot tools mainly as hardware-based tools or software-based tools. Hardware-based snapshot tools are usually designed to work at the storage hardware controller's level. Examples of these tools are FlashCopy from IBM and TrueCopy from Hitachi. Software-based snapshot tools include AIX JFS2 snapshot and GPFS snapshot tools from IBM and logical volume manager/file system snapshot tools from Veritas. Hardware-based snapshot tools are much more attractive for IT architects than software-based tools, because they create minimal performance impact on the host systems involved.
Nearly all high-end IBM storage subsystems provide a snapshot tool. In IBM terminology, this tool is commonly known as "FlashCopy", which is available as a separate, licensed feature for IBM DS4000, DS6000, and DS8000 storage subsystems. This snapshot technique creates a "copy on write" image of the source data on a block-by-block basis with no performance impact on the server itself. Normal FlashCopy operations supported by IBM DS6000 or DS8000 storage subsystems usually take no longer than a few seconds to make a copy of a source database that is terabytes in size. Anyone using FlashCopy as an integrated part of an online backup solution is therefore left with only the task of making this snapshot available on the operating system (so that it can be written to tape cartridges, etc.) in the shortest possible period of time.
In this article, I will cover different aspects of the IBM DS8000 FlashCopy feature along with its implementation and integration to make a comprehensive and fully automated online backup solution for a very large Oracle database (~1.2TB).
Note that although the FlashCopy feature provided by every IBM storage subsystem series is technically the same, its implementation may vary from series to series. This is because IBM uses a different management interface for each series. For example, DS6000 and DS8000 are managed by the DS storage manager running on Windows and Storage HMC platforms (Linux), respectively. DS4000, however, is managed by FAST storage manager software, which can be installed on a variety of operating systems including AIX and Windows. Similarly, the CLI (command-line interface tool) for DS6000/DS8000 has many commands that differ from those of the CLI used for the DS4000 series. In this article, therefore, I will concentrate on developing an online automated backup solution using DS CLI for the DS8000 storage subsystem.
Advanced Copy Services from IBM
The DS8000 series advanced copy services are powerful data backup, remote mirroring, and recovery functions that can help protect data from unforeseen events. Copy services run on the IBM TotalStorage DS series and are designed to support a wide range of servers including IBM pSeries, iSeries, and zSeries environments. Comparable copy services functions are also available on the IBM TotalStorage Enterprise Storage Server (ESS) Models 800 and 750 as well as on the DS6000 series.
Copy services include the following types of functions:
- IBM TotalStorage FlashCopy, a point-in-time copy function
- Remote mirror and copy functions, which include:
  - IBM TotalStorage Metro Mirror (previously known as Synchronous PPRC)
  - IBM TotalStorage Global Mirror (previously known as Asynchronous PPRC)
You can manage copy services functions through the DS8000 series' CLI, as well as the GUI-based interface provided by the IBM TotalStorage DS Storage Manager, which is available on S-HMC (Linux-based servers supplied with DS8000 for storage management).
How FlashCopy Works
It is important to understand how this hardware-based tool works so that you can choose the right mode to meet your business environment needs.
The FlashCopy feature is designed to provide the ability to create full volume copies of data at the storage hardware level. It comes with three different options: NOCOPY, COPY, and Incremental FlashCopy.
Now, when you set up a FlashCopy operation, a relationship is always established between source and target volumes, and a bitmap of the source volume is created. Once this relationship and a bitmap are created, the target volume can be accessed as though all the data had been physically copied. With the existence of this relationship between the source and target volumes, storage controllers can start the actual copy process on a block-by-block basis, copying data from the source to the target volume. This copy process works in different ways depending upon which mode (or option) has been selected by the user while establishing FlashCopy relationships.
For instance, if a FlashCopy relationship has been established with the NOCOPY option, only the copy-on-write technique is actually used. In this case, when the snapshot is first created, only the meta-data information about where original data is stored is copied to the target volume. No physical copy of the source data is done at the time the snapshot is created. Consequently, the creation of the snapshot is almost instantaneous. The snapshot copy then tracks the changing blocks on the original volume as writes to the original volume are performed. The original data that is being written to is copied into the designated storage pool that is set aside for the snapshot before original data is overwritten, hence the name "copy-on-write".
Before a write is allowed to a block, copy-on-write moves the original data block to the snapshot storage. This keeps the snapshot data consistent with the exact time the snapshot was taken. Read requests to the snapshot volume of the unchanged data blocks are redirected to the original volume, while read requests to data blocks that have been changed are directed to the "copied" blocks in the snapshot. The snapshot contains the meta-data describing the data blocks that have changed since the snapshot was first created. Note that original data blocks are copied only once into the snapshot storage when the first write request is received.
It is worth noting that the copy-on-write technique may have a slight performance impact. This is because write requests to the original volume must wait while the original data is being "copied out" to the snapshot. Read requests to the snapshot are satisfied from the original source volumes if the data being read hasn't changed; otherwise, they go to the snapshot target volumes.
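The copy-on-write behavior described above can be illustrated with a toy model in which directories stand in for volumes and files stand in for blocks. This is only a sketch of the idea, not how the DS8000 implements FlashCopy internally; the names cow_write, snap_read, SRC, and SNAP are hypothetical.

```shell
#!/bin/sh
# Toy model of copy-on-write: one file per "block".
# SRC is the source volume; SNAP holds only blocks preserved since the snapshot.
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
SRC="$WORK/source"
SNAP="$WORK/snapshot"
mkdir "$SRC" "$SNAP"

# Populate three source blocks. The snapshot starts empty, which is why
# creating it is almost instantaneous.
for b in 0 1 2; do echo "orig$b" > "$SRC/$b"; done

# Write to the source: preserve the original block first, but only once.
cow_write() {   # usage: cow_write BLOCK DATA
    [ -e "$SNAP/$1" ] || cp "$SRC/$1" "$SNAP/$1"
    echo "$2" > "$SRC/$1"
}

# Read from the snapshot: preserved copy if the block changed, else the source.
snap_read() {   # usage: snap_read BLOCK
    if [ -e "$SNAP/$1" ]; then cat "$SNAP/$1"; else cat "$SRC/$1"; fi
}

cow_write 1 "new1"
cow_write 1 "newer1"      # second write to the same block does not copy again

for b in 0 1 2; do
    echo "block $b: source=$(cat "$SRC/$b") snapshot=$(snap_read "$b")"
done
```

After the two writes, block 1 of the source holds the newest data while the snapshot still returns the time-zero contents, and only one block has been copied into the snapshot storage.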
If, however, the COPY option is chosen for the FlashCopy operation, it combines the benefits of split mirror and copy-on-write techniques. In this case, copy-on-write is used to create an instant snapshot as the FlashCopy pair is established. In addition, a background copy process is started to perform a block-level copy of the data from the original volume (source volume) to the snapshot storage (target volume).
Hence, the IBM FlashCopy tool appears to provide an instant point-in-time flash of the LUNs present on the DS Storage subsystems. This point-in-time flash can contain a consistent snapshot of the original source data (taken at a specific point in time), provided the necessary measures have been taken at the operating system and database level to make the flashed data consistent. This is very important because the snapshot is taken at the hardware level, and the application and/or database has no knowledge that a snapshot process is in progress. So, the success of any backup solution using the IBM FlashCopy technique (and in general any storage or hardware snapshot technique) depends upon the data consistency measures taken during the actual snapshot operation.
Activating FlashCopy on DS8000 Storage Subsystems
Because FlashCopy is a premium feature, it requires a separate license that can be bought along with the DS storage subsystem or ordered as an upgrade (also called MES in IBM terminology) for existing DS storage subsystems. For DS6000 and DS8000 storage subsystems, it is mandatory to apply the license activation codes (or at least the Operating Environment License (OEL) code). This can be done through DS SMC or through the DS CLI console. Other advanced features, like FlashCopy (or PPRC), can be activated after activation of the OEL. To activate the FlashCopy feature for DS8000, you must first gather the following information:
1. The machine signature for DS8000. This is the most important information needed to activate your FlashCopy feature. The machine signature can be easily found using the following DS CLI commands:
dscli> lssi
Date/Time: March 30, 2005 6:53:05 PM CEST IBM DSCLI Version: 5.0.1.99
Name ID               Storage Unit     Model WWNN             State  ESSNet
===========================================================================
-    IBM.2107-7520431 IBM.2107-7520430 922   5005076303FFC19D Online Enabled
dscli> showsi IBM.2107-7520431
Date/Time: March 30, 2005 6:53:11 PM CEST IBM DSCLI Version: 5.0.1.99 DS: IBM.2107-7520431
Name -
desc -
ID IBM.2107-7520431
Storage Unit IBM.2107-7520430
Model 922
WWNN 5005076303FFC19D
Signature 896e-c0a3-38e9-5702
State Online
2. Machine serial number. The serial number of the DS8000 can be taken from the front of the base frame (lower right corner). On the DS command-line interface, you can also use the lssu command for this purpose.
3. Order confirmation code (OCC). The order confirmation code is printed on the DS8000 series order confirmation code document, which is usually sent to the client's contact person together with the delivery of the machine.
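If the showsi output is saved to a file or piped from DS CLI, the signature and storage image ID can be pulled out with a small awk filter rather than copied by hand. The following is a minimal sketch; the function name showsi_field is hypothetical, and the sample text is taken from the showsi listing above.

```shell
#!/bin/sh
# Extract a single-word field from "showsi" output (format as in the listing above).
showsi_field() {   # usage: showsi_field FIELD < showsi-output
    awk -v key="$1" '$1 == key { print $2 }'
}

# Sample taken from the showsi listing in this article.
sample='Name -
desc -
ID IBM.2107-7520431
Storage Unit IBM.2107-7520430
Model 922
WWNN 5005076303FFC19D
Signature 896e-c0a3-38e9-5702
State Online'

echo "Signature: $(printf '%s\n' "$sample" | showsi_field Signature)"
echo "Image ID:  $(printf '%s\n' "$sample" | showsi_field ID)"
```

On a live system, the same filter could presumably be fed from a single-shot DS CLI call (for example, the dscli showsi command shown above piped into showsi_field Signature), assuming the output layout matches the listing.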
After noting the machine's serial number, signature, and OCC, you can access the following IBM site to generate activation codes for FlashCopy:
https://www-03.ibm.com/storage/dsfa/index.jsp
After entering all this information for your DS storage, you will be directed to the View Activation Codes window, where you can download your activation codes, or highlight them and then copy and paste them (or simply write them down). If you select Download now, you will be prompted to select a file location. The file you download will be a very small XML file. I opted to write the activation codes in a small notebook; no doubt it is a more handy approach!
In our case, the activation code for FlashCopy received from the Web site was 234-1934-J153-10DC-01FC-CA7D-5678-5678, so the next step was simply to apply this code. We used the DS CLI applykey command:
dscli> applykey -key 234-1934-J153-10DC-01FC-CA7D-5678-5678 IBM.2107-7520431
Date/Time: 2 May 2005 14:47:06 IBM DSCLI Version: 5.0.3.5 DS: IBM.2107-7520431
CMUC00199I applykey: License Machine Code successfully applied to storage image IBM.2107-7520431
Then, we verified activation of FlashCopy on the DS8000 using the lskey command:
dscli> lskey IBM.2107-7520431
Date/Time: March 30, 2005 6:53:30 PM CEST IBM DSCLI Version: 5.0.1.99 DS: IBM.2107-7520431
Activation Key        Capacity (TB) Storage Type
================================================
Flashcopy             5             FB
Operating Environment 5             All
Getting Started with DS CLI
DS CLI is a very powerful tool that can be used for managing IBM DS storage subsystems. Because of its interactive nature and also because of its support for scripting mode, it is a very handy tool and can be used easily to automate backup solutions using DS8000 flash services.
We started building our backup solution with the installation of DS CLI. For DS8000, DS CLI supports nearly every major operating system, including AIX 5L and Windows. We selected one of our LPARs on a p570 to act as the DS CLI management station. Every DS8000 Storage HMC has an external Ethernet interface, which is meant to be attached to the customer network.
We then established a separate VLAN for the external network (172.17.20.xx) and assigned the IP addresses 172.17.20.100 and 172.17.20.101 to the DS8000 Storage HMCs' external Ethernet interfaces using the DS manager interface. We assigned IP address 172.17.20.102 to one of the Ethernet interfaces of our AIX LPAR using "smitty chinet" and tested TCP/IP connectivity to both storage HMCs. We also created a user in SHMC with admin privileges so that DS CLI commands could be executed using this account.
We then installed DS CLI on our AIX LPAR. You must have Java 1.4.1 or higher installed on your system in a standard directory. The DS CLI installer checks the standard directories to determine whether Java 1.4.1 or higher exists on your system; if it is not found, the installation will fail. We therefore set our shell environment correctly (correct JAVA_HOME environment variable), mounted the DS CLI installation CD, and executed:
setupaix.bin -console
as the root user. This installs DS CLI in its default directory for AIX, which is /opt/ibm/dscli.
We then created a DS CLI profile, which is a text file named dscli.profile. In it, we specified the Storage HMCs' IP addresses (as hmc1 and hmc2) along with the user name and password. The following is the content of the dscli profile used in our scenario:
#DS CLI Profile
#
# Management Console/Node IP Address(es) are specified using
# the hmc parameter
# hmc1 and hmc2 are equivalent to -hmc1 and -hmc2 command line
# options.
# hmc1 is first SHMC for DS8000
hmc1:172.16.5.100
hmc2:172.16.5.101
username: dsadmin
# The password for dsadmin:
password: xxxxxxx
# Default target Storage Image ID
devid: IBM.2107-7520431
We then tested DS CLI functionality from our AIX LPAR as follows:
/opt/ibm/dscli/dscli lsuser
If everything was configured correctly, this command should list all users on SHMC without prompting for a password.
Once DS CLI setup is done, there are a lot of other things to be done regarding storage configuration on DS8000 (like array sites, arrays, volume groups, host systems, open systems volumes creation, configuration of IO ports topology, etc.). These are beyond the scope of this article, but good details can be found in IBM Redbooks (see References).
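Before relying on a profile in unattended scripts, it can help to check that the keys the scripts depend on are actually present. The sketch below is one way to do that; the function name check_profile and the choice of "required" keys (hmc1, username, password, devid) are assumptions based on the profile shown above, not a DS CLI feature.

```shell
#!/bin/sh
# Report which required keys are missing from a DS CLI profile.
# The list of required keys is this sketch's assumption, based on the profile above.
check_profile() {   # usage: check_profile FILE
    missing=""
    for key in hmc1 username password devid; do
        # A key counts as present if an uncommented line starts with "key:".
        grep -q "^[[:space:]]*$key:" "$1" || missing="$missing $key"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}

# Demonstration against a throwaway profile (password left out on purpose).
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
cat > "$demo" <<'EOF'
# hmc1 is first SHMC for DS8000
hmc1:172.16.5.100
hmc2:172.16.5.101
username: dsadmin
devid: IBM.2107-7520431
EOF

check_profile "$demo" || true
```

Because the profile holds a clear-text password, it is also reasonable to restrict its file permissions to the account that runs the backup scripts.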
In our implementation, we created three DS8000 open system volumes (which could hold our Oracle data file systems along with archive log file systems) and assigned these volumes to AIX node (bkkwt) using the volume group concept of DS8000 storage hierarchy. Later, three more open system volumes (having the same sizes as previously created ones) were created in the same volume group so that they could be used as target volumes in FlashCopy relationships.
SDD 1.6.0, which is multi-pathing software from IBM, was also installed on the AIX host with the proper host attachment script for DS8000. This software caused the DS8000 LUNs to appear as vpath devices (rather than hdisk devices) on the AIX operating system.
Automated Backup Solution Implementation
Our requirement was to develop an online backup solution for a 2-TB Oracle 9.2 database environment running on AIX 5.3 and HACMP 5.2. We achieved this by integrating DS CLI and IBM FlashCopy with some Unix shell scripting for this specific environment. However, the same solution can be used (with some scenario-specific changes) for any database that supports online backups.
DS8000 FlashCopy creation and deletion commands were called from shell scripts and then specific AIX LVM commands were used to make target LUNs available at the operating system level. Because source file systems are mounted when the Flash operation is performed, special measures were taken in this automated solution to ensure that no writing activity happened during the flash operation. This is the only way that we can ensure backup consistency.
On the database level, Oracle "begin backup" and "end backup" SQL commands were used to place the database in online backup mode around the flash operation (see Listings 1 and 2). On the AIX level, the "freeze" option of the chfs command was used to ensure that all data in the file system cache was written to disk before the start of the FlashCopy operation.
This new "freeze" option, which is available only for AIX JFS2 file systems, removes the need to use the AIX "sync" command, which does nearly the same thing for JFS file systems but does not guarantee it. We measured the time required to complete the actual FlashCopy operation (in our case, approx. 20 seconds), so we froze our JFS2 file systems (containing data and archive logs) for 45 seconds so that no write activity would occur at the OS level during the FlashCopy operation. As soon as the FlashCopy operation was completed, the file systems were thawed, and then the Oracle "end backup" statements were executed.
We then used powerful AIX LVM commands (including the recreatevg command) to make these target file systems available on the same AIX server containing the source file systems. Hence, source file systems as well as target file systems were mounted on the same server (although it was possible to mount the target file systems on an AIX node different from the source AIX node). These target file systems were then backed up to the TSM server using the TSM B/A AIX client with the help of the TSM scheduler.
We created two shell scripts to fit all these pieces together. One of these scripts, named "flashrecreate.sh" (Listing 3), created the FlashCopy, while "flashdisable.sh" (Listing 4) was used to delete the target FlashCopy drives and clean up all ODM information before repeating the same flash creation process.
Because we used the JFS2 freeze option, which ensured that all data in the file system cache had been written to disk before the FlashCopy operation started, running the fsck command against the target flash file systems before mounting them on the AIX server was not strictly required. For implementations using plain JFS file systems, however, it is mandatory to run the fsck command against the target file systems before mounting them on AIX. In our scenario, the backup window allowed us to run fsck on the target file systems anyway, so we adopted it as an additional check of data consistency at the operating system level. A thorough fsck -y run against all target file systems (almost 1 TB), with the fsck commands executed sequentially, took almost 45 minutes.
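The ordering of these steps can be sketched as a dry-run script that prints each command instead of executing it. This is an illustration of the sequence described above, not the article's Listing 3; the file system list, the tablespace name, the volume pair, and the FLASH_EXECUTE switch are all hypothetical, and the tablespace-level "begin backup" form is assumed.

```shell
#!/bin/sh
# Dry-run sketch of the consistency sequence. Unless FLASH_EXECUTE is set,
# commands are only printed. All names below are hypothetical placeholders.
FILESYSTEMS="/oracle/data1 /oracle/arch"
TABLESPACES="USERS"
PAIRS="1000:1100"                 # source:target volume IDs
FREEZE_SECS=45                    # freeze timeout used in the article
DSCLI=/opt/ibm/dscli/dscli

run() {
    if [ -n "${FLASH_EXECUTE:-}" ]; then "$@"; else echo "+ $*"; fi
}

sql() {
    if [ -n "${FLASH_EXECUTE:-}" ]; then
        echo "$1" | sqlplus -s "/ as sysdba"
    else
        echo "+ sqlplus: $1"
    fi
}

flash_sequence() {
    for ts in $TABLESPACES; do sql "alter tablespace $ts begin backup;"; done
    # Freeze JFS2 file systems so cached data is on disk and writes are held.
    for fs in $FILESYSTEMS; do run chfs -a "freeze=$FREEZE_SECS" "$fs"; done
    run "$DSCLI" mkflash -nocp $PAIRS
    # Thaw explicitly rather than waiting for the timeout to expire.
    for fs in $FILESYSTEMS; do run chfs -a freeze=off "$fs"; done
    for ts in $TABLESPACES; do sql "alter tablespace $ts end backup;"; done
}

flash_sequence
```

The freeze timeout acts as a safety net: if the script dies between the freeze and the thaw, the file systems resume on their own after 45 seconds.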
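As a rough illustration of this step, the sketch below only prints the AIX commands that would bring the target volumes online under a /flash prefix. The disk names, volume group name, and logical volume prefix are hypothetical, and how recreatevg should be applied to SDD multipath devices is something to confirm in the SDD documentation for your level.

```shell
#!/bin/sh
# Print the AIX commands that would bring FlashCopy targets online under /flash.
# Disk names, the VG name, and the LV prefix are hypothetical placeholders.
TARGET_DISKS="vpath3 vpath4 vpath5"
FLASH_VG="flashvg"
SOURCE_MOUNTS="/oracle/data1 /oracle/arch"

plan_import() {
    echo "cfgmgr"                                   # discover the target LUNs
    # -y names the new VG, -L prefixes the mount points, -Y prefixes LV names.
    echo "recreatevg -y $FLASH_VG -L /flash -Y fl $TARGET_DISKS"
    for mp in $SOURCE_MOUNTS; do
        echo "fsck -y /flash$mp"                    # extra consistency check
        echo "mount /flash$mp"
    done
}

plan_import
```

The recreatevg command is needed here because the target volumes carry the same volume group and logical volume identifiers as the sources, which would otherwise clash on the same host.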
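The teardown half of the cycle might look roughly like the following, again printed rather than executed. This is not the article's Listing 4; the volume group name, mount points, and volume pair are hypothetical, and any device-level cleanup specific to SDD is left out.

```shell
#!/bin/sh
# Print the teardown steps that precede the next FlashCopy cycle.
# VG name, mount points, and volume pairs are hypothetical placeholders.
FLASH_VG="flashvg"
FLASH_MOUNTS="/flash/oracle/data1 /flash/oracle/arch"
PAIRS="1000:1100"
DSCLI=/opt/ibm/dscli/dscli

plan_teardown() {
    for mp in $FLASH_MOUNTS; do echo "umount $mp"; done
    echo "varyoffvg $FLASH_VG"
    echo "exportvg $FLASH_VG"               # removes the VG definition from the ODM
    echo "$DSCLI rmflash -quiet $PAIRS"     # -quiet suppresses the confirmation prompt
}

plan_teardown
```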
We selected the "nocp" option with the mkflash command. When establishing FlashCopy relationships on DS8000, you may select one of two possible modes: background copy and no-background copy (nocp). The -nocp parameter determines whether the data of the source volume is copied to the target volume in the background. If -nocp is not used, a copy of all data from source to target takes place in the background. With -nocp selected, only updates to the source volume cause writes to the target volume, to save the time-zero data there. This option is therefore useful in solutions where an instant copy is required for backup purposes.
In our solution, because the target flash file systems must be mounted on the same AIX system as the source file systems, they are mounted (and hence archived daily to the TSM server using the TSM scheduler) with mount points different from the original ones. In our implementation, for example, they are mounted with mount points prefixed with /flash. As a result, when restoring from the TSM server, it is necessary to create and mount the target file systems with the same mount points (like /flash/oracle/data1, etc.).
Once restored (say to /flash/oracle/data1) on the DR server, these mount points can easily be changed back to /oracle/data1 by using the chfs command before starting the application or database on the DR system. You may also create a post-TSM scheduler script, which can use the OS chfs command to change mount points after each TSM-scheduled restoration.
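The difference between the two modes comes down to a single flag on the mkflash command line. The helper below, with a hypothetical name and hypothetical volume pairs, builds the command string for either mode so the two can be compared side by side; it prints the command and does not contact the storage subsystem.

```shell
#!/bin/sh
# Build a mkflash command line for one of the two modes described above.
# Volume pairs are hypothetical; the storage image ID is the one from this article.
DEVID="IBM.2107-7520431"

mkflash_cmd() {   # usage: mkflash_cmd copy|nocopy PAIR...
    mode=$1
    shift
    case $mode in
        nocopy) opt=" -nocp" ;;   # copy-on-write only, no background copy
        copy)   opt="" ;;         # default: full background copy
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "mkflash -dev $DEVID$opt $*"
}

mkflash_cmd nocopy 1000:1100 1001:1101
mkflash_cmd copy   1000:1100 1001:1101
```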
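A post-restore script of this kind could be as simple as the sketch below, which prints the commands needed to move each restored file system from its /flash mount point back to the original one. The list of restored mount points is hypothetical, and the script assumes the file systems are unmounted before chfs is run.

```shell
#!/bin/sh
# Print chfs commands that move restored file systems from /flash/... back
# to their original mount points. The mount point list is hypothetical.
RESTORED="/flash/oracle/data1 /flash/oracle/arch"

plan_remount() {
    for mp in $RESTORED; do
        orig=${mp#/flash}            # strip the /flash prefix
        echo "umount $mp"
        echo "chfs -m $orig $mp"     # record the new mount point
        echo "mount $orig"
    done
}

plan_remount
```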
Summary
Although, in this article, I focused on snapshot technology provided by IBM, this same snapshot technology is now being provided by almost every vendor, and it can be utilized in a similar way for effective online backup solutions for huge databases with virtual elimination of the backup window.
The basic strategy, however, should remain the same: ensure data consistency at the application and database/OS level during the actual snapshot operations, and combine this with periodic restoration tests to guarantee a fool-proof backup and recovery solution.
References
IBM whitepaper "Storage Solutions for Oracle Database: Snapshot Backup and Recovery with IBM TotalStorage Enterprise Storage Server"
IBM Redbook "IBM TotalStorage DS8000 Series: Copy Services in Open Environments", SG24-6788-00
IBM Redbook "The IBM TotalStorage DS8000 Series: Concepts and Architecture", SG24-6471-00
Khurram Shiraz is a senior Systems Administrator at KMEFIC, Kuwait. In his 8 years of IT experience, he has worked mainly with IBM technologies and products, especially AIX, HACMP Clustering, Tivoli, and IBM SAN/NAS Storage. He has also worked with the IBM Integrated Technology Services group. His areas of expertise include design and implementation of high-availability and DR solutions based on pSeries, Linux, and Windows infrastructure. He can be reached at: aix_tiger@yahoo.com.