Article Figure 1 Listing 1 Listing 2 Listing 3 may2006.tar

Restoring Your Confidence in Oracle Backups

John Ouellette

Wouldn't it be great to be able to rely on your backups and have full confidence that they will work properly during a disaster or a routine restore operation? Unfortunately, making assumptions about your restores can lead to delays in projects, wasted time, and ultimately data loss. While it may be fairly easy to configure backups and assume the restore process will work as long as the backups have run successfully, the reality is that each process is different and involves its own set of challenges.

On a basic hardware level, backups write to tape, and restores perform a read operation on the tape hardware. As the following example illustrates, success of one process does not guarantee success of the other.

The Power of Testing

When testing our database restores, I set the read-only tab on the tapes to avoid any surprises or accidental tape overwrites. The backup logs looked fine with no errors. Unfortunately, the particular tape drive selected for the restore had a bug in its firmware and rendered the drive unusable with the tab set. If we hadn't tested, we would have incurred much lost time and much aggravation chasing down a solution when a real problem occurred on a production database.

Testing not only can increase your confidence in the success of restores, but also allow you to estimate more accurately when the restore will finish. It may seem reasonable to estimate a database that is backed up in an hour can be restored in an hour, but that is not always the case.

For example, it is common for database administrators to back up multiple databases to one tape drive at a time, whether it's to a locally attached tape or a SAN. This a good practice that will increase the chance that the tape drive is constantly writing and not spending time rewinding and repositioning the tape head while waiting for data, also known as shoe shining.

However, this also means the tape will write backup pieces of database X, Y, and Z in a non-linear fashion, so the databases pieces could end up looking something like this on tape: XYZXYXZZXYZ. During a restore of database X, the tape head would need to do a read and skip forward many times to read all backup pieces for that database. This can easily double or possibly triple the time needed to perform a restore operation of a database. A solution to this situation is to clone your database save sets with your backup software; however, that is outside the scope of this article. Testing your restores often empowers you to make intelligent decisions about when to tell management the database will be back up.

Restoring Oracle databases also means downtime. Too often users or project planners overlook the critical piece of restore testing or an administrator inherits a situation where there are no recovery procedures. It is often difficult to get downtime on production databases, and, if you get it, it often comes at odd hours to reduce the impact to users and business processes. Oracle provides some great tools and options to make this difficult task manageable.

Duplication

As an alternative to full-blown testing on your live production database, Oracle supports the duplicate command, a feature as of Oracle 8.1.5. In addition to many other options, the RMAN duplicate command allows you to restore a database to another physical host. For example, say you have a production database on host1, named MYDATA; you can restore this database to host2 and confirm your restore process is successful. Furthermore, you now have the option to use that duplicated database as a restore playground. This allows you to back up and restore a copy of your production database, make mistakes on it, and try new command procedures, all without affecting your current production system or data.

To do this, you will need to follow a few steps. First, you need to confirm that you have enough disk space on the server to which you will be saving the restore. Second, you'll need to create the shell of the database that you'd like to restore, and then set it to a NOMOUNT state; this will be your auxiliary database. Listing 1 shows the commands needed to create a generic 9i shell, in a script format, assuming the database name "mydata".

Next, as the oracle user on the auxiliary physical host, you'll need to fire up RMAN with a triple connect sequence. For example:

rman catalog rman/XXXX@rman target sys/XXXX@mydata auxiliary /
RMAN doesn't actually perform any commands on your target production database; it just needs to make sure the DBIDs do not conflict with one another. RMAN backups usually take the form of using the backup command, typically inside a run block, where restores will typically use the restore and recover commands. In this case, we use the duplicate command as shown in Listing 2.

In regard to the script, first note that I am allocating two channels, one for the auxiliary database, and one for the current database. Second, there are four commands in the run block; the allocate channel commands may run on separate lines due to formatting. I am also passing variables that my backup software requires to point to a proper backup server and client. This does not interfere with RMAN.

The "Until Sequence" line tells RMAN the point to which it is to restore and recover. Assuming you want the duplicated database to look exactly like the last good backup of the source, you can determine this value directly from your RMAN catalog. The easiest way to do this is to connect to the RMAN catalog and issue a "LIST BACKUP OF ARCHIVELOG ALL" command and look for the entry with the highest Sequence value, which RMAN labels as "Seq". As you can see, I am using sequence 10059 thread 1 (see Listing 3).

The "nofilenamecheck" option to the duplicate command is the simplest way of duplicating your database. Basically, this tells RMAN that your directory structure on your auxiliary database is exactly the same as your target database. If you want to have a different file system structure, you can instruct RMAN to rename the directories as it restores with the set newname command, executed before the duplicate command inside the RMAN run block. For example:

SET NEWNAME FOR DATAFILE 5 TO '/export/home/oracle/myusers.dbf'
duplicate target database to mydata;
Wildcards are not permitted, however set newname may come in handy in certain instances. Also of note during duplication is the SKIP TABLESPACE option. This will simply skip certain tablespaces during duplication. For example:

duplicate target database to mydata skip tablespace myusers;
will instruct RMAN not to restore the tablespace called "myusers".

If all has gone well, you now have an exact copy of mydata on your test server. One of the nice features of the duplicate command is that RMAN will create a new DBID for you on the auxiliary database, different from the source. This means you can register the duplicated database in your existing RMAN catalog. Again, this is helpful because it will not affect your production database.

As a visual proof, go ahead and run the RMAN triple connect sequence after the duplication. Both databases will report a different DBID as shown in Figure 1.

The advantages to duplication are many. First, you can test out your restores of production databases without downtime. This alone is worth investigation for most companies. Second, the level of confidence in restore procedures increases greatly because it enables you to detect problems early.

All these advantages balance out with some of the disadvantages, of course. For example, a good practice would be to use another set of hardware to test with, preferably just like production, and this may not be in the budget. You'll also need another Oracle, operating system, and potentially another backup software license. Although RMAN supports duplication to the same host, this opens up the real potential of errors and damage to your production database. If you decide to do this, make sure you have a firm grasp of all the concepts in this article and the resources section, or you may end up causing more harm than good.

Validation

If you're not immediately ready to tackle the duplication steps, you're not totally out of luck. Oracle 9i supports the use of the validate option to the restore command. RESTORE VALIDATE will allow you to run through a restore without actually restoring the files. In fact, you can test the restore of the entire database or individual table spaces, data files, or control files. No downtime is needed because the data files do not need to be taken offline when validated.

For example, you can use this option as if you were restoring normally, just add the keyword validate after your restore command:

run {
allocate channel t1 type 'sbt_tape';
set until time "TO_DATE('06/25/05 11:54','MM/DD/YY HH24:MI')";
restore database validate;
release channel t1;
}
This is a good way to validate your restores; however, the only way to truly trust that you will be able to restore your databases effectively is to proceed through an actual restore and document the procedures.

Finally, in addition to the RESTORE VALIDATE command you may also run the BACKUP VALIDATE command as an added tool to your recovery toolset. This command, according to Oracle online docs, will check data files for physical and logical corruption, and confirm that all database files exist and are in the correct locations.

In this case, RMAN will not actually do a backup but will scan the specified files to determine whether they can be backed up. So the BACKUP VALIDATE command is similar to the RESTORE VALIDATE command, except you use it specifically for the backup command. For more extensive documentation on RMAN and these commands, please refer to the resources section.

I have demonstrated some options to increase your confidence in your current Oracle backup strategy. Having a well-tested plan will save you time and your data in the long run.

Resources

Metalink: Duplicate Database in Oracle9i --
http://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=228257.1

Metalink: Creating a Duplicate Database --
http://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=73912.1

Oracle: Duplicating a Database with Recovery Manager --
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96566/rcmdupdb.htm

Oracle: RMAN Concepts I: Channels, Backups, and Copies --
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96566/rcmconc1.htm#468311

Redo Log File Management and Issues when using RMAN to do Automatic Recovery --
https://metalink.oracle.com/metalink/plsql/ \
ml2_documents.showDocument?p_database_id=NOT&p_id=155656.1

John Ouellette is a Senior Unix Adminstrator who believes the command line is king. John can be contacted at: john@ouellations.com.