Restoring
Your Confidence in Oracle Backups
John Ouellette
Wouldn't it be great to be able to rely on your backups and have
full confidence that they will work properly during a disaster or
a routine restore operation? Unfortunately, making assumptions about
your restores can lead to delays in projects, wasted time, and ultimately
data loss. While it may be fairly easy to configure backups and
assume the restore process will work as long as the backups have
run successfully, the reality is that each process is different
and involves its own set of challenges.
On a basic hardware level, backups write to tape, and restores
perform a read operation on the tape hardware. As the following
example illustrates, success of one process does not guarantee success
of the other.
The Power of Testing
When testing our database restores, I set the read-only tab on
the tapes to avoid any surprises or accidental tape overwrites.
The backup logs looked fine with no errors. Unfortunately, the particular
tape drive selected for the restore had a bug in its firmware and
rendered the drive unusable with the tab set. If we hadn't tested,
we would have incurred much lost time and much aggravation chasing
down a solution when a real problem occurred on a production database.
Testing not only can increase your confidence in the success of
restores, but also allow you to estimate more accurately when the
restore will finish. It may seem reasonable to estimate a database
that is backed up in an hour can be restored in an hour, but that
is not always the case.
For example, it is common for database administrators to back
up multiple databases to one tape drive at a time, whether it's
to a locally attached tape or a SAN. This a good practice that will
increase the chance that the tape drive is constantly writing and
not spending time rewinding and repositioning the tape head while
waiting for data, also known as shoe shining.
However, this also means the tape will write backup pieces of
database X, Y, and Z in a non-linear fashion, so the databases pieces
could end up looking something like this on tape: XYZXYXZZXYZ. During
a restore of database X, the tape head would need to do a read and
skip forward many times to read all backup pieces for that database.
This can easily double or possibly triple the time needed to perform
a restore operation of a database. A solution to this situation
is to clone your database save sets with your backup software; however,
that is outside the scope of this article. Testing your restores
often empowers you to make intelligent decisions about when to tell
management the database will be back up.
Restoring Oracle databases also means downtime. Too often users
or project planners overlook the critical piece of restore testing
or an administrator inherits a situation where there are no recovery
procedures. It is often difficult to get downtime on production
databases, and, if you get it, it often comes at odd hours to reduce
the impact to users and business processes. Oracle provides some
great tools and options to make this difficult task manageable.
Duplication
As an alternative to full-blown testing on your live production
database, Oracle supports the duplicate command, a feature as of
Oracle 8.1.5. In addition to many other options, the RMAN duplicate
command allows you to restore a database to another physical host.
For example, say you have a production database on host1, named
MYDATA; you can restore this database to host2 and confirm your
restore process is successful. Furthermore, you now have the option
to use that duplicated database as a restore playground. This allows
you to back up and restore a copy of your production database, make
mistakes on it, and try new command procedures, all without affecting
your current production system or data.
To do this, you will need to follow a few steps. First, you need
to confirm that you have enough disk space on the server to which
you will be saving the restore. Second, you'll need to create the
shell of the database that you'd like to restore, and then set it
to a NOMOUNT state; this will be your auxiliary database. Listing
1 shows the commands needed to create a generic 9i shell, in a script
format, assuming the database name "mydata".
Next, as the oracle user on the auxiliary physical host, you'll
need to fire up RMAN with a triple connect sequence. For example:
rman catalog rman/XXXX@rman target sys/XXXX@mydata auxiliary /
RMAN doesn't actually perform any commands on your target production
database; it just needs to make sure the DBIDs do not conflict with
one another. RMAN backups usually take the form of using the backup
command, typically inside a run block, where restores will typically
use the restore and recover commands. In this case, we use the duplicate
command as shown in Listing 2.
In regard to the script, first note that I am allocating two channels,
one for the auxiliary database, and one for the current database.
Second, there are four commands in the run block; the allocate channel
commands may run on separate lines due to formatting. I am also
passing variables that my backup software requires to point to a
proper backup server and client. This does not interfere with RMAN.
The "Until Sequence" line tells RMAN the point to which it is
to restore and recover. Assuming you want the duplicated database
to look exactly like the last good backup of the source, you can
determine this value directly from your RMAN catalog. The easiest
way to do this is to connect to the RMAN catalog and issue a "LIST
BACKUP OF ARCHIVELOG ALL" command and look for the entry with the
highest Sequence value, which RMAN labels as "Seq". As you can see,
I am using sequence 10059 thread 1 (see Listing 3).
The "nofilenamecheck" option to the duplicate command is the simplest
way of duplicating your database. Basically, this tells RMAN that
your directory structure on your auxiliary database is exactly the
same as your target database. If you want to have a different file
system structure, you can instruct RMAN to rename the directories
as it restores with the set newname command, executed before
the duplicate command inside the RMAN run block. For example:
SET NEWNAME FOR DATAFILE 5 TO '/export/home/oracle/myusers.dbf'
duplicate target database to mydata;
Wildcards are not permitted, however set newname may come in
handy in certain instances. Also of note during duplication is the
SKIP TABLESPACE option. This will simply skip certain tablespaces
during duplication. For example:
duplicate target database to mydata skip tablespace myusers;
will instruct RMAN not to restore the tablespace called "myusers".
If all has gone well, you now have an exact copy of mydata on
your test server. One of the nice features of the duplicate command
is that RMAN will create a new DBID for you on the auxiliary database,
different from the source. This means you can register the duplicated
database in your existing RMAN catalog. Again, this is helpful because
it will not affect your production database.
As a visual proof, go ahead and run the RMAN triple connect sequence
after the duplication. Both databases will report a different DBID
as shown in Figure 1.
The advantages to duplication are many. First, you can test out
your restores of production databases without downtime. This alone
is worth investigation for most companies. Second, the level of
confidence in restore procedures increases greatly because it enables
you to detect problems early.
All these advantages balance out with some of the disadvantages,
of course. For example, a good practice would be to use another
set of hardware to test with, preferably just like production, and
this may not be in the budget. You'll also need another Oracle,
operating system, and potentially another backup software license.
Although RMAN supports duplication to the same host, this opens
up the real potential of errors and damage to your production database.
If you decide to do this, make sure you have a firm grasp of all
the concepts in this article and the resources section, or you may
end up causing more harm than good.
Validation
If you're not immediately ready to tackle the duplication steps,
you're not totally out of luck. Oracle 9i supports the use of the
validate option to the restore command. RESTORE VALIDATE
will allow you to run through a restore without actually restoring
the files. In fact, you can test the restore of the entire database
or individual table spaces, data files, or control files. No downtime
is needed because the data files do not need to be taken offline
when validated.
For example, you can use this option as if you were restoring
normally, just add the keyword validate after your restore command:
run {
allocate channel t1 type 'sbt_tape';
set until time "TO_DATE('06/25/05 11:54','MM/DD/YY HH24:MI')";
restore database validate;
release channel t1;
}
This is a good way to validate your restores; however, the only way
to truly trust that you will be able to restore your databases effectively
is to proceed through an actual restore and document the procedures.
Finally, in addition to the RESTORE VALIDATE command you
may also run the BACKUP VALIDATE command as an added tool
to your recovery toolset. This command, according to Oracle online
docs, will check data files for physical and logical corruption,
and confirm that all database files exist and are in the correct
locations.
In this case, RMAN will not actually do a backup but will scan
the specified files to determine whether they can be backed up.
So the BACKUP VALIDATE command is similar to the RESTORE
VALIDATE command, except you use it specifically for the backup
command. For more extensive documentation on RMAN and these commands,
please refer to the resources section.
I have demonstrated some options to increase your confidence in
your current Oracle backup strategy. Having a well-tested plan will
save you time and your data in the long run.
Resources
Metalink: Duplicate Database in Oracle9i --
http://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=228257.1
Metalink: Creating a Duplicate Database --
http://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=73912.1
Oracle: Duplicating a Database with Recovery Manager --
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96566/rcmdupdb.htm
Oracle: RMAN Concepts I: Channels, Backups, and Copies --
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96566/rcmconc1.htm#468311
Redo Log File Management and Issues when using RMAN to do Automatic
Recovery --
https://metalink.oracle.com/metalink/plsql/ \
ml2_documents.showDocument?p_database_id=NOT&p_id=155656.1
John Ouellette is a Senior Unix Adminstrator who believes the
command line is king. John can be contacted at: john@ouellations.com. |