Backup Encryption
Kerry Thompson
Hardly a week goes by without some story in the news about a company leaking important data through loss of their backup tapes. Whether it is through malicious theft, opportunistic snatching, or accidental misplacement, there is a huge cost to a business when data is lost. When the data contains sensitive information about members of the public, possibly including bank account and credit card numbers, the cost can be severe indeed. The stigma alone of having to notify clients that you've lost their data is extremely damaging.
For example, in June 2005, Citigroup was forced to issue a media statement admitting that tapes containing personal information on 3.9 million customers had been lost while the tapes were being carried to another site.
In July 2006, Chase staff accidentally discarded five computer tapes containing personal information and financial records of 2.6 million Circuit City credit card account holders. Even though the tapes were in a locked box, the company was forced to release a public statement and notify and compensate all of those customers.
And this threat isn't new at all. In 1977, a disgruntled systems administrator at Imperial Chemical Industries (ICI) in the UK stole a set of backup tapes and attempted to extort a large amount of money from the company for their return. He was eventually captured.
Despite the obvious and well-known risks, I've found that few companies actually use encryption to protect their backups. This is quite surprising to me, considering how easy it is to do and just how costly it is to lose data. In this article, I'll show you how to add encryption to your backups, and I'll describe some of the pitfalls to watch out for.
Backup Scripts with Encryption
There are a couple of good open source tools that will do encryption nicely. Probably the best one is OpenSSL, which supports a wide range of ciphers and is very easy to add into existing backup scripts. Quite simply, the command:
openssl enc -aes-256-cbc -salt -pass pass:123456
will encrypt standard input with the key "123456" and write to standard output. Note that the output stream will be 8-bit binary unless you specify the -a option for base64 output encoding. The -salt option should always be used so that a random salt is mixed into the key derivation. The key, or password, really should not be specified on the command line as in the example above, because any user can see it by running the ps command. For automated scripts, it is better to put the key into a file that is readable only by root and use the file:filename construct, as in the following backup example:
tar cvf - / | \
openssl enc -aes-256-cbc -salt -pass file:/root/.backup_key >/dev/rmt0
To decrypt with OpenSSL, use the -d option:
openssl enc -d -aes-256-cbc -pass file:/root/.backup_key </dev/rmt0 | \
tar xvf -
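Before trusting these pipelines with a real tape, it can help to rehearse them on scratch data. The following is a minimal sketch, not a production script: it assumes a regular file can stand in for /dev/rmt0, and every path and variable name in it is made up for illustration. Newer OpenSSL releases may also print a key-derivation warning for these options.

```shell
#!/bin/sh
# Round-trip rehearsal of the tar | openssl pipeline.
# A regular file stands in for the tape device; all names are illustrative.
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT

# Random key file, created readable only by its owner.
( umask 077; openssl rand -base64 32 >"$WORK/backup_key" )

# A little sample data to back up.
mkdir -p "$WORK/src/etc" "$WORK/restore"
echo "hello backup" >"$WORK/src/etc/motd"

# Encrypt with the same cipher options as the commands above.
tar cf - -C "$WORK/src" . | \
  openssl enc -aes-256-cbc -salt -pass "file:$WORK/backup_key" \
    >"$WORK/tape.img"

# Decrypt and unpack into a separate directory.
openssl enc -d -aes-256-cbc -pass "file:$WORK/backup_key" \
    <"$WORK/tape.img" | tar xf - -C "$WORK/restore"

# The restored tree should match the original exactly.
if diff -r "$WORK/src" "$WORK/restore" >/dev/null; then
  RESULT="round trip ok"
else
  RESULT="round trip FAILED"
fi
echo "$RESULT"
```

If the comparison fails here, it will fail on tape too, so this is a cheap place to catch a wrong cipher name or an unreadable key file.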
GPG is also a good tool for encryption, and a number of encrypting backup systems prefer to use GPG. GPG is well-known as a public-key, or asymmetric, encryption tool. This method allows the data to be encrypted with one key and requires a separate key for the decryption. Unless you have very specific requirements for your backups, you probably won't need this functionality; for most backups, symmetric encryption is best.
To encrypt with GPG in symmetric mode, you can use commands similar to the following:
GPG_KEY=/root/.backup_key
tar cvf - / | \
gpg --batch --disable-mdc --symmetric --cipher-algo AES256 \
--passphrase-fd 3 3<$GPG_KEY >/dev/rmt0
Decrypting is similar, but with the --decrypt option:
GPG_KEY=/root/.backup_key
gpg --batch --quiet --no-mdc-warning --decrypt --passphrase-fd 3 \
3<$GPG_KEY </dev/rmt0 | tar xvf -
Note how reading the key through file descriptor 3 keeps the location of the key file out of the running command line, which users would be able to see with the ps command.
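The redirection itself is plain shell behavior and can be tried without GPG. In this small sketch a shell function stands in for a program that reads its passphrase from descriptor 3; the function name and the passphrase are hypothetical.

```shell
#!/bin/sh
# Stand-in for a program that reads a passphrase from file descriptor 3.
# The function name and passphrase are illustrative only.
KEY_FILE=$(mktemp)
chmod 600 "$KEY_FILE"
echo "example-passphrase" >"$KEY_FILE"

read_passphrase_fd3() {
  # The reader never sees a file name, only an already-open descriptor.
  IFS= read -r line <&3
  printf '%s\n' "$line"
}

# The shell opens the key file on descriptor 3 before the reader starts,
# so the path is not part of the reader's argument list.
PASS=$(read_passphrase_fd3 3<"$KEY_FILE")
echo "read ${#PASS} characters from fd 3"
rm -f "$KEY_FILE"
```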
Encryption with Amanda
Amanda is one of the best open source backup management packages available. It is very flexible, scalable, and very easy to customize and extend. The latest version (I used Amanda 2.5.1) has extensions and plug-ins to perform encryption with either the client or the server doing the hard work of encrypting the data.
Assuming you've already got Amanda installed and set up to run your backups, adding encryption for backups (and decryption for restores) is very easy to do. On the Amanda server, configure a dumptype in /etc/amanda/DailySet1/amanda.conf (DailySet1 is the name for the backup set) to include an encryption plug-in:
define dumptype client-encrypt-ossl {
global
program "GNUTAR"
comment "no compression and client symmetric encryption with OpenSSL"
compress none
encrypt client
client_encrypt "/usr/sbin/amcrypt-ossl"
client_decrypt_option "-d"
}
The options are quite self-explanatory. This specifies that the backup set will use encryption and that encryption will be performed on the client system using /usr/sbin/amcrypt-ossl, which is a simple shell script calling upon openssl to perform the encryption. Decryption is also performed on the client using the same script but with the -d option added.
Also on the Amanda server, change the dumptype for the directory on the client to be backed up, in /etc/amanda/DailySet1/disklist:
gorgon.crypt.gen.nz /home client-encrypt-ossl
This will tell Amanda to use the client-encrypt-ossl dump type when backing up /home on the backup client system "gorgon.crypt.gen.nz".
On the client to be backed up, check that /usr/sbin/amcrypt-ossl is installed (it comes with Amanda v2.5 releases), and then configure an encryption key, or passphrase, for the backup:
echo "amanda_encryption_key_86299993456" >/var/lib/amanda/.am_passphrase
chown amandabackup:disk /var/lib/amanda/.am_passphrase
chmod 600 /var/lib/amanda/.am_passphrase
Naturally, you should make up your own key to put into the .am_passphrase file, and remember to store a copy in a safe place. Now you're ready to run the backup. Check that everything is fine on the server:
$ amcheck DailySet1
Amanda Tape Server Host Check
-----------------------------
Holding disk /var/dumps/amanda: 2090172 KB disk space available, \
using 1987772 KB
slot 3: read label `DailySet1-03', date `X'
NOTE: skipping tape-writable test
Tape DailySet1-03 label ok
Server check took 0.316 seconds
Amanda Backup Client Hosts Check
--------------------------------
Client check: 1 host checked in 0.449 seconds, 0 problems found
(brought to you by Amanda 2.5.1p2)
$
and then run:
amdump DailySet1
to perform the backup with encryption. The results will be emailed to the administrator specified in /etc/amanda/DailySet1/amanda.conf.
Whenever you change your backups, always test that they are still working. When setting up encryption, you need to test that the data on tape is actually encrypted, and make sure that you can read it back as well!
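One way to test both properties is to plant a known string in the data and look for it before and after decryption. The sketch below does this outside Amanda, on a plain OpenSSL pipeline with a regular file standing in for the tape; the canary string, the passphrase, and all paths are made up for the example.

```shell
#!/bin/sh
# Check that a backup image is really encrypted and still restorable.
# A regular file stands in for the tape; all names are illustrative.
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
CANARY="CANARY-do-not-leak-7f3a"

( umask 077; echo "test-passphrase-change-me" >"$WORK/key" )
mkdir "$WORK/data"
echo "$CANARY" >"$WORK/data/canary.txt"

tar cf - -C "$WORK" data | \
  openssl enc -aes-256-cbc -salt -pass "file:$WORK/key" >"$WORK/tape.img"

# 1. The known plaintext must NOT be visible in the raw image.
if grep -q "$CANARY" "$WORK/tape.img"; then
  ENCRYPTED=no
else
  ENCRYPTED=yes
fi

# 2. After decryption, the same string must come back.
if openssl enc -d -aes-256-cbc -pass "file:$WORK/key" <"$WORK/tape.img" | \
     tar xOf - data/canary.txt | grep -q "$CANARY"; then
  RESTORABLE=yes
else
  RESTORABLE=no
fi
echo "encrypted=$ENCRYPTED restorable=$RESTORABLE"
```

The same idea carries over to a real tape: read the raw image back and search it for a string you know is in the backed-up files.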
You can change the cipher used for encryption by editing the encryption script /usr/sbin/amcrypt-ossl on the Amanda client system; the standard release uses aes-256-cbc, which will be fine for most purposes.
Which Algorithm?
It is important to choose the encryption algorithm, or cipher, carefully. Some commercial backup systems may not let you choose the cipher, but OpenSSL offers a wide range of algorithms.
The version of OpenSSL I used for testing (v0.9.8a) supports all of these:
aes-128-cbc aes-128-ecb aes-192-cbc aes-192-ecb aes-256-cbc
aes-256-ecb base64 bf bf-cbc bf-cfb
bf-ecb bf-ofb cast cast-cbc cast5-cbc
cast5-cfb cast5-ecb cast5-ofb des des-cbc
des-cfb des-ecb des-ede des-ede-cbc des-ede-cfb
des-ede-ofb des-ede3 des-ede3-cbc des-ede3-cfb des-ede3-ofb
des-ofb des3 desx rc2 rc2-40-cbc
rc2-64-cbc rc2-cbc rc2-cfb rc2-ecb rc2-ofb
rc4 rc4-40
Of course, some of these (like base64) really aren't for encryption.
GPG supports a smaller range of ciphers. Version 1.4.2 supports the following:
3des, cast5, blowfish, aes, aes192, aes256, twofish
I recommend AES-256-CBC as a good option. AES is a fast algorithm, and the 256-bit key is long enough to be secure against attacks for a good few years while still being small enough to be fast. CBC designates the mode used to apply AES to a data stream (remember, AES is a block cipher and only processes discrete blocks of information). Don't use the ECB block mode: it encrypts every block independently, so identical blocks of input produce identical blocks of output, which is a real weakness in situations such as backups where blocks of data in the unencrypted stream are fairly predictable.
You should avoid plain DES, because it has a relatively short 56-bit key length; triple-DES is suitable for use in backups, but it is quite slow, as the next section shows.
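The ECB weakness is easy to see for yourself. This sketch encrypts four identical 16-byte blocks in both modes and compares the first two blocks of output. It uses -nosalt and a passphrase on the command line only so that the throwaway demonstration stays short; neither belongs in a real backup script, and the file names are hypothetical.

```shell
#!/bin/sh
# Show that ECB maps identical plaintext blocks to identical ciphertext
# blocks, while CBC does not. Passphrase and file names are illustrative.
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT

# Four identical 16-byte blocks of plaintext.
for i in 1 2 3 4; do printf 'AAAAAAAAAAAAAAAA'; done >"$WORK/plain"

# block_hex FILE N: print the Nth 16-byte block (0-based) as hex.
block_hex() {
  dd if="$1" bs=16 skip="$2" count=1 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# -nosalt keeps the output free of a header, so block 0 is at offset 0.
for mode in ecb cbc; do
  openssl enc "-aes-256-$mode" -nosalt -pass pass:demo \
    <"$WORK/plain" >"$WORK/$mode.bin"
done

if [ "$(block_hex "$WORK/ecb.bin" 0)" = "$(block_hex "$WORK/ecb.bin" 1)" ]; then
  ECB_REPEATS=yes
else
  ECB_REPEATS=no
fi
if [ "$(block_hex "$WORK/cbc.bin" 0)" = "$(block_hex "$WORK/cbc.bin" 1)" ]; then
  CBC_REPEATS=yes
else
  CBC_REPEATS=no
fi
echo "ecb repeats: $ECB_REPEATS, cbc repeats: $CBC_REPEATS"
```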
Performance
Of course, you don't get anything for free. Encrypting large volumes of data can have quite a high impact on CPU usage depending on which algorithm you use, and performance impacts can be one of the major issues preventing companies from implementing encryption.
I performed some basic tests with OpenSSL using a number of ciphers to encrypt 8 GB of data on a 1.0GHz Intel server with the output just going to /dev/null. The encryption time is given in Figure 1. From these results, you can see that triple-DES is really quite slow, and that AES-128 and AES-256 perform very well in comparison.
It can be hard to predict just how much effect encryption will have on a system. If the output device, such as a tape drive, is slow, then the encryption process will be slowed down with it and you may not notice the extra load. If you are writing to a high-speed device, such as a dump area on a disk, then the CPU load from the encryption process could be very high. It is always best to perform some load tests beforehand.
If server performance is an issue, you can look at interface devices that can perform encryption. A number of PCI-bus SCSI controllers can do this, and there are also dedicated hardware devices, such as the CryptoAccelerator PCI Card, which can offload the encryption work from the CPU.
There are also devices that sit between a host and a storage device and perform the encryption, such as NeoScale's CryptoStor Tape devices. A number of tape drives from vendors such as IBM and Sun can also perform encryption themselves. Be careful when using hardware appliances, however, and always plan for the steps you will need to take if the encrypting device stops working. Make sure you can extract and reuse the encryption keys.
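A rough load test along these lines needs only a few lines of shell. The sketch below is an assumption about how such a test might be set up, not the script used for Figure 1: it pushes zeros through each cipher and discards the output, with SIZE_MB and the cipher list chosen arbitrarily. Whole-second timing is coarse, so raise SIZE_MB on fast hardware.

```shell
#!/bin/sh
# Rough cipher timing: encrypt SIZE_MB of zeros and throw the output away.
# SIZE_MB and the cipher list are illustrative choices.
SIZE_MB=${SIZE_MB:-64}
CIPHERS="aes-128-cbc aes-256-cbc des-ede3-cbc"
RESULTS=""

for cipher in $CIPHERS; do
  start=$(date +%s)
  dd if=/dev/zero bs=1048576 count="$SIZE_MB" 2>/dev/null | \
    openssl enc "-$cipher" -salt -pass pass:benchmark >/dev/null
  end=$(date +%s)
  line="$cipher: $((end - start)) s for ${SIZE_MB} MB"
  echo "$line"
  # Collect the lines so they can be logged or compared later.
  RESULTS="$RESULTS$line
"
done
```

Running it once against /dev/null and once against the real dump area or tape device gives a feel for whether the CPU or the output device is the bottleneck.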
Also be aware that encrypting data will affect how well it can be compressed -- whether you are using software or hardware compression. The ideal encryption algorithm will generate an output stream that is perfectly random and cannot be compressed. So, if you want compression with encryption, it would be best to perform compression first and then apply encryption to the compressed data.
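The difference the ordering makes can be measured on sample data. In this sketch the sample text, passphrase, and file names are invented for the demonstration; it runs the same compressible input through gzip and OpenSSL in both orders and reports the resulting sizes.

```shell
#!/bin/sh
# Compare compress-then-encrypt with encrypt-then-compress.
# Sample data, passphrase, and file names are illustrative.
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
( umask 077; echo "test-passphrase-change-me" >"$WORK/key" )

# Highly compressible sample data (roughly 900 KB of repetitive text).
i=0
while [ "$i" -lt 20000 ]; do
  echo "the quick brown fox jumps over the lazy dog $((i % 10))"
  i=$((i + 1))
done >"$WORK/sample.txt"

encrypt() { openssl enc -aes-256-cbc -salt -pass "file:$WORK/key"; }

# Recommended order: compress first, then encrypt.
gzip -c <"$WORK/sample.txt" | encrypt >"$WORK/gz_then_enc"
# Wrong order: the compressor only sees random-looking ciphertext.
encrypt <"$WORK/sample.txt" | gzip -c >"$WORK/enc_then_gz"

GOOD=$(wc -c <"$WORK/gz_then_enc" | tr -d ' ')
BAD=$(wc -c <"$WORK/enc_then_gz" | tr -d ' ')
echo "compress then encrypt: $GOOD bytes"
echo "encrypt then compress: $BAD bytes"
```

In a backup script this simply means placing gzip between tar and openssl on the way out, and between openssl and tar on the way back in.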
Another factor to be aware of is how well the encryption algorithm can cope with errors in the backup media. When using AES in CBC (Cipher Block Chaining) mode, a single bit error in the encrypted stream makes the entire 128-bit block containing the error indecipherable (AES always works on 128-bit blocks, whatever the key size) and flips the corresponding bit in the following block. After those two blocks, the decoder recovers and continues decrypting the data correctly. However, the damaged blocks could still severely affect how well the system can recover files from the decrypted stream.
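CBC error propagation can be observed directly by damaging a ciphertext on purpose. The sketch below flips one bit in an encrypted file, decrypts it, and counts how many plaintext bytes changed. With AES's 16-byte block, at most 17 bytes should differ: up to 16 in the block that was hit, plus one in the block after it. The offset, passphrase, and file names are arbitrary choices for the demonstration.

```shell
#!/bin/sh
# Flip one bit in an AES-256-CBC ciphertext and count damaged plaintext bytes.
# Offset, passphrase, and file names are illustrative.
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT

# 1024 bytes of known plaintext.
i=0
while [ "$i" -lt 64 ]; do
  printf '0123456789abcdef'
  i=$((i + 1))
done >"$WORK/plain"

openssl enc -aes-256-cbc -salt -pass pass:demo <"$WORK/plain" >"$WORK/cipher"

# Salted output starts with a 16-byte header ("Salted__" plus the salt),
# so offset 16 + 160 is the first byte of the 11th ciphertext block.
OFFSET=176
old=$(dd if="$WORK/cipher" bs=1 skip="$OFFSET" count=1 2>/dev/null | \
        od -An -tu1 | tr -d ' \n')
new=$(( old ^ 1 ))

# Write the modified byte back in place without truncating the file.
printf '%b' "\\0$(printf '%o' "$new")" | \
  dd of="$WORK/cipher" bs=1 seek="$OFFSET" conv=notrunc 2>/dev/null

openssl enc -d -aes-256-cbc -pass pass:demo <"$WORK/cipher" >"$WORK/damaged"

# cmp -l prints one line per differing byte.
DAMAGED=$(cmp -l "$WORK/plain" "$WORK/damaged" | wc -l | tr -d ' ')
echo "$DAMAGED plaintext bytes differ after a single flipped bit"
```

A tar archive with 17 damaged bytes may still lose a whole file if the damage lands in a header, which is why the effect on recovery can be larger than the byte count suggests.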
Key Management
Bear in mind that in terms of encryption the key is everything. If you start writing encrypted backups for a month or so, then lose the key you were using, all of that backup data will be useless. Don't even think about trying to decode it without the key. So, in this respect, you must keep your key data just as secure as your backup media. Print out the key data on paper and store it in the company safe -- and maybe in an offsite secure location as well. If you have a very long key file, then save it to a number of different types of media (CD-ROM, USB drive, floppy disk) and store them all somewhere safe.
Keys should be changed on a regular basis, and when employees and contractors leave the company. Be aware that losing a key will cause the loss of the backups made with that key. Always keep old keys in safe storage.
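Rotation is safer when the script that creates a new key cannot discard an old one. The following is a minimal sketch under assumed conventions: KEY_DIR, the date-stamped file names, and the "current" symlink are all hypothetical, and a real key directory would be a root-only location rather than a temporary one.

```shell
#!/bin/sh
# Sketch of key rotation that keeps every old key.
# KEY_DIR and the naming scheme are hypothetical conventions.
KEY_DIR=${KEY_DIR:-$(mktemp -d)}
umask 077

rotate_key() {
  stamp=$(date +%Y%m%d%H%M%S)
  new_key="$KEY_DIR/backup_key.$stamp"
  # 32 random bytes, hex-encoded on a single line.
  od -An -tx1 -N32 /dev/urandom | tr -d ' \n' >"$new_key"
  echo >>"$new_key"
  chmod 400 "$new_key"
  # "current" always points at the newest key; older keys stay in place
  # so that backups made with them can still be restored.
  ln -sf "backup_key.$stamp" "$KEY_DIR/current"
  echo "$new_key"
}

NEW_KEY=$(rotate_key)
echo "new key: $NEW_KEY"
ls -l "$KEY_DIR"
```

Backup scripts would then read file:$KEY_DIR/current, while a restore of an older tape uses whichever dated key was current when that tape was written, so it is worth recording the key name alongside each tape label.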
Always have a plan for what you'll do if the worst happens, such as if the data center burns down. Will you be able to quickly recover your backups in that case?
Conclusion
Encrypting backups is very simple and should be mandatory if your organization is sending tapes offsite. If you are using commercial backup tools, like NetBackup, they should also have encryption options that you should investigate and use.
In this article, I've shown how to configure encrypted backups using OpenSSL and GPG encryption tools. If your backups are driven by plain shell scripts, you should have no problems incorporating encryption. I've also shown how to configure Amanda to use encryption, which is very straightforward in the latest versions of Amanda.
Note that some other types of backups, for example those that create a bootable recovery system, like mksysb on AIX and Ignite on HP-UX, may not be able to encrypt their data, so take special care when using these to make backups.
On Linux, mkcdrec is a very useful backup system that can create a bootable recovery CD and encrypt the filesystem data that is written onto the CD image. It simply prompts the user for the key when it performs recovery functions.
References
OpenSSL -- http://www.openssl.org
GPG -- http://www.gnupg.org
Quick and Secure Network Backup Setup using Amanda -- http://amanda.zmanda.com/quick-backup-setup.html
Kerry Thompson has been working with Unix since the days of Unix Version 7 on the PDP-11 more than 20 years ago. These days he works as an open systems technology consultant in sunny Auckland, New Zealand. He can be contacted at: kerry@crypt.gen.nz.