Inside eBook Security

Dr. Dobb's Journal November 2001

By Daniel V. Bailey

While on leave from Brown University's Department of Computer Science, Daniel is product manager for embedded systems at NTRU. He can be contacted at dbailey@ntru.com.

Copy protection had its genesis in the early 1980s when attempts were made to prevent users from copying programs sold for PCs. Notably, the installation program for Lotus 1-2-3 2.x contained a feature limiting the number of times users could install the program on a hard drive. From games to productivity, copy protection features began to appear in all manner of software to prevent unauthorized duplication.

Consequently, companies such as Central Point Software began producing programs like Copy II PC, which could defeat copy-protection provisions. A veritable arms race subsequently broke out between the copy protectors and copy enablers, with copy-protection features becoming more cumbersome at every step, while copy-enabling programs gained in power. The strongest copy-protection solutions, for instance, inconvenienced users by requiring hardware dongles or license files. The latter required the user to contact the publisher after purchase.

By the 1990s, users were clearly rejecting copy-protection provisions in favor of software that could be freely copied. Software publishers by and large gave up on copy protection, since it was generally accepted that any scheme could be defeated.

These days, three forces are at work to cause a resurgence in copy protection.

Copyrighted works such as music and movies are now stored and reproduced in digital form.
Given ubiquitous networking, users can easily make and distribute copies of these works. Thus, publishers are keen to implement technical means to protect their copyrighted material.
The use of cryptography in practical systems has matured greatly since the 1980s. Effective cryptographic techniques, combined with computing power able to quickly run public-key algorithms, raise the possibility that strong cryptography may be a good tool for copy protection. These tools can also enable features that go above and beyond traditional copy protection to let publishers enforce broad sets of rules about how consumers may use their copyrighted works. This expanded functionality is broadly termed "digital rights management" (DRM).

Partly in an attempt to protect the interests of publishers, the U.S. Congress in 1998 enacted a national copyright law, the "Digital Millennium Copyright Act" (DMCA). Among other things, this legislation (Public Law 105-304) makes it a crime to circumvent security controls in DRM-secured content.

At the Def Con 9 security conference (http://www.defcon.org/) on July 15, 2001, Dmitry Sklyarov presented the results of his Ph.D. research analyzing Adobe's DRM security system for protecting PDF files. Before he could depart for his native Russia, Sklyarov became the first person to be arrested for criminal violation of the DMCA.

Social, economic, and moral analyses of the DMCA and Sklyarov's case are to be found elsewhere (for instance, see http://www.eff.org/). This article describes the techniques Sklyarov used to defeat portions of Adobe's DRM security for PDF files. All technical details herein were gleaned entirely from Sklyarov's presentation. Via their attorneys, Sklyarov and representatives from ElcomSoft (his employer; http://www.elcomsoft.com/), declined to comment. No attempt has been made to verify the correctness or applicability of these techniques. Any errors herein may represent my failure to understand Sklyarov's presentation, rather than shortcomings in his original presentation.

Keys in Cryptography

Long ago, designers of cryptosystems realized that providing real security meant adopting Kerckhoffs's Assumption: attackers can be assumed to possess full knowledge of the cryptographic algorithm used. That is, the goal of a cryptographic algorithm is to embody the protection of data in a key. Knowledge of the key is necessary and sufficient to remove the protections. With a symmetric-key algorithm such as DES, RC4, or AES, the key may be shared among trusted parties. For a public-key algorithm such as RSA or NTRU, the (private) key is mathematically related to another (public) key. Anyone may use the public key to encrypt messages that only the holder of the private key can read. In either case, the attacker's goal is the same: find the key, or keying material sufficient to reconstruct it.

As software publishers discovered in the 1980s, there is no natural means to keep a key secret on a PC. Nor is there a way to protect keying material sufficient to recreate a key. The problem is even more pronounced when the intention of the system is to hide the key from PC users. Because of the PC's open architecture, experienced users can access any part of a PC's RAM, processor registers, or long-term storage (such as hard drives) to obtain any key contained therein. Add-in hardware for PCs exists that protects keys from prying eyes, but it remains expensive, nonstandardized, and still not entirely beyond the reach of determined adversaries. Use of such hardware is thus not an attractive option to mainstream publishers.

In each case, Sklyarov's ability to defeat Adobe's eBook plug-in security results from one or both of these failures on the part of the plug-in — either the cryptography in the plug-in did not embody its security in a key too big for an attacker to guess, or system-level attacks revealed keying material.

Adobe PDF File Encryption

As Figure 1 illustrates, a PDF file consists of a header, body, cross-reference table, and trailer. In turn, the body is a series of one or more objects. An object consists of an object identifier and data. This data can be a basic type such as Boolean, Numeric, or Object Reference; or a complex type such as String or Stream. The Object ID and a Generation ID uniquely identify each object.

Files are encrypted selectively, leaving basic types in the clear and encrypting complex types. This approach lets all viewers navigate documents with or without the encryption key. In general, the human-readable content of a PDF file will be stored in strings and streams, so this is a reasonable scheme.

The encryption and key management features in Adobe's eBook DRM are handled by plug-ins called "security handlers." So that viewers know which security handler to use, an Encryption Dictionary is added to an encrypted PDF file. This section of the file specifies the security handler name and, optionally, additional static information necessary to reconstruct the encryption key.

Security handler plug-ins can be developed by third parties using Adobe's Acrobat SDK (http://partners.adobe.com:80/asn/developer/acrosdk/docs/readme.html). These plug-ins are designed to be certified and digitally signed by Adobe before the Adobe Acrobat PDF viewer will use them. The standard security handler, available in all current editions of Adobe Acrobat, is developed by Adobe.

The standard security handler aims to allow two levels of access to a PDF file: owner and user. The owner has full access and may control the access an ordinary user has for the document. For instance, users may be prohibited from modifying the document's contents or printing the document. To control access, two passwords are created — one for the user and one for the owner.

The RC4 stream cipher is used to encrypt each string and stream in the document with a unique encryption key. The encryption key used for each object is calculated from the MD5 hash of several values. In versions 1 and 2 of the security handler, the object encryption key is the hash of the document encryption key, the Object ID, and the Generation ID. In version 3, the hash is taken over the document encryption key, slightly scrambled versions of the Object ID and Generation ID, and finally the ASCII string "sAlT". Salt is a technical term for static data added to an encryption operation to thwart dictionary attacks; most likely, the string "sAlT" is fulfilling that purpose.
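The per-object key derivation for versions 1 and 2 can be sketched in a few lines of Python. This is a minimal sketch, not Adobe's code: the function names are invented for illustration, and the byte layout (three low-order bytes of the Object ID and two of the Generation ID, little-endian, with the hash truncated to the document key length plus five bytes) is an assumption taken from the published PDF Reference rather than from Sklyarov's presentation. The version 3 scrambling is not shown.

```python
import hashlib
import struct


def object_key(doc_key: bytes, obj_id: int, gen_id: int) -> bytes:
    """Derive a per-object RC4 key (handler versions 1 and 2)."""
    # Assumption: 3 low-order bytes of the Object ID and 2 of the
    # Generation ID, little-endian; the article gives no byte layout.
    material = doc_key + struct.pack("<I", obj_id)[:3] + struct.pack("<H", gen_id)
    digest = hashlib.md5(material).digest()
    # Assumption: keep len(doc_key) + 5 bytes of the hash, at most 16.
    return digest[: min(len(doc_key) + 5, 16)]


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4; encryption and decryption are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling phase
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:  # keystream generation, XORed into the data
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)
```

Every input to object_key other than doc_key can be read straight out of the file, which is why the document encryption key is the only secret in the scheme.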

Thus, the security in this scheme is embodied in the document encryption key, since all other values are known by all. The attacker's goal, then, is to find the document encryption key. Since it is used to encrypt strings, both the user and the owner need this one key.

Two versions of the document encryption key are encrypted and stored in the PDF file's Encryption Dictionary. Either the user key or owner key can recover the document encryption key. If the user key is either known or blank, the user can decrypt the document encryption key and decrypt the entire document. Once users have obtained the document, they can do with it what they please, including writing the formerly encrypted strings to a new, unprotected PDF file without the owner's password or consent. That is, the difference in access level between the user and the owner is enforced by software checks in the standard security handler and not by cryptography.

If the user's password is not known, the attacker can mount a brute-force attack against RC4. In version 1 of the security handler, RC4 is used with a 40-bit key. Keys of this size may be easily found in a matter of hours on a few well-equipped workstations.
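The arithmetic behind that estimate is easy to check. The sketch below computes the worst-case time to try every key of a given length; the function name and the trial rate used in the example are hypothetical, chosen only to illustrate the scale, and do not come from Sklyarov's presentation.

```python
def hours_to_exhaust(key_bits: int, keys_per_second: float, machines: int = 1) -> float:
    """Worst-case hours needed to try every key of the given length."""
    keyspace = 2 ** key_bits  # a 40-bit key has 2**40 possible values
    seconds = keyspace / (keys_per_second * machines)
    return seconds / 3600
```

Assuming, for illustration, four workstations that each test 10 million keys per second, hours_to_exhaust(40, 10_000_000, machines=4) comes to a bit under eight hours, and on average the key turns up in half that time.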

Thus, the owner-level protections in the Adobe Acrobat standard security handler can be bypassed due to inadequate protection of the document encryption key.

Plug-In Crypto Failures

According to Sklyarov, other plug-ins fared even worse than Adobe's. Some, such as the eBook Pro compiler (http://www.ebookpro.com/), used bad encryption algorithms. This package compresses data, then XORs each byte of the compressed version with every character in the ASCII string "encrypted". Because XOR is associative, this is the same as XORing each byte of the compressed data with a single constant, namely the XOR of all the characters in the string. And because every ASCII character has its high bit clear, it amounts in essence to XORing with a 7-bit key, which is trivial to break.
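The collapse to a single constant can be demonstrated directly. The following is a minimal sketch of the scheme as described above, not eBook Pro's actual code; the function names are invented, and it assumes the string is the nine characters "encrypted" with no trailing punctuation.

```python
from functools import reduce

PASSPHRASE = b"encrypted"  # assumption: exactly these nine ASCII characters


def multi_pass_xor(data: bytes) -> bytes:
    """XOR every byte with every character of the passphrase in turn."""
    out = bytearray(data)
    for ch in PASSPHRASE:
        for i in range(len(out)):
            out[i] ^= ch
    return bytes(out)


# All nine passes fold into one constant: the XOR of the characters.
CONSTANT = reduce(lambda a, b: a ^ b, PASSPHRASE)


def single_xor(data: bytes, constant: int) -> bytes:
    """XOR every byte with one constant."""
    return bytes(b ^ constant for b in data)
```

For any input, multi_pass_xor(data) equals single_xor(data, CONSTANT), and since CONSTANT is below 128 an attacker who knew nothing about the string would need to try at most 128 values.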

Encrypted PDFs produced by the New Paradigm Resource Group (NPRG; http://www.nprg.com/) require a dongle and a password to be read. You would assume that keying material is stored in the dongle. In fact, NPRG uses a single encryption key for all documents. Worse, this key is stored in the plug-in's executable code. In short, the dongle and password aren't used in decryption at all. They are merely verified by the plug-in software before decryption proceeds. To bypass the security, you can use the "Rot13" sample plug-in supplied with Adobe's Acrobat 4 SDK. It suffices to modify the key, remove password checking from the plug-in, and compile. The result is a plug-in that can read any file from NPRG.

FileOpen Publisher 2.3 (http://www.fileopen.com/publisher.html) also encrypts all documents with one fixed key. After being alerted to this type of attack by ElcomSoft, FileOpen released Publisher 2.4, which uses variant keys. However, the encrypted document contains all keying material, allowing attackers to easily reconstruct the key and decrypt.

Softlock Services, an apparently defunct plug-in vendor, produced a plug-in that took the reasonable step of attempting to bind the content to the user's computer. Unfortunately, execution was lacking. The password was generated from the volume ID of the computer's hard drive. But the password was only eight characters long. Worse, each character was converted to one hexadecimal digit (4 bits), and two of the characters were used for integrity checking. The resulting effective password length was only 24 bits! A single 450-MHz computer can find the password in less than a day of computation.
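The 24-bit figure follows from simple counting, sketched below. The helper names are invented for illustration; the only inputs are the numbers given in the text (eight characters, one hexadecimal digit each, two characters spent on integrity checking).

```python
HEX_DIGIT_BITS = 4  # one hexadecimal digit carries 4 bits


def effective_bits(total_chars: int = 8, check_chars: int = 2) -> int:
    """Bits of the password an attacker actually has to guess."""
    return (total_chars - check_chars) * HEX_DIGIT_BITS


def rate_needed(bits: int, seconds: float) -> float:
    """Candidates per second needed to cover the whole space in time."""
    return 2 ** bits / seconds
```

Here effective_bits() is 24, giving 2**24 (about 16.8 million) candidates, and rate_needed(24, 86400) shows that roughly 194 trials per second is enough to cover them all within a day.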

System-Level Attacks

Adobe WebBuy is among the more sophisticated eBook security solutions. Opening a file requires a license file, which contains the encrypted document encryption key, document access permissions, and a certificate to check license validity. Users get two RSA public keys corresponding to private keys owned by Adobe. One key is 1024 bits in length and the other is 912 bits in length. The document encryption key is tied to information (such as CPU ID and User ID) intended to uniquely identify the PC.

The procedure for unlocking a file begins with checking the RSA signature on the digital certificate using the 1024-bit key. The software then performs system checks such as verifying that the computer's CPU ID matches that specified in the license file. The document encryption key is then decrypted using the same 1024-bit RSA key just used to verify the signature on the certificate. The intermediate result thus obtained is then decrypted again, this time using the 912-bit RSA key. The result is combined with data from the encryption dictionary, CPU ID, User ID, and other identifying information using MD5 and XOR to reconstruct the document encryption key.

Thus, the document encryption key may be directly reconstructed from the PDF file, license file, and two RSA keys stored on disk; inadequate protection for keying material again compromises the system.

The case of Adobe's Acrobat eBookReader (formerly known as Glassbook) is another example of reasonable security design that falls apart due to lack of protections for keying material. In this case, the RSA key used to decrypt the document encryption key is itself encrypted with the symmetric cipher RC5. The key used for the RC5 encryption is derived from the CPU ID and hard drive volume ID processed with SHA-1. If a CPU ID is not available, such as on Pentium II processors, an encrypted version of the RC5 key is stored in a hidden file on disk. This RC5 key is, in turn, encrypted with a fixed key coded into the executable code. In either case, says Sklyarov, the keys are easily reconstructed and the document encryption key found.

Another problem facing the Adobe DRM system is the simple observation that the document encryption key is reconstructed using the MD5 hash function. While MD5 is generally regarded as cryptographically strong for most applications, code that implements it is easily found in an executable. To initialize its internal state, MD5 uses four fixed 32-bit constants for which you can easily search. Once found, attackers can intercept calls to MD5 that provide keying material as input.
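Searching for those constants takes only a few lines. The sketch below scans a byte string, such as the contents of an executable, for the four MD5 initialization values from RFC 1321 stored as little-endian 32-bit words. The function name is invented, and the little-endian storage is an assumption appropriate to x86 code. Note that SHA-1 begins with the same four values, so a hit identifies a candidate rather than proving that the code is MD5.

```python
import struct

# MD5 initial state words A, B, C, D as defined in RFC 1321.
MD5_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)


def find_md5_constants(image: bytes) -> dict[int, list[int]]:
    """Map each MD5 init constant to the offsets where it occurs."""
    hits = {}
    for const in MD5_INIT:
        needle = struct.pack("<I", const)  # assumption: little-endian
        offsets = []
        pos = image.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = image.find(needle, pos + 1)
        hits[const] = offsets
    return hits
```

Offsets where all four constants appear close together are the natural places to look for the hash routine's initialization code.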

To thwart this avenue of attack, Adobe Acrobat only loads Adobe-certified plug-ins. Plug-ins themselves must be digitally signed, thus reducing the possibility that a malicious plug-in is loaded. However, rather than covering the executable code, the digital signature covers only the Portable Executable header: a collection of fields describing the file to Windows' internal loader utility, including such things as size of the code, pointer to the symbol table, and time/date stamp.

Careful modifications to the executable code can be made without disturbing the header. For instance, you can modify the code of a certified plug-in to load a non-certified plug-in and pass control to it. Thus, Acrobat can be made to trust any plug-in.
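A toy model shows why a header-only check cannot detect such changes. This sketch does not reproduce Acrobat's certification format: the fixed HEADER_SIZE, the use of SHA-256 as a stand-in for the signed digest, and the function names are all assumptions made for illustration.

```python
import hashlib

HEADER_SIZE = 64  # hypothetical size of the region covered by the check


def header_digest(image: bytes) -> bytes:
    """Digest of the header only, standing in for the signed value."""
    return hashlib.sha256(image[:HEADER_SIZE]).digest()


def patch_body(image: bytes, offset: int, new_bytes: bytes) -> bytes:
    """Overwrite bytes in the code section, leaving the header alone."""
    if offset < HEADER_SIZE:
        raise ValueError("patch would touch the checked header")
    if offset + len(new_bytes) > len(image):
        raise ValueError("patch runs past the end of the image")
    return image[:offset] + new_bytes + image[offset + len(new_bytes):]
```

A file produced by patch_body has different code but the same header_digest as the original, so a verifier that looks only at the header accepts both.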

Conclusion

Two technical lessons can be learned from Dmitry Sklyarov's work. First, bad cryptography will be broken. Second, PCs inherently offer no way to protect secrets. Even with the use of good cryptographic algorithms and reasonable system design, this is a fundamental problem. In short, in their current incarnation, PCs aren't well suited for digital rights management systems.

DDJ