There's more to password protection than picking a powerful encryption algorithm. Even a modest algorithm is effective as part of a well designed package.
Tales from the Crypt
Developers nowadays who want to protect their software from unlawful use often must resort to encryption techniques. Encryption is especially important for developers who want to build online registration into their software. Online registration enables a developer to unlock, or activate, a software module over the phone. This article shows how to create an encryption system for online software registration. Along the way, it examines a few broad security issues, as well as how users might attempt to pervert such a security system. Not only does this system offer high levels of protection, but it allows you to avoid the cost (in development, pricing, or purchasing) of full-blown public key encryption routines.
There are a wide variety of alternative cryptographic algorithms, each offering a unique blend of limitations and benefits. The numerous available approaches differ not only in mathematical characteristics but also in complexity of implementation. The relatively easy to implement encryption routine presented in this article offers more than sufficient protection for many applications and compares well with its more complex cousins such as PGP (Pretty Good Privacy) [1] . An ideal scheme would be one that a programmer could implement without the need of reference material, and which can be expressed with a modest amount of code. The completed system presented in this article offers such implementation simplicity, even while meeting demanding security requirements.
Online registration requires additional transmutative techniques, related to data aggregation and to encryption. This article presents C implementations for these techniques as well: radix conversions and salt values and incorporates them step-by-step into the development of the registration system.
The Online Registration Process
The following is a description of the online registration process. Users purchase a developer's software package in which the main modules are already enabled. There are some additional modules, however, which the user cannot access until he or she pays an additional fee. The developer distributes the package in this manner to avoid creating multiple versions. Instead, the developer distributes the same full-featured package to everyone and activates specific modules over the phone (after credit-card details have been supplied). This method offers a couple of advantages to users as well: they can use these registered modules within minutes of paying for them, and they need not bother with installing a new customized version of the software. Additionally, the developer can activate modules for limited time periods, enabling, for example, free one-month trials of unregistered modules, or annual charges for use of certain modules.
To activate a module on the user's system, the developer sends the user a string of alphanumeric characters over the phone. The string is called an activation code. The user types it into his or her software to activate a module for the prescribed length of time. The software must protect itself against the following attempts to thwart the registration system:
1. A cartel of users pools their funds to purchase an activation code, pretending to be one person. As protection, an activation code must be valid for one user only.
2. A user pays for one module and uses the key to activate another (more expensive) module. An activation code must be valid for one module only.
3. A user periodically re-enters an activation code intended for a one-month free trial to gain indefinite access. An activation code must be usable only once.
4. A user discovers a pattern in the supplied activation codes to derive the code for another module, or another activation period. Activation codes should have no observable pattern. Activation codes for the same user but different modules should bear no resemblance to each other.
5. The user discovers the encryption algorithm used and attempts to formulate the activation codes. The activation code must not be derivable from the encryption algorithm without also knowing the encryption key.
To prevent unauthorized use, the user's software displays a dialog box containing a message similar to the following whenever the user attempts to use an as-yet unregistered module:
The module you are attempting to use requires registration. Please contact your software vendor. Serialization code: JY9234-MKPA-BHYA Activation code: ______-____-____The dialog box lets the user return to the application without registering, or enter the appropriate activation code to continue.
To obtain an activation code, the user reports the serialization code displayed in the dialog box to the developer (or his customer service representatives). The developer enters this code into his authorization software to generate an activation code to be given to the user, permitting access to the unregistered module. Once registered, the user has access to the module without hindrance and does not see the dialog box again (at least until the activation period expires).
The Information to be Encoded
The serialization code, which the user reports to the developer, tells the developer who the user is and which module the user attempted to access (unbeknownst to the user). The serialization code encrypts the user name (which the user entered as part of the installation process) and module id in a printable but disguised form. The developer's system thus identifies the user by name and can display it on the developer's screen.
The activation code, which the developer sends to the user, is similar to the serialization code in that it encrypts the user's name and module id. The activation code also encrypts the activation period. The user's software decrypts the activation code, compares the user name and module id with the ones used to construct the serialization code and, if they match, activates the module for the nominated period. The serialization and activation codes are permanently recorded on the user's local storage. As long as the constituent decrypted data of the two codes match, that module remains activated until its activation period (if any) expires. Local storage of the codes ensures that users can't "share" activation codes by changing their user name to match someone else's as changing the user name back invalidates the authorization.
The activation code encrypts the activation period as a number of months. Upon receipt of a valid activation code, the user's software uses the current day of the month to calculate the start and end dates of the activation period. (Enforcing a lower bound on the activation period is necessary to foil users who temporarily set their local time far into the future before entering limited period activation codes.) The developer can use an activation period of 0 months to indicate that access to the module does not expire. When the serialization code is intitially formed by the user's software any arbitrarily chosen value may be substituted for the activation period. This valve can be decrypted and extracted by the developer and checked to ensure that it is the constant value, adding a further degree of protection.
The serialization and activation codes both contain the same data, encrypted with different encryption keys, so although they have the same general form it is not possible (in "reasonable" time) to derive one from the other. The strategy developed so far meets the requirements of ensuring that an activation code can be used only for a unique user and module combination, but the system is not watertight yet. What is needed is a little salt.
Adding Salt
Even though the source code for the most common public-key encryption routines is widely available, these routines still provide adequate security in most situations. These encryption schemes are designed so that just knowing the encryption algorithm is not sufficient information for decrypting the code. In most such cases would-be code breakers possess no technique other than to try each possible key. This approach to breaking cryptographic data is known as key search or dictionary attack. Public-key encryption routines defeat dictionary attack by being complex, largely so that the routines will be computationally expensive.
For encryption routines that are not so complex, the encryption scheme must discourage dictionary attack by some other means. Salting is one such technique. The crypt function used in UNIX to encode passwords further impedes dictionary attackers by "seasoning" the encrypted passwords with a little salt [2] , a 12-bit number between 0 and 4,095. The cycling salt value changes with each encryption a password could have any one of 4,096 different representations recorded in the password file. Salting thwarts attackers who would encrypt an entire dictionary and then perform rapid key searches on encrypted passwords.
The online registration system described here encodes a similar cycling value into both codes. Seasoning the serialization codes in this manner causes them to vary each time they are presented to the user, even for the same module. The activation code, similarly seasoned, must match not only the user name and module id of the serialization code but also its particular variant. If the user attempts to guess the activation code, after a limited number of attempts the dialog box can report a message and exit. The next time the dialog box is displayed, the serialization code will have changed and a different activation code will be required to activate the module.
Similarly, if two users legitimately share the same name, disclosed activation codes become useless. The same applies to a single user when an activated module expires (which is conceptually equivalent to two users with the same name except that the pair are separated by time and not space). Unless the salt values remain the same, the old activation code is useless and nothing is gained by entering it a second time. The salt value is initially selected randomly and cycles incrementally from that value.
The Encoding Scheme
The registration systems meets the previously stated security requirements by composing the codes from four discrete pieces of data: the user name, the module id, an activation period, and the salt value. The registration software constructs these codes in a three-stage approach. In the first stage the data are aggregated into a single numeric value. Aggregation not only reduces the number of bits required to represent the data (shortening the length of the codes) but also prepares the data for keyed encryption, which is the second stage. The third stage is to convert this encrypted value to a text representation.
Aggregation is performed via "radix conversion." To understand radix conversions, it is helpful to remember that numeric data appearing as a bit pattern has no intrinsic meaning. As with any data, the interpretation is inherent in the data processing algorithms that operate on it and not in the data itself. This absence of inherent meaning also implies that a numeric datum has no natural base (a.k.a. radix).
It is tempting to think of numeric data as inherently possessing a value in some particular base. This is not so. Operations can be performed on numeric data in any arbitrarily chosen base, and may even vary from digit to digit. Consider the integer value 33,300, representing the time of day expressed as the number of seconds since midnight. The following snippet converts the value to text using a set of varying bases assigned to its digits - from least to most significant: 10, 6, 10, 6, 10.
const char *time2str( int time ) { static char str[ 7 ]; char *p = str + 5; *p-- = time % 10 + '0', time /= 10; *p-- = time % 6 + '0', time /= 6; *p-- = time % 10 + '0', time /= 10; *p-- = time % 6 + '0', time /= 6; *p-- = time % 10 + '0', time /= 10; *p = time + '0'; return str; }This snippet converts the time, quarter past nine in the morning, to the string 091500. In this system of representation, the 86,400 seconds of the day range from 000000 to 235959. The registration system uses this technique of combining numeric values with different bases to combine the components of the serialization or activation codes into a single numeric value. Furthermore, once this value has been encrypted, the system uses the same technique to convert the encrypted value to a text representation composed of alphanumeric characters. A fortuitous side-effect of this aggregation scheme is that it reduces the number of bits required to store the data.
To apply the radix conversion technique and aggregate the data, it is necessary to define the bases of the component data:
1. Before being incorporated into the codes, the conversion software strips the user name of all characters other than spaces, letters, and digits. Lower-case characters are converted to upper-case. Each character of the processed user name is encoded as a base-37 value. Only the first seven processed characters of the user name are used (spaces being appended if necessary).
2. The conversion software encodes the module id as a single base-100 value, allowing for the possibility of up to 100 different modules.
3. The conversion scheme defines the activation period as a number of months, having a range of 50 years. As there are 600 months in 50 years, the period is encoded as a single base-600 value.
4. In deference to the UNIX password scheme, the salt value has a 12-bit range (a single base-4096 value).
Ideally, the aggregation process will construct a single numeric value from the registration data. Constructing such a number requires routines for performing arbitrary precision arithmetic (see [3] for a C++ implementation). Such routines are common in math toolkits and are provided as system calls in most flavors of UNIX. In the absence of suitable arithmetic routines, an alternative strategy is to split the various "digits" of this multi-base number across multiple integers, each containing a range of digits from the larger numeric value. That is the strategy employed by my system.
The only caveat in this approach is to ensure that part of the salt value appears in each integer. A 32-bit integer can store five characters of the user name, as 375 is less than 232. The remaining two characters of the user name, the module id, and activation period will fit in a second 32-bit integer. 232/375 rounded down to the nearest whole number is 61, so the first integer can hold 61 variant salt values. Similarly, calculating the number of combinations for the second integer (372 * 100 * 600) and dividing it into the range of an integer yields 52. Multiplying 61 by 52 indicates that this allocation permits a maximum of 3,172 unique salt values (11.6 bits of information). Naturally, my system uses radix conversion to split a single number in the range 0 to 3,171 into two integers with ranges 0 to 60, and 0 to 51.
Encryption Routines
Once the data is aggregated into integers, the encoding software encrypts the integers. In this section I describe the two complementary functions, encrypt and decrypt. Both functions transform integers according to some key. In developing these functions I had many alternatives from which to choose. One naive solution would have been to simply exclusive-OR the data with the key, defining both functions identically. Exclusive-OR is deficient for two reasons: first, it's easy to discern the relationship between unencrypted and encrypted data. More importantly, small changes in the input data yield correspondingly minute changes in the output data. The registration system requires that a change in any of the details that depict a registration instance produces completely different serialization and activation codes. An encryption routine that has exactly this attribute is n * k, where n is the number to be encrypted, k is any odd-numbered key, and the multiplication is carried out in modulus arithmetic that is, overflow is discarded.
In modulus arithmetic, if you multiply a value n by an odd-numbered value k, you can recover n by multiplying the product n * k by a certain third number, f. (Note that you can't get back to n by dividing n * k by k the product is not obtained by ordinary multiplication.) This number f is defined by the Taylor series:
f(k) = 1 - (k - 1)1 + (k - 1)2 - (k - 1)3 + (k - 1)4 . . .In other words, if n is the value to be encrypted and k is any odd-numbered key, then:
n = n * k * f(k)The encryption function is associative and commutative, while the decryption function is neither. These functions have many other useful applications outside of the cryptographic domain.
At first appearance the formula for f(k) seems to be unbounded, since the successive subexpressions increase in magnitude. This would be the case if the entire value of the subexpression was being retained without overflow. However, using integer arithmetic causes overflow bits to be discarded from the calculation of the subexpression and of the subsequent addition or subtraction. If n is odd, then n - 1 is even. The number of zero least-significant bits doubles each time a number is multiplied by itself. Eventually the subexpression will evaluate to zero. At most, it is necessary to calculate only as many subexpressions as the number of bits used to store the result. This Taylor series is encoded in the snippet:
static ulong inverse( ulong i ) { ulong sum = 1; ulong m = 1 - i; if ( i % 2 == 0 ) // no inverse exists // when i is even return 0; do sum += m; while ( m *= 1 - i ); return sum; }I can now define encrypt and decrypt as multiplying the source data by either the key or its inverse:
static ulong encrypt( ulong n, ulong key ) { return n * key; } static ulong decrypt( ulong n, ulong key ) { return n * inverse( key ); }The encoder applies the encryption routine separately to each integer using the same key. After encryption, each encrypted value is converted to text separately and the texts concatenated. The serialization and activation codes will be composed of upper-case letters and digits only. Each character will correspond to a "digit" of a base-36 number. Calculating the base-36 logarithm of a 32-bit number's range indicates that the maximum number of characters this will produce is 7 [4] . The serialization and activation codes have a fixed length of 14 characters.
Putting it All Together
The preceding discussion describes the three components of the encoding process aggregation, encryption, and conversion to text. This section presents the code that ties these components together.
The function regdata2code (Listing 1) encodes a structure containing the user data to a 14-character null-terminated string, while its counterpart, code2regdata (also Listing 1) , converts that same string back to the original structure. The user's distributed software builds a regdata structure containing the registration details and invokes regdata2code using the serialization code key (SCODE_KEY) to translate the local data to the serialization code. The user's software reports this serialization code to the user, who reports it to the developer.
The developer's authorization software first calls code2regdata, also using SCODE_KEY, to convert this code back to the user information. Pending the developer's approval, the developer's code calls
regdata2code, using the activation code key, ACODE_KEY, to re-encode the user information as an activation code. The user enters the activation code into the software, which calls code2regdata (using ACODE_KEY) to produce a second local regdata structure. If the two structures compare equivalently then the user's software activates the module.
Conclusion
Overemphasizing the robustness of various cryptographic algorithms can distract attention from other important protection issues. Once auxiliary forms of protection have been put in place such as use of a salt value, encoding the user name, and applying lower bounds checking to the activation period the encryption routine can provide an adequate level of security even if it is extremely simple.
Those trained in espionage are taught that the most expedient ways to penetrate an organization are to infiltrate it with one's own personnel, purloin its garbage, or tap its phone lines. Breaking cryptographic security to gain access to confidential data is recommended as a last resort. Likewise, most users are unlikely to apply programming tools or mathematical techniques to defeat the system. They are more likely to attempt more indirect approaches for example, sharing activation codes, changing their user name or system date, or reusing old demonstration codes. Developers who act against these indirect approaches have an advantage over developers who myopically focus on their chosen algorithm's theoretical impenetrability. Focusing on these less sophisticated forms of attack not only decreases the likelihood of security transgressions but avoids the cost of expensive public-key encryption routines.
Notes & References
[1] S. Garfinkel. PGP: Pretty Good Privacy. (O'Reilly & Associates, Inc., Sebastopol, CA, 1994).
[2] J. Peek, T. O'Reilly, and M. Loukides. UNIX Power Tools. (Random House, New York, NY, 1994).
[3] Anthony Breitzman, "A Class for Representing Large Integers," C/C++ Users Journal, November 1996.
[4] To calculate a number's logarithm in any base, divide its natural logarithm by the natural logarithm of the base (logx(n) = ln(n)/ln(x)). For example, a 32-bit integer could produce up to ten base -10 digits, as ln(232)/ln(10) > 9.6.
Steve Ball is currently contracted to IBM as a Senior C++/UNIX Developer as part of the development of an advanced Computer Integrated Telephony project. He may be reached at s.ball@xtra.co.nz.