Dr. Dobb's Journal July 1999
Digital certificates provide an important element of security -- that of trust. One of the most popular standards specifying the contents of a digital certificate is X.509, published by the International Telecommunication Union (ITU). In this article, I'll describe the elements of an X.509 certificate, show how certificates are encoded, and present software you can use to decode and display them in a readable form. This software was originally developed using Squeak Smalltalk (which lends itself to rapid development), then converted to Java.
A certificate is a document, issued by a trusted agent, stating that the public key of the person named in the document has a certain value. The concept of employing the services of a trusted third party is not new -- when you have a document notarized by a notary public, you do exactly that. The recipient of a notarized document trusts the stamp of the notary public and interprets its presence as proof that the person presenting the document signed it in the presence of the notary public.
A digital certificate loosely parallels a notarized document. The role of the notary public is assumed by a Certificate Authority (CA), which employs a digital signature rather than a stamp. The CA generates this signature by first computing a message digest of the certificate contents, then encrypting this message digest using the CA's private key. The recipient of the certificate decrypts the signature using the CA's public key and the algorithm the CA used to perform the encryption, computes a message digest of the certificate contents using the same digest algorithm the CA used, and compares the computed message digest with the one recovered from the signature. A match means that the contents of the certificate have not been tampered with, and that the certificate was indeed signed by the CA. And because the recipient trusts the CA, the recipient can accept that the public key presented in the certificate really belongs to the person named in it.
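The sign-then-verify cycle just described can be sketched with the standard java.security.Signature API. This is a minimal sketch, not the article's code: the class name SignatureDemo and the sample content are hypothetical, and it uses SHA256withRSA rather than the MD5-with-RSA algorithm found in the sample certificate discussed later. Signature performs the digest and the RSA private-key operation together, so the separate "encrypt the digest" and "decrypt and compare" steps happen inside sign() and verify().

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

class SignatureDemo {
    static final String ALGORITHM = "SHA256withRSA";

    // CA side: digest the content and sign the digest with the CA's private key.
    static byte[] sign(byte[] content, PrivateKey caPrivate) throws GeneralSecurityException {
        Signature signer = Signature.getInstance(ALGORITHM);
        signer.initSign(caPrivate);
        signer.update(content);
        return signer.sign();
    }

    // Recipient side: recompute the digest and check it against the signature,
    // using only the CA's public key.
    static boolean verify(byte[] content, byte[] signature, PublicKey caPublic)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance(ALGORITHM);
        verifier.initVerify(caPublic);
        verifier.update(content);
        return verifier.verify(signature);
    }

    // Generates a throwaway RSA key pair standing in for the CA's keys.
    static KeyPair newCaKeys() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPair ca = newCaKeys();
        byte[] content = "hypothetical to-be-signed certificate contents"
                .getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(content, ca.getPrivate());
        System.out.println("untouched: " + verify(content, signature, ca.getPublic()));
        content[0] ^= 1; // tamper with a single bit
        System.out.println("tampered:  " + verify(content, signature, ca.getPublic()));
    }
}
```

Running main reports true for the untouched content and false once a single bit has been flipped, which is exactly the tamper check the recipient relies on.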
You are most likely to encounter certificates when you use a web browser. Figure 1 shows one of my digital certificates installed in Netscape Communicator.
The X.509 standard specifies a certificate using Abstract Syntax Notation One (ASN.1), a language for describing data types independently of any particular platform. Here, I discuss only the subset of ASN.1 that is required to understand a certificate. For more information on ASN.1, see the text box entitled "The Abstract Syntax Notation," by Steve Witten, which accompanied the article "Packet Filtering in the SNMP Remote Monitor," by William Stallings (DDJ, November 1994).
Figure 2 shows the ASN.1 representation of a certificate as set forth in the Internet Draft entitled Internet X.509 Public Key Infrastructure -- Certificate and CRL Profile produced by the PKIX working group of the Internet Engineering Task Force (IETF). The latest version of this and several other useful documents related to certificates can be obtained by following the "Internet-Drafts" hypertext link on the IETF site at http://www.ietf.org/home.html.
Take a look at the following ASN.1 statement found in Figure 2:
CertificateSerialNumber ::= INTEGER
This defines the ASN.1 type CertificateSerialNumber to be an INTEGER. CertificateSerialNumber is a simple type; it has no components. Figure 3 shows some simple types and their universal tag numbers.
Now look at the ASN.1 statement:
Certificate ::= SEQUENCE {
tbsCertificate TBSCertificate,
signatureAlgorithm AlgorithmIdentifier,
signatureValue BIT STRING }
The ASN.1 type Certificate is a SEQUENCE of the components tbsCertificate, signatureAlgorithm, and signatureValue; this ASN.1 type is called "structured." Figure 4 shows the four structured types defined by ASN.1, their universal tag numbers, and their meaning. Each of the components within a structured type appears as an identifier followed by its type.
The definition of the certificate just presented is abstract and only becomes useful when each of the ASN.1 values is converted to a series of 0s and 1s and saved. The original rules for performing this operation were the Basic Encoding Rules (BER) as set forth in ITU-T Recommendation X.209. I will not discuss BER; however, if you study the recommendation, you will see that there is often more than one way to BER-encode an ASN.1 object. The X.509 specification eliminates possible ambiguity by using the Distinguished Encoding Rules (DER), a subset of BER that gives a unique encoding for any ASN.1 object. As with ASN.1, I will not discuss DER in its entirety, but just the amount required to understand how a certificate is encoded.
DER is a tag/length/value encoding system in which each ASN.1 value is represented as a series of octets, where an octet is an 8-bit unsigned integer. The bits are numbered, with bit 8 as the most significant and bit 1 as the least significant.
The first octet, called the "identifier octet," is subdivided into three fields (see Figure 5). You use these fields to derive the tag mentioned earlier. Every ASN.1 value has a tag that consists of a class and a nonnegative tag number. Because the largest number that can be represented in 5 bits is 2^5 - 1 (31), two forms of tag are used. The first, or low-tag-number form, is for tag numbers between 0 and 30. The second, or high-tag-number form, is for tag numbers of 31 and greater. In this form, bits 5 through 1 are all set to 1, indicating that the actual tag number is contained in one or more of the octets that follow. The tag number is represented in base 128, with bit 8 of each octet except the last set to 1. The order is from most significant to least significant.
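The identifier-octet rules translate into a few shifts and masks. This is a minimal Java sketch; the class and field names are hypothetical and are not taken from the article's ASN1Value class. It handles both the low- and high-tag-number forms but does no bounds checking on truncated input.

```java
class IdentifierOctets {
    static final String[] CLASS_NAMES = {"universal", "application", "context-specific", "private"};

    // The fields packed into a DER identifier octet (plus any high-tag octets).
    static final class Tag {
        final int tagClass;        // bits 8-7 of the first octet, 0..3
        final boolean constructed; // bit 6 of the first octet
        final int number;          // the tag number
        final int octets;          // identifier octets consumed

        Tag(int tagClass, boolean constructed, int number, int octets) {
            this.tagClass = tagClass;
            this.constructed = constructed;
            this.number = number;
            this.octets = octets;
        }

        @Override
        public String toString() {
            return CLASS_NAMES[tagClass] + (constructed ? " constructed" : " primitive")
                    + " tag " + number;
        }
    }

    static Tag decode(byte[] der, int pos) {
        int first = der[pos] & 0xFF;                // mask off Java's sign extension
        int tagClass = (first & 0xC0) >>> 6;        // bits 8-7
        boolean constructed = (first & 0x20) != 0;  // bit 6
        int number = first & 0x1F;                  // bits 5-1
        int p = pos + 1;
        if (number == 0x1F) {
            // High-tag-number form: base-128 digits, most significant first;
            // bit 8 is set on every octet except the last.
            number = 0;
            int octet;
            do {
                octet = der[p++] & 0xFF;
                number = (number << 7) | (octet & 0x7F);
            } while ((octet & 0x80) != 0);
        }
        return new Tag(tagClass, constructed, number, p - pos);
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {0x30}, 0));        // universal constructed tag 16
        System.out.println(decode(new byte[] {(byte) 0xA0}, 0)); // context-specific constructed tag 0
    }
}
```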
The identifier octet (or the last tag octet, in the case of the high-tag-number form) is followed by the length octets, which indicate how many of the following octets should be interpreted as contents octets. As with the tag, there are two forms of length. The short form consists of a single octet and is used when the length is between 0 and 127. The long form uses 2 to 127 octets in all. For the short form, bit 8 is 0 and the length is contained in bits 7 through 1. For the long form, bit 8 of the first octet is 1 and bits 7 through 1 give the number of subsequent octets from which the length is to be derived; these octets represent the length in base 256, ordered from most significant to least significant. DER requires the short form whenever the length is 127 or less, and the minimum number of length octets otherwise.
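A minimal sketch of the length rules in Java follows. The class name is hypothetical, and returning a two-element array is just a compact convention for this example. In keeping with DER it rejects the indefinite form, and it limits lengths to what fits in a Java int.

```java
class LengthOctets {
    // Decodes the length octets starting at pos.
    // Returns {contentsLength, lengthOctetsConsumed}.
    static int[] decode(byte[] der, int pos) {
        int first = der[pos] & 0xFF;
        if ((first & 0x80) == 0) {
            return new int[] {first, 1};       // short form: bits 7-1 are the length
        }
        int count = first & 0x7F;              // long form: number of length octets that follow
        if (count == 0) {
            throw new IllegalArgumentException("indefinite length (0x80) is not valid DER");
        }
        if (count > 4) {
            throw new IllegalArgumentException("length does not fit in an int");
        }
        long length = 0;
        for (int i = 1; i <= count; i++) {
            length = (length << 8) | (der[pos + i] & 0xFF); // base 256, most significant first
        }
        if (length > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("length does not fit in an int");
        }
        return new int[] {(int) length, 1 + count};
    }

    public static void main(String[] args) {
        int[] r = decode(new byte[] {(byte) 0x82, 0x01, (byte) 0x94}, 0);
        System.out.println(r[0] + " contents octets, " + r[1] + " length octets"); // 404 contents octets, 3 length octets
    }
}
```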
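The last length octet is followed by the number of contents octets that the length specifies; these represent the actual value. Unlike the tag and length, which each have just two forms, the contents have no single format: how the contents octets are interpreted depends on the type of the value, as the INTEGER, OBJECT IDENTIFIER, and BIT STRING examples later in this article show.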
Figure 6 shows the encoded portion of a sample certificate; it is one of many which you can find in Eric Young's SSLeay package (ftp://ftp.psy.uq.oz.au/pub/Crypto/SSL). In Figure 7 you can see that same certificate viewed using my Squeak Smalltalk X.509 Certificate Browser. The code is available electronically as X509.st, which is suitable for fileIn (make sure you are using Squeak 2.3 or later). In the Squeak System Browser in Figure 8, the second column of the top pane shows all of the classes in the X509 Category, which is highlighted in the first column. Notice how the class names correspond to the ASN.1 values in Figure 2.
X509CertificateLister.java (Listing One) is a Java port of the Squeak Smalltalk certificate viewer without the graphical interface. I'll use this code to discuss how you decode a certificate. Figure 9 shows the output generated by X509CertificateLister. You can see the content is identical to that of Figure 7.
X509CertificateLister creates an instance of X509Certificate and passes it to its own format() method. In X509Certificate.java (available electronically; see "Resource Center," page 5), the constructor of class X509Certificate invokes the readCertificate() method, which reads a file containing the certificate. The format of the file (Figure 6) is similar to that used by Privacy Enhanced Mail (PEM): the base64-encoded form of the certificate is sandwiched between the delimiters "-----BEGIN CERTIFICATE-----" and "-----END CERTIFICATE-----". The readCertificate() method discards everything before and after these delimiters and uses the class method decode() of Base64 (available electronically) to decode the contents into a byte array that is stored in the instance variable derCertificate. The parse() method then decomposes derCertificate into ASN.1 values, and any structured values among them are decomposed in turn.
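The delimiter-stripping and base64 step can be sketched with java.util.Base64 (Java 8 and later) in place of the article's own Base64 class. This is a hypothetical helper, assuming the standard five-dash PEM delimiters and reading only the first certificate in the text; the MIME decoder is used because it skips the line breaks inside the encoded block.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

class PemReader {
    static final String BEGIN = "-----BEGIN CERTIFICATE-----";
    static final String END = "-----END CERTIFICATE-----";

    // Discards everything outside the delimiters and base64-decodes the rest.
    static byte[] pemToDer(String text) {
        int begin = text.indexOf(BEGIN);
        if (begin < 0) {
            throw new IllegalArgumentException("missing " + BEGIN);
        }
        int bodyStart = begin + BEGIN.length();
        int end = text.indexOf(END, bodyStart);
        if (end < 0) {
            throw new IllegalArgumentException("missing " + END);
        }
        // The MIME decoder ignores line breaks and other non-base64 characters.
        return Base64.getMimeDecoder().decode(text.substring(bodyStart, end));
    }

    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.US_ASCII);
        byte[] der = pemToDer(text);
        System.out.printf("%d DER octets, first octet 0x%02X%n", der.length, der[0] & 0xFF);
    }
}
```

For a real certificate file, the first DER octet printed should be 0x30, the SEQUENCE that opens every certificate.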
Before examining the parse() method, you should first examine ASN1Value.java (available electronically). The instance variables class, tag, and length correspond to the components of a DER-encoded value described earlier. The instance variable constructed is a boolean that is true if the ASN.1 value is of a constructed type and false if it is of a primitive type. The instance variable data holds the contents octets which, as you will see, can be retrieved in a variety of ways or parsed into further ASN1Value objects. The remaining instance variable, totalLength, is a helper that records the total number of bytes consumed from derCertificate to create this ASN1Value.
If you look at the parse() method of X509Certificate, you can see that the first code it executes is:
certificateSequence = new ASN1Value(derCertificate, 0);
To see how this instance of ASN1Value is created, look at the constructor for ASN1Value. It receives two arguments: the first is the byte array from which it is to extract and decode the data comprising the ASN.1 value; the second is the position in the array at which it is to begin. It starts by setting the variable b to the byte at that position. You can see from Figure 10 that b now has a value of 0x30 (I'm using byte and octet interchangeably).
It then performs a logical AND of b and ASN1Constants.MASK_CLASS, which has a value of 0xC0; this extracts bits 8 and 7 which, as you can see from Figure 5, represent the class. In the present case, the class is universal. It next performs a logical AND of b and ASN1Constants.MASK_BIT6 and, if the result is not zero, sets the instance variable constructed to true; otherwise, constructed retains its initial value of false. Again referring to Figure 5, you can see that bit 6 indicates whether the ASN.1 value is constructed. Here bit 6 is 1, so constructed is true.
The constructor code now performs a logical AND of b and ASN1Constants.BITS_5_1 to isolate what Figure 5 shows as the tag. If the result is equal to ASN1Constants.BITS_5_1 (that is, all five bits are 1), the method decodeHighTag() is invoked to extract the tag from additional octets. In the present case, the tag is in low-tag-number form and has a value of 0x10, which Figure 4 shows is the universal tag for SEQUENCE.
The code next sets p to the start position it received as its second argument plus the number of bytes that make up the tag (in this case, 1; for the high-tag-number form, the value returned by decodeHighTag()). From Figure 10, you can see that p now points to position 1 in derCertificate; this is the length octet, which has a value of 0x82. Remember that bit 8 indicates which of the two length encodings is in use.
In the present case, bit 8 has a value of 1, indicating that this is the long form; bits 7 through 1 have a value of 2, indicating that the length is contained in the two octets that follow. This code decodes the length:
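length = 0;
// Long form: bits 7-1 of the first length octet give the number of
// length octets that follow; accumulate them in base 256.
for (int j = 0; j < (((int)b) & ASN1Constants.BITS_7_1); ++j) {
    ++p;
    // Mask to 8 bits to undo Java's sign extension of bytes.
    length = (length << 8) | (array[p] & ASN1Constants.BITS_8_1);
}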
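The logical AND with ASN1Constants.BITS_8_1 is needed because Java's byte type is signed: when a byte is widened to an int, its high-order bit is copied into all of the upper bits (sign extension), so an octet such as 0x94 would otherwise become 0xFFFFFF94.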
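The effect of that mask is easy to see in isolation. In this small sketch (the class and method names are hypothetical), the second length octet 0x94 combines correctly with the first only after masking.

```java
class SignExtension {
    // Converts a signed Java byte (-128..127) to the octet value it encodes (0..255).
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte lengthOctet = (byte) 0x82;
        int widened = lengthOctet;                        // sign-extended: 0xFFFFFF82
        System.out.println(widened);                      // -126
        System.out.println(unsigned(lengthOctet));        // 130
        System.out.println(unsigned(lengthOctet) & 0x7F); // 2 length octets follow

        // Combining the two length octets 0x01 0x94:
        int wrong = (0x01 << 8) | (byte) 0x94;            // sign bits swamp the result: -108
        int right = (0x01 << 8) | unsigned((byte) 0x94);  // 404
        System.out.println(wrong + " vs " + right);       // -108 vs 404
    }
}
```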
You can see that length is then tested for a value of 0. This is a special case: a first length octet of 0x80 (the long form with no subsequent length octets) leaves length at 0 and indicates that a form of encoding known as "indefinite-length encoding" is being used, in which the end of this ASN.1 value's data is marked by two consecutive zero-value octets. DER itself does not permit indefinite-length encoding, and there has been much debate as to whether it belongs in certificates at all; that discussion is beyond the scope of this article.
After the length has been computed, p is incremented to point to the first contents octet; the contents octets are then extracted and stored in the instance variable data.
Now that you have seen how an instance of ASN1Value is created, the remainder of the certificate decoding is easy. The parse() method of X509Certificate creates three instances of ASN1Value. The first two are passed to the constructors of X509TBSCertificate and X509AlgorithmIdentifier. The third is saved in the instance variable signature. The constructors of X509TBSCertificate and X509AlgorithmIdentifier parse the ASN1Value objects they receive as arguments into other ASN1Value objects that are either saved in instance variables or passed as arguments to the constructors of other classes. This process continues down through every level of nesting of structured ASN.1 values in the certificate.
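The recursive descent can also be sketched generically: a constructed value's contents octets are themselves a series of tag/length/value triples, so a decoder can simply recurse into them. This is not the article's class hierarchy, which maps each level to a class such as X509TBSCertificate; DerDump and its output format are hypothetical, and the sketch supports only low tag numbers and definite lengths. The sample input is the DER encoding of the signature AlgorithmIdentifier the article decodes below.

```java
class DerDump {
    // Appends one line per TLV, indenting the children of constructed values.
    // Sketch limitations: low-tag-number form only, definite lengths only.
    static void dump(byte[] der, int pos, int end, int depth, StringBuilder out) {
        while (pos < end) {
            int id = der[pos] & 0xFF;
            if ((id & 0x1F) == 0x1F) {
                throw new IllegalArgumentException("high-tag-number form not handled");
            }
            int p = pos + 1;
            int length = der[p] & 0xFF;
            if ((length & 0x80) != 0) {              // long form
                int count = length & 0x7F;
                if (count == 0 || count > 3) {
                    throw new IllegalArgumentException("unsupported length encoding");
                }
                length = 0;
                for (int i = 0; i < count; i++) {
                    length = (length << 8) | (der[++p] & 0xFF);
                }
            }
            p++;                                      // p now indexes the first contents octet
            if (p + length > end) {
                throw new IllegalArgumentException("value overruns its parent");
            }
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(String.format("%04X tag=0x%02X len=%d", pos, id, length)).append('\n');
            if ((id & 0x20) != 0) {                  // constructed: recurse into the contents
                dump(der, p, p + length, depth + 1, out);
            }
            pos = p + length;
        }
    }

    public static void main(String[] args) {
        // SEQUENCE { OID 1.2.840.113549.1.1.4, NULL }
        byte[] algId = {0x30, 0x0D, 0x06, 0x09, 0x2A, (byte) 0x86, 0x48, (byte) 0x86,
                        (byte) 0xF7, 0x0D, 0x01, 0x01, 0x04, 0x05, 0x00};
        StringBuilder out = new StringBuilder();
        dump(algId, 0, algId.length, 0, out);
        System.out.print(out);
    }
}
```

Running main prints the SEQUENCE at offset 0000 with its OID (0002) and NULL (000D) children indented beneath it; offsets are relative to the start of this fragment, not the whole certificate.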
The first ASN.1 value extracted by the parse() method is an instance of X509TBSCertificate (available electronically). This ASN.1 value is a SEQUENCE of ASN.1 types representing those items "to be signed" by the CA; hence "TBS." Since you have now seen the parsing mechanism, I will just point out some of the TBS elements.
I will start with SerialNumber because, as you will see, Version is a slightly unusual case. SerialNumber (at 0x000D) is an INTEGER. An ASN.1 INTEGER can be positive, negative, or zero and can have any magnitude. The contents octets give the value in base 256, in two's complement form, from most significant to least significant. The minimum number of octets is used, so there are no redundant leading octets: a leading 0x00 appears only when a positive value's most significant bit would otherwise be 1 (128, for example, is encoded as 0x00 0x80), and a leading 0xFF only when a negative value needs it. The value 0 is encoded as a single octet having a value of 0. If you examine the getInteger() method of ASN1Value (available electronically), you can see how SerialNumber is decoded. For this certificate, the serial number is 1. Each certificate issued by a CA must have a unique serial number.
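Java's BigInteger already uses this big-endian two's-complement representation, so it can handle both directions. The sketch below is hypothetical (the article's own getInteger() may work differently); it also rejects a redundant leading 0x00 or 0xFF octet, which DER forbids.

```java
import java.math.BigInteger;

class DerInteger {
    // Contents octets are big-endian two's complement, which is exactly
    // what the BigInteger(byte[]) constructor expects.
    static BigInteger decode(byte[] contents) {
        if (contents.length == 0) {
            throw new IllegalArgumentException("INTEGER needs at least one contents octet");
        }
        if (contents.length > 1
                && ((contents[0] == 0 && contents[1] >= 0)       // 0x00 before a clear high bit
                    || (contents[0] == -1 && contents[1] < 0))) { // 0xFF before a set high bit
            throw new IllegalArgumentException("non-minimal INTEGER encoding");
        }
        return new BigInteger(contents);
    }

    // toByteArray() yields the minimal two's-complement form, including the
    // extra 0x00 a positive value needs when its high-order bit is set.
    static byte[] encode(BigInteger value) {
        return value.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {0x01}));              // 1
        System.out.println(decode(new byte[] {0x00, (byte) 0x80})); // 128
        System.out.println(decode(new byte[] {(byte) 0x80}));       // -128
    }
}
```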
Version may or may not be present in a certificate. If the version is absent, the certificate is assumed to be Version 1. At present, there are three possible versions of X.509 certificates. The presence of Version is indicated by an identifier octet with a value of 0xA0, which decodes as a class of context specific, a type of constructed, and a tag of 0. At 0x0008, you can see just such an identifier octet. The definition of Version in Figure 2 reads:
version [0] EXPLICIT Version DEFAULT v1
where Version is defined as:
Version ::= INTEGER { v1(0), v2(1), v3(2) }
The word EXPLICIT refers to a type of tagging called "explicit tagging." A context-specific tag such as [0] lets a decoder distinguish components that could otherwise be confused; here, an optional Version INTEGER could not otherwise be told apart from the serialNumber INTEGER that follows it. An explicitly tagged type is a type derived from another type by adding an outer tag to the underlying type. In this case, the underlying type is the INTEGER found at 0x000A. When present, the value of the INTEGER is always one less than the actual version number. As you can see from the single contents octet at 0x000C (a value of 2), this is a Version 3 certificate.
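A decoder has to peek at the identifier octet to know whether Version is there at all. This hypothetical sketch returns version 1 when the [0] tag is absent and otherwise unwraps the explicit tag to reach the INTEGER; it assumes short-form lengths, which always suffice for this small field. The sample input is the five octets the text describes at 0x0008 through 0x000C.

```java
class VersionField {
    // Reads the optional "[0] EXPLICIT Version DEFAULT v1" field that may open a
    // TBSCertificate. Returns {versionNumber, octetsConsumed}; versionNumber is 1-based.
    static int[] read(byte[] der, int pos) {
        if ((der[pos] & 0xFF) != 0xA0) {
            return new int[] {1, 0};                   // absent: DEFAULT v1, nothing consumed
        }
        int outerLength = der[pos + 1] & 0xFF;         // length of the explicit wrapper
        int inner = pos + 2;                           // the underlying INTEGER
        if (der[inner] != 0x02) {
            throw new IllegalArgumentException("expected INTEGER inside [0]");
        }
        int intLength = der[inner + 1] & 0xFF;
        int value = 0;
        for (int i = 0; i < intLength; i++) {
            value = (value << 8) | (der[inner + 2 + i] & 0xFF);
        }
        return new int[] {value + 1, 2 + outerLength}; // stored value is version - 1
    }

    public static void main(String[] args) {
        int[] v = read(new byte[] {(byte) 0xA0, 0x03, 0x02, 0x01, 0x02}, 0);
        System.out.println("Version " + v[0] + ", " + v[1] + " octets"); // Version 3, 5 octets
    }
}
```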
Signature (at 0x0010) is an AlgorithmIdentifier type. From the ASN.1 definition in Figure 2, you can see that AlgorithmIdentifier (available electronically; see X509AlgorithmIdentifier.java) is a SEQUENCE in which the first field (at 0x0012) is an OBJECT IDENTIFIER whose universal tag is 6. An OBJECT IDENTIFIER (OID) is a series of integers that identifies some kind of resource. Examples of resources that can be represented by an OID include a registration authority that itself assigns OIDs, a cryptographic algorithm, or a directory name. The Hashtable oidMap in ASN1Constants.java (available electronically) shows a number of OIDs and their translations. The integer components of an OID are separated by a "." and are organized hierarchically in much the same manner as Internet domain names (for instance, all 1.n identifiers are controlled by 1, all 1.2.n identifiers are controlled by 1.2, and so on). You can see how an OID is decoded if you look at method getOID() in ASN1Value.java (available electronically). The first number of an OID always has a value of 0, 1, or 2. If it is 0 or 1, the second number is limited to a range of 0 through 39. These limitations allow the first two numbers to be encoded in the first contents octet as (val1×40)+val2. All OIDs must consist of at least two numbers. The remaining numbers of the OID are encoded in base 128 using the least possible number of digits. For each series of octets used to represent a number within the OID, all but the last have the most significant bit set to 1.
You would decode the signature (at 0x0010) as follows. The first two bytes identify the ASN.1 value as having a class of universal, a type of constructed, a tag of 0x10 (SEQUENCE), and a length of 13. The first ASN.1 value in the SEQUENCE starts at 0x0012 and is, as you can see from its first two bytes, an OID with a length of 9 octets. The first contents octet, at 0x0014, has the value 0x2A (42), which represents the first two numbers, 1.2 (1×40+2). The next number in the OID is encoded in the bytes located at 0x0015 and 0x0016 (the byte at 0x0016 terminates the number because its most significant bit is 0). These two bytes, 0x86 and 0x48, hold the base-128 digits 6 and 72, giving 6×128+72 = 840. The bytes at 0x0017 through 0x0019 similarly represent the decimal number 113549. Continuing until all nine bytes of the OID have been decoded, you finish with 1.2.840.113549.1.1.4 which, if you check oidMap, represents md5WithRSAEncryption.
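The decoding just walked through by hand can be sketched in Java as follows. The class name is hypothetical and this is not the article's getOID(); it accumulates base-128 digits until it meets an octet with bit 8 clear, then splits the first number back into val1 and val2. The sample bytes are the nine contents octets of the md5WithRSAEncryption OID.

```java
class OidDecoder {
    // Decodes OBJECT IDENTIFIER contents octets into dotted form.
    static String decode(byte[] contents) {
        StringBuilder dotted = new StringBuilder();
        long value = 0;
        boolean first = true;
        boolean pending = false;                    // true while a number is incomplete
        for (byte b : contents) {
            value = (value << 7) | (b & 0x7F);      // append one base-128 digit
            pending = (b & 0x80) != 0;
            if (pending) {
                continue;                           // bit 8 set: more digits follow
            }
            if (first) {
                // The first number packs val1 * 40 + val2; val1 is at most 2.
                long val1 = Math.min(value / 40, 2);
                dotted.append(val1).append('.').append(value - val1 * 40);
                first = false;
            } else {
                dotted.append('.').append(value);
            }
            value = 0;
        }
        if (first || pending) {
            throw new IllegalArgumentException("empty or truncated OID");
        }
        return dotted.toString();
    }

    public static void main(String[] args) {
        byte[] md5WithRsa = {0x2A, (byte) 0x86, 0x48, (byte) 0x86, (byte) 0xF7, 0x0D, 0x01, 0x01, 0x04};
        System.out.println(decode(md5WithRsa));                    // 1.2.840.113549.1.1.4
        System.out.println(decode(new byte[] {0x55, 0x04, 0x03})); // 2.5.4.3
    }
}
```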
If you wish to fully explore the hierarchy of OIDs, see http://www.alvestrand.no/~hta/objectid/, which contains a link that lets you "walk" the entire OID tree. If you do, you will see that the complete translation of the OID you decoded above is:
1 - ISO assigned
2 - ISO member body
840 - US
113549 - RSADSI
1 - PKCS
1 - PKCS-1
4 - MD5 with RSA encryption
The second ASN.1 value in the SEQUENCE is the universal type NULL encoded in the two bytes at 0x001D and 0x001E. This ASN.1 value represents the algorithm parameters -- in this case, none.
Issuer (at 0x001F) and subject (at 0x009D) identify, respectively, the party who signed and issued the certificate and the party to whom the certificate was issued. Both fields are the ASN.1 representations of an X.501 name. Referring to Figure 2, you can see that if you follow the definition of Name down through the levels of nesting, it ultimately consists of a sequence of RelativeDistinguishedNames. If you have used or encountered X.500, you will recognize the term "relative distinguished name." Figure 2 defines RelativeDistinguishedName as a SET OF AttributeTypeAndValue, where each AttributeTypeAndValue is a SEQUENCE of an OID followed by a string. If you decode these OIDs using the technique I explained earlier, you will notice they are all of the form 2.5.4.n. Walking the OID tree gives:
2 - Joint ISO/ITU-T assignment
5 - Directory (X.500)
4 - Attributes
The fourth number in the OID is one of the following (and this is only a partial set):
3 - Common Name (CN)
6 - Country Name (C)
7 - Locality Name (L)
8 - State/Province Name (S)
10 - Organization (O)
11 - Organizational Unit (OU)
The identifiers in parentheses are the ones you have more likely seen in distinguished names. X509Name.java is available electronically.
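Once each AttributeTypeAndValue has been decoded into an OID string and a value, mapping the 2.5.4.n types to the abbreviations listed above is a table lookup. In this sketch the class name and sample names are hypothetical, decoding of the SET and SEQUENCE layers is assumed to have been done already, and unknown attribute types fall back to their dotted OID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;

class AttributeNames {
    // Partial table of X.500 attribute types (2.5.4.n) and their usual abbreviations.
    static final Map<String, String> SHORT_NAMES = new HashMap<>();
    static {
        SHORT_NAMES.put("2.5.4.3", "CN");
        SHORT_NAMES.put("2.5.4.6", "C");
        SHORT_NAMES.put("2.5.4.7", "L");
        SHORT_NAMES.put("2.5.4.8", "S");
        SHORT_NAMES.put("2.5.4.10", "O");
        SHORT_NAMES.put("2.5.4.11", "OU");
    }

    // Falls back to the dotted OID for attribute types not in the table.
    static String shortName(String oid) {
        return SHORT_NAMES.getOrDefault(oid, oid);
    }

    // Formats already-decoded {oid, value} pairs in the order they appear.
    static String format(String[][] attributes) {
        StringJoiner joined = new StringJoiner(", ");
        for (String[] typeAndValue : attributes) {
            joined.add(shortName(typeAndValue[0]) + "=" + typeAndValue[1]);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String[][] issuer = {{"2.5.4.6", "AU"}, {"2.5.4.10", "Example Org"}, {"2.5.4.3", "Example CA"}};
        System.out.println(format(issuer)); // C=AU, O=Example Org, CN=Example CA
    }
}
```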
validity (at 0x007D) is a SEQUENCE of two dates. The first (at 0x0080) is the date before which the CA will not vouch for the validity of the certificate; the second (at 0x008F) is the date after which the certificate is to be considered invalid. The contents octets of each date contain the date as a UTCTime in the form YYMMDDHHMMSSZ. If the two-digit year field is 50 or greater, you interpret it as 19YY; otherwise, you interpret it as 20YY. Dates in 2050 or later must use the ASN.1 type GeneralizedTime, which represents the date as YYYYMMDDHHMMSSZ. In X509Validity.java (available electronically), the dateFromString() method uses the parse() method of the SimpleDateFormat class to convert a UTCTime string into a date.
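The two-digit-year window is easy to get wrong by one, so here it is spelled out. This sketch uses java.time (Java 8 and later) instead of the SimpleDateFormat approach taken in the article's X509Validity; the class and method names are hypothetical, and it accepts only the seconds-present, Z-terminated forms described above.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

class CertTime {
    // UTCTime contents: YYMMDDHHMMSSZ, with YY >= 50 meaning 19YY and YY < 50 meaning 20YY.
    static ZonedDateTime parseUtcTime(String s) {
        if (s.length() != 13 || s.charAt(12) != 'Z') {
            throw new IllegalArgumentException("expected YYMMDDHHMMSSZ: " + s);
        }
        int yy = field(s, 0, 2);
        int year = yy >= 50 ? 1900 + yy : 2000 + yy;
        return ZonedDateTime.of(year, field(s, 2, 2), field(s, 4, 2),
                field(s, 6, 2), field(s, 8, 2), field(s, 10, 2), 0, ZoneOffset.UTC);
    }

    // GeneralizedTime contents: YYYYMMDDHHMMSSZ, used for dates in 2050 or later.
    static ZonedDateTime parseGeneralizedTime(String s) {
        if (s.length() != 15 || s.charAt(14) != 'Z') {
            throw new IllegalArgumentException("expected YYYYMMDDHHMMSSZ: " + s);
        }
        return ZonedDateTime.of(field(s, 0, 4), field(s, 4, 2), field(s, 6, 2),
                field(s, 8, 2), field(s, 10, 2), field(s, 12, 2), 0, ZoneOffset.UTC);
    }

    // Parses a fixed-width run of decimal digits.
    private static int field(String s, int start, int width) {
        return Integer.parseInt(s.substring(start, start + width));
    }

    public static void main(String[] args) {
        System.out.println(parseUtcTime("990701120000Z"));           // 1999-07-01T12:00Z
        System.out.println(parseUtcTime("490101000000Z").getYear()); // 2049
    }
}
```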
subjectPublicKeyInfo (at 0x010A) contains an algorithm (at 0x010C) and a key to be used with the algorithm (at 0x0118). The algorithm is encoded as an OID that you have already seen, and the key is encoded as a BIT STRING; see X509SubjectPublicKeyInfo, available electronically.
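Unlike an OCTET STRING, a BIT STRING's contents begin with one extra octet giving the number of unused bits in the final octet; for keys and signatures that count is 0. This hypothetical helper strips that octet (rejecting partial-octet strings, as a simplifying assumption) and renders the rest in hex. For an RSA key, the remaining octets are themselves a DER SEQUENCE of modulus and public exponent, which can be fed back through the same parser.

```java
import java.util.Arrays;

class DerBitString {
    // Returns the data octets of a BIT STRING, after its unused-bits count.
    static byte[] octets(byte[] contents) {
        if (contents.length == 0) {
            throw new IllegalArgumentException("empty BIT STRING contents");
        }
        int unusedBits = contents[0] & 0xFF;
        if (unusedBits != 0) {
            throw new IllegalArgumentException("sketch handles only whole-octet bit strings");
        }
        return Arrays.copyOfRange(contents, 1, contents.length);
    }

    // Lowercase hex rendering, two digits per octet.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] contents = {0x00, 0x30, 0x0D};  // hypothetical, truncated sample
        System.out.println(hex(octets(contents))); // 300d
    }
}
```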
Extensions (at 0x0165) can exist only in Version 3 certificates. Extensions provide a mechanism for associating additional attributes with users or public keys. The identifier octet at 0x0165 decodes as a class of context specific, a type of constructed, and a tag of 3. As you saw with Version, Extensions is an explicitly tagged type. The underlying type is the SEQUENCE at 0x0169. This SEQUENCE uses the long form of length, since bit 8 of the octet at 0x016A is 1. The length, as derived from the two bytes that follow, is 404. Each element of the SEQUENCE consists of an OID and an associated ASN.1 structure. I only report the OIDs; see X509Extension, available electronically.
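Each element of that SEQUENCE is an Extension, which the PKIX profile defines as a SEQUENCE of an OID, an optional BOOLEAN critical flag that defaults to FALSE, and an OCTET STRING wrapping the extension's own DER value. The hypothetical reader below pulls out those three parts; it assumes every length uses the short form, which real extensions often exceed, so treat it as an illustration of the layout rather than a general parser. The sample is a critical basicConstraints extension (OID 2.5.29.19) with an empty value.

```java
import java.util.Arrays;

class ExtensionReader {
    static final class Extension {
        final byte[] oid;        // OBJECT IDENTIFIER contents octets
        final boolean critical;  // BOOLEAN DEFAULT FALSE
        final byte[] value;      // OCTET STRING contents (the extension's own DER)

        Extension(byte[] oid, boolean critical, byte[] value) {
            this.oid = oid;
            this.critical = critical;
            this.value = value;
        }
    }

    // Reads one Extension SEQUENCE starting at pos (short-form lengths assumed).
    static Extension read(byte[] der, int pos) {
        expect(der, pos, 0x30);
        int p = pos + 2;
        expect(der, p, 0x06);
        int oidEnd = p + 2 + (der[p + 1] & 0xFF);
        byte[] oid = Arrays.copyOfRange(der, p + 2, oidEnd);
        p = oidEnd;
        boolean critical = false;               // DEFAULT FALSE when the BOOLEAN is absent
        if (der[p] == 0x01) {
            critical = der[p + 2] != 0;         // DER encodes TRUE as 0xFF
            p += 2 + (der[p + 1] & 0xFF);
        }
        expect(der, p, 0x04);
        byte[] value = Arrays.copyOfRange(der, p + 2, p + 2 + (der[p + 1] & 0xFF));
        return new Extension(oid, critical, value);
    }

    private static void expect(byte[] der, int pos, int tag) {
        if ((der[pos] & 0xFF) != tag) {
            throw new IllegalArgumentException(String.format("expected tag 0x%02X at %d", tag, pos));
        }
    }

    public static void main(String[] args) {
        byte[] ext = {0x30, 0x0C, 0x06, 0x03, 0x55, 0x1D, 0x13, 0x01, 0x01, (byte) 0xFF,
                      0x04, 0x02, 0x30, 0x00};
        Extension e = read(ext, 0);
        System.out.println("critical=" + e.critical + ", value octets=" + e.value.length); // critical=true, value octets=2
    }
}
```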
The second ASN.1 value extracted by the parse() method of X509Certificate is the signature algorithm (at 0x0301). This identifies the cryptographic algorithm the CA used to sign the certificate. The algorithm is represented as an OID.
The third ASN.1 value extracted by parse() is the digital signature (at 0x0310), which you can see has a class of universal and a type of BIT STRING.
Now that you have examined software that decodes an X.509 certificate, you can see that working with one is not terribly difficult. I should point out, however, that you will encounter certificates the software cannot decode. This unfortunate situation results from the fact that software used to encode certificates takes its rules from a profile rather than from the specification. A specification leaves room for interpretation, and this interpretation is done by a profile. A profile is a specification with an attitude. You can get an idea of the variety of profiles in use and the quirks of each from the document X.509 Style Guide by Peter Gutmann at http://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt.
DDJ
import com.beechwood.certificates.*;
import java.io.*;
public class X509CertificateLister {
    public X509CertificateLister(String fileName) {
        try {
            format(new X509Certificate(fileName));
        }
        catch (X509CertificateException e) {
            System.out.println(e.getMessage());
            System.exit(1); // nonzero status signals failure
        }
    }
    private void format(X509Certificate cert)
            throws X509CertificateException {
        X509TBSCertificate tbsCertificate = cert.getTBSCertificate();
        System.out.print("X509 Certificate Version: " +
                tbsCertificate.version());
        System.out.println(" Serial Number: " + tbsCertificate.serialNumber());
        System.out.println();
        System.out.print("Issuer: ");
        System.out.println(tbsCertificate.issuer().getRDN());
        System.out.println();
        System.out.print("Not valid before ");
        System.out.println(tbsCertificate.validity().notBeforeDateString());
        System.out.print("Not valid after ");
        System.out.println(tbsCertificate.validity().notAfterDateString());
        System.out.println();
        System.out.print("Subject: ");
        System.out.println(tbsCertificate.subject().getRDN());
        System.out.println();
        System.out.print("Subject Public Key Algorithm: ");
        X509AlgorithmIdentifier algorithm =
                tbsCertificate.subjectPublicKeyInfo().algorithm();
        System.out.print(algorithm.getOID());
        System.out.println(" (" + algorithm.getOIDDescription() + ")");
        System.out.println();
        System.out.println("Public Key:");
        String subjectPublicKey = tbsCertificate.subjectPublicKey();
        int ix = 0;
        while ((ix + 48) < subjectPublicKey.length()) {
            System.out.println(" " + subjectPublicKey.substring(ix, ix + 48));
            ix += 48;
        }
        if (ix < subjectPublicKey.length())
            System.out.println(" " + subjectPublicKey.substring(ix));
        System.out.println();
        X509Extension[] extensions = tbsCertificate.getExtensions();
        if (extensions.length > 0) {
            System.out.println("Extensions:");
            for (int i = 0; i < extensions.length; ++i) {
                System.out.print(" " + extensions[i].id());
                System.out.print(" (" + extensions[i].idDescription() + ")");
                if (extensions[i].isCritical())
                    System.out.print(" **CRITICAL**");
                System.out.println();
            }
        }
        System.out.println();
        System.out.print("Signature algorithm: ");
        System.out.print(cert.getSignatureAlgorithm().getOID());
        System.out.println(" (" + cert.getSignatureAlgorithm().
                getOIDDescription() + ")");
        System.out.println();
        System.out.println("Signature:");
        String signature = cert.getSignature();
        ix = 0;
        while ((ix + 48) < signature.length()) {
            System.out.println(" " + signature.substring(ix, ix + 48));
            ix += 48;
        }
        if (ix < signature.length())
            System.out.println(" " + signature.substring(ix));
        System.out.println();
    }
    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Usage: X509CertificateLister certFile");
            System.exit(1); // nonzero status signals a usage error
        }
        new X509CertificateLister(args[0]);
        System.exit(0);
    }
}