Java Solutions


Angelika Langer and Klaus Kreft

Secrets of equals

A wake-up call for Java developers: not all implementations of equals are equal.


Java magazines are full of articles describing the hottest and latest Java novelty to the interested reader. There is, however, little discussion of seemingly trivial core features of the Java programming language itself. Take for instance the equals method. It is a piece of basic infrastructure that is common to all classes. It is defined in class Object, and, as a consequence, all objects are equality comparable. Object.equals provides a default implementation of equality comparison, and, at times, the default implementation is just fine and semantically correct for a newly defined class. In other situations, the new class must override equals to provide meaningful and semantically correct equality comparison.

One would expect that overriding equals, since it is a fairly common task, should be a piece of cake. The reality is far from that. There is an amazing amount of disagreement in the Java community regarding correct implementation of equals. Look at the next best Java source code or open an arbitrary Java textbook and take a look at what you find. Chances are good that you will find several different approaches and a variety of recommendations.

In this article we take a closer look at the different techniques available for implementing equals. Our goal is to evaluate the approaches in order to gain a deeper understanding of the related ups and downs. For purpose of discussion, we arbitrarily picked a number of sample implementations that are accessible because they have been published. They are taken from Program Development in Java by Barbara Liskov with John Guttag [1], Effective Java by Joshua Bloch [2], Practical Java by Peter Haggar [3], and the JDK 1.3 source code [4] (authors: James Gosling, Arthur van Hoff, Alan Liu). Listings 1-4 show the examples.

A first glance at the source code reveals that the respective authors use a variety of techniques:

Obviously there is not just one way of implementing equals. Each of the issues listed above is worth further exploration. In this article, we focus on the available possibilities to perform the necessary type check. Basically there are two approaches: using the instanceof operator or using the getClass method. Before we delve into the details, let us see why we would want to override Object.equals in the first place.

Entity Types vs. Value Types

When I define a new class, do I have to override Object.equals?

To answer this question, we need to understand what the default implementation provided by Object.equals does. It exhibits the same behavior as the == operator, namely a check for identity of two objects. Two objects are identical if they occupy the same memory location and are effectively the “same” object. Identity is different from equality. Two objects are equal when they behave in the same way, which often means that they have the same state or content. Equality is what the equals method is supposed to check for; identity is what operator == (and Object.equals) checks for.

Whether or not a class must override the default behavior of equals (namely the check for identity provided by Object.equals) depends on the semantics of the class. There is a fundamental distinction between value types on the one hand and non-value types, sometimes called entity types, on the other hand:

Entity types normally do not override Object.equals, because there is no point in comparing their content, since it is not overly relevant anyway. In contrast, value types do override Object.equals, because equality means “equal content.” For a value type, there is a significant difference between identity and equality. In other words, only classes that represent value types have a need to override Object.equals. For this reason, the subsequent discussion of implementing equals is only relevant for value types.

The equals Contract

Classes that override Object.equals must provide an implementation that compares for equality of objects. That’s the intended and expected semantic meaning of method equals. In addition, there are a couple of other requirements that an implementation of equals must meet. These requirements stem from the fact that the Java platform libraries use equals in various places, most notably in the hash-based collections such as HashSet and HashMap. These collections have certain expectations regarding the behavior of equals; otherwise they do not work properly. This behavior expected of equals is defined in the so-called equals contract. It is described in the Java platform libraries’ documentation. Here is the contract as stated in the Java 2 Platform, Standard Edition, v1.3.1 API Specification [5], to be looked up under Object.equals:

The equals method implements an equivalence relation:

In plain English, the equals contract requires the following properties:

Consistency with the equals contract is a fundamental requirement to every implementation of equals. Not only do the hash-based collections rely on reflexivity, symmetry, and transitivity, but everybody who calls equals will expect exactly this behavior. Failure to comply to the equals contract leads to subtle bugs that are difficult to track down because they are conceptual problems. When the resulting bugs eventually show up (typically in the production and test phase of a project), the underlying conceptual problem will be extremely hard to identify; conceptual problems are not detected by debugging source code. The recommendation is: never provide a incorrect implementation of equals.

Let us now take a look at the sample implementations in Listings 1-4. Not all of them are correct.

Listing 1: An Incorrect Implementation

Closer examination discloses that the implementation in Listing 1 is not transitive and for that reason incorrect. The example in Listing 1 shows a subclass Point3 derived of a superclass Point2 that provides three versions of equals with three different signatures. Let us see what happens when we use these classes. Consider the following situation (see Figure 1):

Point2 origin(0,0);
Point3 p1(0,0,-1);
point3 p2(0,0,1);

// calls Point3.equals(Point2)
System.out.println(p1.equals(origin));
// calls Point2.equals(Point2)
System.out.println(origin.equals(p2));
// calls Point3.equals(Point3)
System.out.println(p1.equals(p2));

We would expect that the program prints:

true
true
true

Instead the program prints:

true
true
false

Apparently the implementation of equals is not transitive. The lack of transitivity stems from the fact that the different equals methods in Point2 and Point3 perform semantically different comparisons. Let us see how this happens.

Let’s start with the first comparison p1.equals(origin). Method Point3.equals(Point2) is called. Here is the implementation:

public class Point3 extends Point2 {
  ...
  // overriding definition
  public boolean equals(Point2 p) {
    if (p instanceof Point3)
      return equals((Point3)p);
    return super.equals();
  }
  ...
}

Method Point3.equals(Point2) checks whether the other object (namely origin) is a subobject of this object (namely p1), which is false. It is the other way round: p1 is a subobject of origin. The instanceof test yields false, and, in this case, super.equals is called, which is Point2.equals(Point2). Method Point2.equals(Point2) is not shown in Listing 1, but we can safely assume that its implementation follows the same line of logic that is used for the implementation of the equals methods in class Point3. Method Point2.equals(Point2) compares objects of type Point2; in this case, it compares the superclass object origin to the superclass part of the subclass object p1. The result is true. The same happens when origin is compared to p2.

Both comparisons are mixed-type comparisons involving a superclass and a subclass object. The functionality of mixed-type comparison in this sample implementation is a comparison of the part that both objects have in common, namely the superclass. We will refer to this kind of comparison as slice comparison.

Eventually method Point3.equals(Point3) is called to compare p1 and p2. Following the rule of transitivity their comparison by means of equals must yield true. Quite obviously p1 and p2 are not equal and indeed p1.equals(p2) yields false. So, what’s going on here? The transitivity is violated because Point3.equals(Point3) performs a semantically different comparison. It does not compare superclass slices, in which case p1 and p2 would compare equal, because they both contain (0,0). Instead it compares the entire subclass objects including their subclass-specific parts, and p1 and p2 happen to have a different z coordinate. For this reason, the result is false.

equals methods that perform semantically different comparisons in a class hierarchy and allow mixed-type comparison can never by transitive. The problem is not an implementation problem; it is a conceptual problem. Method equals is required to be transitive because of the equals contract. At the same time, it is required to perform a comparison for equality of objects; that’s the natural and intended semantics of equals. Objects of different types in a class hierarchy of value types can have a different structure: typically subclass objects have more fields than superclass objects. In this case, the subclass must provide an overriding version of the equals method that takes the subclass-specific fields into account [6]. Inevitably, the subclass version of equals performs a comparison that is semantically different from its superclass version. If equals permits mixed-type comparison in such a situation, then it can never be transitive. Either the equals method is non-transitive (and incorrect, as in Listing 1), it does not allow slice comparison of superclass and subclass objects (as will be demonstrated in Listing 4), or the equals method is final in the superclass and not overridden in any of the subclasses (similar to Listing 3).

Listing 2: A Common, Yet Debatable Implementation

The example in Listing 2 is taken from the Java platform libraries. It is the example of a non-final class that represents a value type, namely class Date. There is no problem with class Date itself or its implementation of the equals method. Note, however, that neither class Date not its equals method are final. Someone can derive a subclass of Date and override equals. This subclass in conjunction with superclass Date will ask for trouble. Here is a conceivable implementation of a subclass and its equals method:

public class NamedDate extends Date {
  private String name;
  public boolean equals(Object other) {
    if (other instanceof NamedDate && !name.equals
      (((NamedDate)other).name)) return false;
    return super.equals(other));
  }
}

Other implementations of class NamedDate are conceivable, but this would be a typical one [7]. It follows the logic demonstrated by its superclass Date. To remind you, here is the implementation of the superclass version of equals:

public class Date {
  ...
  public boolean equals(Object obj) {
    return obj instanceof Date && 
  getTime() ==
      ((Date) obj).getTime();
  }
}

Both versions of equals perform the necessary type check via the instanceof operator, which means that they both permit mixed-type comparison. Consider the following situation (see Figure 2):

NamedDate EndOfMillenium =
  new NamedDate(99,11,31,
    "end of 2nd millenium AC");
NamedDate TheEnd         =
  new NamedDate(99,11,31,
    "end of the world");
Date      NewYearsEve    =
  new Date(99,11,31);

// slice comparison: true
EndOfMillenium.equals(NewYearsEve)
// slice comparison: true
NewYearsEve.equals(TheEnd)
// whole-object comparison: false
EndOfMillenium.equals(TheEnd)

Again we run into the same transitivity problem as in Listing 1. Date.equals allows for slice comparison. Every overriding version of equals in a subclass performs a semantically different comparison, namely a whole-object comparison that includes the subclass-specific name field, and inevitably it exhibits the incorrect non-transitive behavior. The class Date author could have avoided this pitfall by declaring at least the equals method final, if not the entire class.

Listing 3: A Correct Implementation

Listing 3 shows a final class PhoneNumber. Its implementation of equals is similar to the solution in Listing 2 in that it uses the instanceof test. Since the class is final, no subclasses can ever exist and the transitivity problem discussed above can never come up. This solution is flawless, but restricted to final classes.

Listing 4: Another Correct Implementation

Listing 4 shows yet another approach. Here we have a superclass Golfball and its subclass MyGolfball, which overrides equals because it has a subclass-specific field. The superclass performs the check for type match not using the instanceof operator as in the previous examples, but using the getClass method instead.

Here are the implementations of the two equals methods:

class Golfball {
   ...
   public boolean equals(object obj) {
     if (this == obj)
        return true;
     if (obj!=null && getClass() == 
       obj.getClass())
     {
        // Classes are equal, downcast.
        Golfball gb = (Golfball)obj;
        // Compare atrributes.
        if (brand.equals(gb.brand())  &&  
            make.equals(gb.make()) &&
            compression == 
              gb.compression())
          return true;
     }
     return false;
   }
}
class MyGolfball extends Golfball {
  ...
  public boolean equals(Object obj) {
    if (super.equals(obj))
    {
      // Classes equal, downcast.
      MyGolfball bg = (MyGolfball)obj;
      if (ballConstruction == 
        gb.construction())
        return true;
    }
    return false;
  }
}

The fundamental difference in this class hierarchy is that slice comparison is not permitted. Or, to be more precise, the mixed-type comparison is allowed, but an object of type Golfball never compares equal to an object of type MyGolfball. They are of different types and therefore their Class objects (obtained via getClass) are different. The type check via getClass yields false for all attempts of mixed-type comparison. For example (see Figure 3):

Golfball original =
  new Golfball("xyz", "abc",100);
MyGolfball gb1 =
  new MyGolfball("xyz", "abc", 100, 
    MyGolfball.TwoPiece);
MyGolfball gb2 =
  new MyGolfball("xyz", "abc", 100, 
    MyGolfball.ThreePiece);

gb1.equals(original) // mixed-type 
                     // comparison:
                     // yields false
original.equals(gb2) // mixed-type 
                     // comparison:
                     // yields false
gb1.equals(gb2)      // same-type 
                     // comparison:
                     // yields false

This kind of implementation is correct and fully complies to the equals contract. In class hierarchies of value types, where the equals method must be overridden, it is the only correct approach.

Revisiting Listing 2

Can we use this approach (using getClass instead of instanceof) to solve the problem with the subclass of class Date?

Unfortunately not. The superclass Date permits mixed-type comparison. For sake of symmetry, any subclass must also allow mixed-type comparison. Here is what an incorrect, asymmetric implementation of NamedDate.equals using getClass could look like:

public class Date {  // as before
  ...
  public boolean equals(Object obj) {
    return obj instanceof Date && getTime() ==
      ((Date) obj).getTime();
  }
}
public class NamedDate extends Date {
  private String name;
  public boolean equals(Object other) {
    // getClass() instead of instanceof
    if (other!=null && getClass() == other.getClass()) {
      if (!super.equals(other))  return false;
      return name.equals(((NamedDate)other).name));
    }
  }
  }

NamedDate EndOfMillenium =
  new NamedDate(99,11,31,"end of 2nd millenium AC");
Date      NewYearsEve    =
  new Date(99,11,31);

// slice comparison: false
EndOfMillenium.equals(NewYearsEve)
// slice comparison: true
NewYearsEve.equals(EndOfMillenium)

In general, any subclass of a superclass that uses the instanceof test in its equals implementation cannot switch to the getClass test without violating the symmetry requirement. The same is true for subclasses of superclasses that use the getClass test: the subclass cannot allow mixed-type comparison via instanceof if the superclass disallows it via getClass; otherwise the symmetry requirement would be violated. The fundamental realization is that the strategy chosen for a superclass’s equals method determines the implementation of equals for the entire class hierarchy. All classes in the hierarchy either allow slice comparison and use instanceof, or they disallow it and use getClass.

Conclusions

Having dissected the four arbitrarily chosen examples of equals implementations, what do we conclude?

First of all, there are two substantially different ways of performing the check for type match in an equals implementation. A class can allow mixed-type comparison between superclass and subclass objects by means of the instanceof operator, or a class can treat objects of different types as non-equal by means of the getClass test. The examples above illustrated nicely that implementations of equals using getClass are generally more robust than those implementations using instanceof.

The instanceof test is correct only for final classes or if at least method equals is final in a superclass. The latter essentially implies that no subclass must extend the superclass’s state, but can only add functionality or fields that are irrelevant for the object’s state and behavior, such as transient or static fields.

Implementations using the getClass test, on the other hand, always comply to the equals contract; they are correct and robust. They are, however, semantically very different from implementations that use the instanceof test. Implementations using getClass do not allow comparison of subclass with superclass objects, not even when the subclass does not add any fields and would not even want to override equals. Such a “trivial” class extension for instance would be the addition of a debug-print method in a subclass defined for exactly this “trivial” purpose. If the superclass prohibits mixed-type comparison via the getClass check, then the trivial extension would not be comparable to its superclass. Whether or not this is a problem fully depends on the semantics of the class and the purpose of the extension.

Guidelines for Practitioners

Ultimately, what do we do when we design a class and/or use an existing class as a superclass?

Considering the different approaches and their respective ups and downs, it should by now be clear that none of the solutions is the silver bullet. There is no easy recipe for doing this. A designer of a new class must carefully evaluate the intended and required semantics of the class in order to come to a decision regarding the right implementation of equals (and other pieces of basic infrastructure not discussed in this article). Here are some guidelines:

If the designer of such a non-final class decides in favor of implementing equals using instanceof, then no subclass can ever add fields and override equals without violating the transitivity requirement of the equals contract.

If the designer decides in favor of implementing equals using getClass, then no subclass object will ever be comparable to a superclass object and trivial extensions may not make a lot of sense.

Users of superclasses who wish to extend them should carefully study the superclass’s implementation of equals because it has an impact on the subclass and how it must be implemented and used.

Delegation as an Alternative to Inheritance

Instead of worrying about how to correctly design a superclass or how to implement a subclass of a superclass, why don’t we simply avoid inheritance in the first place?

One way of making life easy is to habitually declare every new class a final class, unless you are sure that it is supposed to be a superclass. In case of defining a superclass, the effort of providing a robust and correct implementation can be substantially higher. More often than not you’ll find that you do not really want or need to go to all the trouble of defining a superclass. Keep it simple; try out a final class before you consider implementing a non-final class.

Another way is to avoid subclassing. In many situations, inheritance can be replaced by delegation. Instead of implementing a new class as a subclass of an existing class, the new class can be implemented as an unrelated class that holds a reference to an object of the existing class type and implements the same methods as the existing class by delegation to the existing class’s method.

As an example, let’s re-implement class NamedDate:

// does not extend Date
public class NamedDate {  
  private String name;
  // Date as a field instead of a superclass
  private Date   date;    
  ...
  public boolean equals(Object other) {
    if (other!=null && getClass()==other.getClass()) {
      if(date.equals
        (((NamedDate)other).date) && 
          name.equals(((NamedDate)other).name))
        return true;
    }
    return false;
  }
  ...
  public boolean before(Date when) {
    return date.before(when);
  }
  public boolean after(Date when) {
    return date.after(when);
  }
  ...
}

This implementation technique often requires some tedious typing, because the entire public interface of the existing class must be repeated (see methods before and after in the example), but it is a sensible and robust alternative to inheritance.

Note its similarity to the getClass approach to equals: in this case NamedDate would not be considered as compatible with Date, and nobody would expect that objects of type Date and type NamedDate would be comparable for equality. If such a comparison were needed, it would be provided in terms of methods such as NamedDate.isSameDateAs(Date) or Date.isSameDateAs(NamedDate), both of which would be final methods that perform a slice comparison. But they would have nothing to do with equals.

As useful as delegation is as an alternative to subclassing, there are situations in which delegation cannot replace inheritance. An example is cases in which protected parts of the superclass must be accessed by the subclass. An example is implementations of the Template Method pattern as described in the GOF book [10]. In these cases, inheritance cannot be avoided and the user must live with the resulting restrictions. Another example is frameworks in which subclassing cannot be avoided because the superclass serves as a marker interface at the same time. An example is the class hierarchy of standard exceptions, most notably classes Throwable, Error, Exception, and RuntimeException. If you need to define a user-defined domain-specific exception type, then you must derive from one of these classes or any subclass thereof. Delegation is not an option here; a class that contained an IndexOutOfBoundsException, instead of being derived from it, would not be an exception type. The moral of the story is: avoid subclassing if you can and try to live with it if you can’t.

Summary

In this article, we looked at implementations of equals, studying various published implementations for illustration of the non-trivial nature of a method that compares objects for equality of content.

One case turned out to be problematic; it is the case of hierarchies of value types in which equals is overridden on various levels of the hierarchy with naturally different semantics. In this case, any mixed-type comparison between subclass and superclass objects fails to be transitive. But transitivity is one of the fundamental pieces of the equals contract.

The implementation detail of relevance in this article is the check for type match that is performed in an implementation of equals. There are two techniques: using the instanceof operator, which allows mixed-type comparison of subclass and superclass objects, and using getClass to make sure that mixed-type comparisons always fail.

equals implementations using getClass are more robust, because they are always correct. equals implementations using instanceof should be final methods; in practice they rarely are declared final, which opens up a notorious pitfall.

This was a snippet of what must be said about equals and other basic infrastructure of types in Java. We haven’t even touched on related issues such as:

Acknowledgements

The ideas in this article were in part inspired by a private email discussion with Joshua Bloch in the summer 2001; we thank Josh for his patience and thoughtful response. Another source of invaluable insight was Mark Weiss, a fellow instructor at DevelopMentor; he sat through our Java course in February 2001 and spotted the non-transitivity of our first naive attempts to get equals right.

Notes and References

[1] Barbara Liskov with John Guttag. Program Development in Java: Abstraction, Specification, and Object-Oriented Design (Addison-Wesley, 2000).

[2] Josh Bloch. Effective Java Programming Language Guide (Addison-Wesley, 2001).

[3] Peter Haggar. Practical Java: Programming Language Guide (Addison-Wesley, 2000).

[4] Java 2 Platform, Standard Edition v1.3.1, <http://java.sun.com/j2se/1.3/>.

[5] Java 2 Platform, Standard Edition, v 1.3.1, API Specification, <http://java.sun.com/j2se/1.3/docs/api/index.html>.

[6] The assumption here is that the additional field contributes to the state of the object. If an irrelevant field is added, then there is no need to override equals in the subclass.

[7] Another common mistake is the violation of the symmetry requirement. For example:

public class NamedDate extends Date {
  private String name;
  public boolean equals(Object other) {
    if ( !(other instanceof NamedDate) )
      return false;
    else if ( !name.equals(((NamedDate)other).name) )
      return false;
    else if ( !super.equals(other) )
      return false;
    else
      return true;
  }
}

If a Date is compared to a NamedDate via Date.equals, then the NamedDate object will pass the instanceof test and the comparison will be performed; the result can be false or true. If we swap the two objects and compare the same NamedDate to the same Date via NamedDate.equals, then the Date object will not pass the instanceof test; these two objects will never compare equal. This behavior violates the symmetry requirement of the equals contract. Unfortunately this kind of implementation was fairly common in Java platform library classes before version 1.3.

[8] It is beyond the scope of this article to go into any details here, but superclass design is hard, much harder than designing a final class. For instance, every use of a non-final method requires that the contract, under which this method is used, is properly defined and documented. A superclass designer cannot assume that the method called does what is expected, if he doesn’t tell anybody what to expect. When the final method is invoked at run time, it might not be the class designer’s method, but an overriding subclass version of it. How does the designer make sure that he knows what this “alien” method does? This is accomplished only by publishing the contract for this method and hoping for the best, namely that subclass implementers will comply with the published contract. The reality is that in practice there are oodles of non-final classes that should actually be final because they have never been properly designed and documented as superclasses.

[9] Sadly, in practice, you find countless examples of non-final classes with non-final equals methods that use the instanceof test, especially in the Java platform libraries. And you will find an equally large number of examples of programmers who step right into the pitfall and provide incorrect subclass implementation, both in real-world Java code as well as in Java publications. Just think of Listing 2. We did not look for a particularly bad book to find an example of an incorrect implementation of equals. The opposite is true. Program Development in Java is an otherwise recommendable book. The fact that is suggests a flawed implementation of equals simply illustrates how easy it is to step into the pitfall of overwriting an implementation of equals that uses the instanceof test without being aware of the resulting problems.

[10] Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley, 1995).

Angelika Langer works as an independent freelance trainer/consultant. Her main area of interest and expertise is object-oriented software development in C++ and Java. She can be reached at langer@camelot.de.

Klaus Kreft is a senior consultant at Siemens Nixdorf in Germany. He can be reached at klaus.kreft@mch.sni.de.