Wednesday, July 01, 2009

TreeSet and implementing the Comparator interface

Time for a bit more technical content. You might begin to discern a pattern here: gotchas in equals(), hashcode() and Java Collections.

My experience of Collections thus far has taught me that most of them are built upon detecting object equality through the equals() and hashcode() implementation in your class. See my earlier post and the posts of Andy Beacock referenced therein. However, today I ran into problems with objects vanishing in a TreeSet.

Back to the requirements first. I have list of Bank Account objects I wish to show on a web-form, but that list should also be modifiable by the user with an Add Form, and a Remove button on each listed account. So far so good. One more thing, the list has got to be in account name order on display. For this example, a Bank Account consists of a pseudo-primary key (id), a Bank Name, an Account Name, an Account Number, and a Sort Code. The id is not shown, and the records are unique by Account Number and Sort Code (and therefore implicitly by Bank Name).

Obviously, Bank Names should should not really be store in a Bank Account record - they should be stored separately and referenced by foreign key, as indeed, should Sort Codes - in fact, in an ideal world, the Bank Account record would be an id, a name, an account number and a foreign key to a branch as identified by the sort code. However, you know as well as I do that this is not an ideal world, and most of the time, we have to deal with the world as we come to it.

So, back to my slightly contrived example. I'm going to ignore completely all the contextual information about display technology, form interaction, transactions, etc., and concentrate on the core issue: SortedSets are nasty wee buggers!

There are two ways to manage the display of sorted data: sort on insert, and sort on display. If you're displaying often, and inserting & deleting infrequently (as in this example), sort on insert is preferable, otherwise you spend a lot of time sorting your data again and again with no change to the data.

Having looked at the implementations of the SortedSet interface, we really only have one available in the Java SDK (I'm staying away from the Apache Collections for now), and that is TreeSet, for which you specify a Comparator at construction time. Naively, I assumed that my comparator would just be interested in the Account Name - that's the field by which I wish to sort, after all.

    @Override
    public int compare(BankAccount o1, BankAccount o2) {
        return o1.getAccountName().compareTo(o2.getAccountName());
    }


(I'm leaving out the normal defensive programming and exception handling for clarity)

However, my BankAccount class implements equals() and hashcode() as normal, with the sortcode and the account number - these being the 'business key' if you like. So, on this basis, I expected TreeSet to sort on name, and use equals() and/or hashcode() to determine whether the Set contains a given BankAccount on insertion - in the same way that HashSet would.

Ooops, no.

Turns out that TreeSet doesn't use equals() or hashcode() at all. It uses the Comparator for sorting and contains() checking. So now, my Comparator implementation looks like this:

@Override
public int compare(BankAccount o1, BankAccount o2) {
   int result = o1.getAccountName().compareTo(o2.getAccountName());
   if (result == 0) {
       result = o1.getSortCode().compareTo(o2.getSortCode());
   }
   if (result == 0) {
       result = o1.getAccountNumber().compareTo(o2.getAccountNumber());
   }
   return result;
}

Now I have data sorted on insert and display, and the data doesn't get lost if someone enters two accounts with the same Account Name.

In conclusion, I think that the only reason this gotcha got me was that I was motivated to use the TreeSet simply by the requirement to sort by name, so I wrote the Comparator with only that in mind. A correct implementation of really should include the business keys of the entity under consideration... of course, I'd just done that, I might have had even more trouble with the sort-by-name requirement (i.e. a non-key property).

No comments:

Post a Comment