[Chapter 9] 9.4 Vectors and Hashtables

9.4 Vectors and Hashtables

Vectors and hashtables are collection classes. Each stores a group of objects according to a particular retrieval scheme. Aside from that, they are not particularly closely related things. A hashtable is a dictionary; it stores and retrieves objects by a key value. A vector, on the other hand, holds an ordered collection of elements. It's essentially a dynamic array. Both of these, however, have more subtle characteristics in common. First, they are two of the most useful aspects of the core Java distribution. Second, they both take full advantage of Java's dynamic nature at the expense of some of its more static type safety.

If you work with dictionaries or associative arrays in other languages, you should understand how useful these classes are. If you are someone who has worked in C or another static language, you should find collections to be truly magical. They are part of what makes Java powerful and dynamic. Being able to work with lists of objects and make associations between them is an abstraction from the details of the types. It lets you think about the problems at a higher level and saves you from having to reproduce common structures every time you need them.

9.4.1 java.util.Vector

A Vector is a dynamic array; it can grow to accommodate new items. You can also insert and remove elements at arbitrary positions within it. As with other mutable objects in Java, Vector is thread-safe. The Vector class works directly with the type Object, so we can use Vectors with instances of any class.[3] We can even put different kinds of Objects in a Vector together; the Vector doesn't know the difference.

[3] In C++, where classes don't derive from a single Object class that supplies a base type and common methods, the elements of a collection would usually be derived from some common collectable class. This forces the use of multiple inheritance and brings its associated problems.

As you might guess, this is where things get tricky. To do anything useful with an Object after we take it back out of a Vector, we have to cast it back (narrow) it to its original type. This can be done with safety in Java because the cast is checked at run-time. Java throws a ClassCastException if we try to cast an object to the wrong type. However, this need for casting means that your code must remember types or methodically test them with instanceof. That is the price we pay for having a completely dynamic collection class that operates on all types.

You might wonder if you can subclass Vector to produce a class that looks like a Vector, but that works on just one type of element in a type-safe way. Unfortunately, the answer is no. We could override Vector's methods to make a Vector that rejects the wrong type of element at run-time, but this does not provide any new compile-time, static type safety. In C++, templates provide a safe mechanism for parameterizing types by restricting the types of objects used at compile-time. The keyword generic is a reserved word in Java. This means that it's possible that future versions might support C++-style templates, using generic to allow statically checked parameterized types.

We can construct a Vector with default characteristics and add elements to it using addElement() and insertElement():

Vector things = new Vector(); 
 
String one = "one"; 
String two = "two"; 
String three = "three"; 
 
things.addElement( one ); 
things.addElement( three ); 
things.insertElementAt( two, 1 );

things now contains three String objects in the order "one," "two," and "three." We can retrieve objects by their position with elementAt(), firstElement(), and lastElement():

String s1 = (String)things.firstElement();       // "one" 
String s3 = (String)things.lastElement();        // "three" 
String s2 = (String)things.elementAt(1);         // "two"

We have to cast each Object back to a String in order to assign it a String reference. ClassCastException is a type of RuntimeException, so we can neglect to guard for the exception if we are feeling confident about the type we are retrieving. Often, as in this example, you'll just have one type of object in the Vector. If we were unsure about the types of objects we were retrieving, we would want to be prepared to catch the ClassCastException or test the type explicitly with the instanceof operator.

We can search for an item in a Vector with the indexOf() method:

int i = things.indexOf( three );                 // i = 2

indexOf() returns a value of -1 if the object is not found. As a convenience, we can also use contains() simply to test for the presence of the object.

Finally, removeElement() removes a specified Object from the Vector:

things.removeElement( two );

The element formerly at position three now becomes the second element.

The size() method reports the number of objects currently in the Vector. You might think of using this to loop through all elements of a Vector, using elementAt() to get at each element. This works just fine, but there is a more general way to operate on a complete set of elements like those in a Vector.

9.4.2 java.util.Enumeration

The java.util.Enumeration interface can be used by any sort of set to provide serial access to its elements. An object that implements the Enumeration interface presents two methods: nextElement() and hasMoreElements(). nextElement() returns an Object type, so it can be used with any kind of collection. As with taking objects from a Vector, you need to know or determine what the objects are and cast them to the appropriate types before using them.

Enumeration is useful because it is a general interface for any kind of object that wants to provide "one shot" sequential access to its elements. All you have to do is provide a hasMoreElements() test and a nextElement() iterator and declare that your class implements java.util.Enumeration. Another advantage of an Enumeration is that you don't have to provide all values up front; you can provide each value as it's requested with nextElement().

An Enumeration does not guarantee the order in which elements are returned, however, so if order is important you don't want to use an Enumeration. You can iterate through the elements in an Enumeration only once; there is no way to reset it to the beginning or move backwards through the elements.

A Vector returns an Enumeration of its contents when we call the elements() method:

Enumeration e = myVector.elements(); 
 
while ( e.hasMoreElements() ) { 
    String s = (String)e.nextElement(); 
    System.out.println( s ): 
}

The above code loops three times and calls nextElement() to fetch our strings. The actual type of object returned by elements() is a VectorEnumeration, but we don't have to worry about that. We can always refer to an Enumeration simply by its interface.

Note that Vector does not implement the Enumeration interface. If it did, that would put a serious limitation on Vector because we could cycle through the elements in it only once. That's clearly not the purpose of a Vector, which is why Vector instead provides a method that returns an Enumeration.

9.4.3 java.util.Hashtable

As I said earlier, a hashtable is a dictionary, similar to an associative array. A hashtable stores and retrieves elements with key values; they are very useful for things like caches and minimalist databases. When you store a value in a hashtable, you associate a key with that value. When you need to look up the value, the hashtable retrieves it efficiently using the key. The name hashtable itself refers to how the indexing and storage of elements is performed, as we'll discuss shortly. First I want to talk about how to use a hashtable.

The java.util.Hashtable class implements a hashtable that, like Vector, operates on the type Object. A Hashtable stores an element of type Object and associates it with a key, also of type Object. In this way, we can index arbitrary types of elements using arbitrary types as keys. As with Vector, casting is generally required to narrow objects back to their original type after pulling them out of a hashtable.

A Hashtable is quite easy to use. We can use the put() method to store items:

Hashtable dates = new Hashtable(); 
 
dates.put( "christmas", new GregorianCalendar( 1997, Calendar.DECEMBER, 25 ) );
dates.put( "independence", new GregorianCalendar( 1997, Calendar.JULY, 4 ) ); 
dates.put( "groundhog", new GregorianCalendar( 1997, Calendar.FEBRUARY, 2 ) );

First we create a new Hashtable. Then we add three GregorianCalendar objects to the hashtable, using String objects as keys. The key is the first argument to put(); the value is the second. Only one value can be stored per key. If we try to store a second object under a key that already exists in the Hashtable, the old element is booted out and replaced by the new one. The return value of the put() method is normally null, but if the call to put() results in replacing an element, the method instead returns the old stored Object.

We can now use the get() method to retrieve each of the above dates by name, using the String key by which it was indexed:

GregorianCalendar g = (GregorianCalendar)dates.get( "christmas" );

The get() method returns a null value if no element exists for the given key. The cast is required to narrow the returned object back to type GregorianCalendar. I hope you can see the advantage of using a Hashtable over a regular array. Each value is indexed by a key instead of a simple number, so unlike a simple array, we don't have to remember where each GregorianCalendar is stored.

Once we've put a value in a Hashtable, we can take it back out with the remove() method, again using the key to access the value:

dates.remove("christmas");

We can test for the existence of a key with containsKey():

if ( dates.containsKey( "groundhog" ) ) {    // yes

Just like with a Vector, we're dealing with a set of items. Actually, we're dealing with two sets: keys and values. The Hashtable class has two methods, keys() and elements(), for getting at these sets. The keys() method returns an Enumeration of the keys for all of the elements in the Hashtable. We can use this Enumeration to loop through all of the keys:

for (Enumeration e = dates.keys(); e.hasMoreElements(); ) { 
    String key = (String)e.nextElement(); 
    ... 
}

Similarly, elements() provides an Enumeration of the elements themselves.

Hashcodes and key values

If you've used a hashtable before, you've probably guessed that there's more going on behind the scenes than we've let on. An element in a hashtable is not associated with a key strictly by the key's identity, but rather by the key's contents. This allows keys that are equivalent to access the same object. By "equivalent," we mean those objects that compare true with equals(). So, if you store an object in a Hashtable using another object as a key, you can use any object that equals() tells you is equivalent to retrieve the stored object.

It's easy to see why equivalence is important if you remember our discussion of strings. It is easy to create two String objects that have the same text in them but which come from different sources in Java. In this case, the == operator will tell you that the objects are different, but the equals() method of the String class will tell you that they are equivalent. Because they are equivalent, if we store an object in a Hashtable using one of the String objects as a key, we can retrieve it using the other.

Okay, since Hashtables have a notion of equivalent keys, what does the hashcode do? The hashcode is like a fingerprint of the object's data content. The Hashtable uses the hashcode to store the objects so that they can be retrieved efficiently. The hashcode is nothing more than a number (an integer) that is a function of the data. The number always turns out the same for identical data, but the hashing function is intentionally designed to generate as random a number as possible for different combinations of data. That is, a very small change in the data should produce a big difference in the number. It is unlikely that two similar data sets will produce the same hashcode.

A Hashtable really just keeps a number of lists of objects, but it puts objects into the lists based on their hashcode. So when it wants to find the object again, it can look at the hashcode and know immediately how to get to the appropriate list. The Hashtable still might end up with a number of objects to examine, but the list should be short. For each object it finds, it does the following comparison to see if the key matches:

if ((keyHashcode == storedKeyHashcode) && key.equals(storedKey)) 
    return object;

There is no prescribed way to generate hashcodes. The only requirement is that they be somewhat randomly distributed and reproducible (based on the data). This means that two objects that are not the same could end up with the same hashcode by accident. This is unlikely (there are 2³² possible integer values), but it doesn't cause a problem because the Hashtable ultimately checks the actual keys, as well as the hashcodes, to see if they are equal. Therefore, even if two objects have the same hashcode, they can still co-exist in the hashtable.

Here's an interesting example. In Java 1.1, the hashCode() method for the String class does not take into account all characters in the string. Instead, it samples a representative few, in order to save time. Knowing how it works, we can construct two Strings that are obviously not equivalent but nonetheless have the same hashcode. We can show that they can still serve as unique keys in a Hashtable:

String 
   s1 = "xxxxxxxxxxxxxxxx",
   s2 = "x1x2x3x4x5x6x7x8";
System.out.println( s1.hashCode());
System.out.println( s2.hashCode());  // Identical in 1.1

Hashtable h = new Hashtable();
h.put(s1, new Integer(1));
h.put(s2, new Integer(2));
System.out.println( h.get(s1) );     // gets integer 1
System.out.println( h.get(s2) );     // gets integer 2

Hashcodes are computed by an object's hashCode() method, which is inherited from the Object class if it isn't overridden. The default hashCode() method simply assigns each object instance a unique number to be used as a hashcode. If a class does not override this method, each instance of the class will have a unique hashcode. This goes along well with the default implementation of equals() in Object, which only compares objects for identity using ==. You're probably used to overriding equals() in any classes for which equivalence is meaningful. Likewise, if you want equivalent objects to serve as equivalent keys, you need to override the hashCode() method as well to return identical hashcode values. To do this, you need to create some suitably complex and arbitrary function of the contents of your object. The only criterion for the function is that it should be almost certain to return different values for objects with different data, but the same value for objects with identical data.

9.4.4 java.util.Dictionary

java.util.Dictionary is the abstract superclass of Hashtable. It lays out the basic get(), put(), and remove() functionality for dictionary-style collections. You could derive other types of dictionaries from this class. For example, you could implement a dictionary with a different storage format, such as a binary tree.


9.3 Dates		9.5 Properties