Monday, April 25, 2011

Where do I find a standard Trie based map implementation in Java?

I have a Java program that stores a lot of mappings from Strings to various objects.

Right now, my options are either to rely on hashing (via HashMap) or on binary searches (via TreeMap). I am wondering if there is an efficient and standard trie-based map implementation in a popular and quality collections library?

I've written my own in the past, but I'd rather go with something standard, if available.

Quick clarification: While my question is general, in the current project I am dealing with a lot of data that is indexed by fully-qualified class name or method signature. Thus, there are many shared prefixes.

From stackoverflow
  • Are HashMap and TreeMap that slow?

    We maintain some pretty big maps and speed has never seemed to be that much of an issue.

    Andrew Dashin : TreeMap and HashMap are interfaces.
    : In terms of speed, a hash should be faster than a trie. But I think he's more worried about memory efficiency.
    Esko Luontola : @Andrew Dashin: They are not interfaces. Their respective interfaces are Map and SortedMap.
    Uri : They're interfaces but are usually used with an implementation that conforms to their name.
    Uri : Tries are cheaper space wise since they supposedly don't store entire strings (a lot of my strings have common prefixes)
    Tom : @Uri/Andrew Dashin - they're classes
    mtruesdell : Concrete classes. Not abstract, not interfaces: http://java.sun.com/javase/6/docs/api/java/util/TreeMap.html http://java.sun.com/javase/6/docs/api/java/util/HashMap.html
    Fortyrunner : @uri You now have me interested in Tries as well - thanks! @andrew, They're definitely classes and have been since Java 1.2 when I first started using em!
    Uri : @thaggie: Sorry, my bad, I had an IHashMap and ITreeMap years ago in a C++ project, don't ask. Yes, Map and Tree are the interfaces, but I am using either HashMap or TreeMap and wondering if a Trie wouldn't be better.
    Uri : My main concern with Tries is the cost of accesses. Since each cell may be at a different location in memory.
    Andrew Dashin : My appologies, I did a mistake.
    nos : TreeMap/HashMaps doesn't solve the same problem as tries. You can do prefix lookups in tries.
  • What you need is org.apache.commons.collections.FastTreeMap , I think.

  • You might look at this TopCoder one as well (registration required...).

  • You might want to look at the Trie implementation that Limewire is contributing to the Google collections library.

0 comments:

Post a Comment