Friday, April 29, 2011

Java PriorityQueue removal of arbitrary elements performance

Say I have a java PriorityQueue (which java implements as a heap) that I iterate over to remove elements based on some criteria:

PriorityQueue q = new PriorityQueue();
...
Iterator it = q.iterator();
while(it.hasNext()){
    if( someCriterion(it.next()) )
        it.remove();
}

How long does each remove() operation take? I'm not sure whether it's O(log(n)) or O(1).

From stackoverflow
  • If you're using the Sun implementation, it's O(log(n)). From the Javadocs:

    Implementation note: this implementation provides O(log(n)) time for the enqueing and dequeing methods (offer, poll, remove() and add); linear time for the remove(Object) and contains(Object) methods; and constant time for the retrieval methods (peek, element, and size).

    Other implementations could have different complexity.


    Edit: The Javadocs don't cover the performance of removing an element with an iterator, so I had to look up the source code. This is all relevant to the Sun implementation, and may differ in Apple's version, GNU Classpath, etc. Sun's source is available here; it is also included in the JDK, so you might already have it installed.

    In the PriorityQueue's iterator, the default case for remove() is to call PriorityQueue.removeAt(lastRet), where lastRet is the index that was last returned by next(). removeAt() appears to be O(log(n)) worst case (it might have to sift the queue, but doesn't have to iterate).

    However, sometimes bad things happen. From the comments of removeAt():

    /**
     * Removes the ith element from queue.
     *
     * Normally this method leaves the elements at up to i-1,
     * inclusive, untouched.  Under these circumstances, it returns
     * null.  Occasionally, in order to maintain the heap invariant,
     * it must swap a later element of the list with one earlier than
     * i.  Under these circumstances, this method returns the element
     * that was previously at the end of the list and is now at some
     * position before i. This fact is used by iterator.remove so as to
     * avoid missing traversing elements.
     */
    

    When a non-null element is returned by removeAt(), the iterator adds it to a special queue for later use: when the iterator runs out of elements in the queue, it then iterates through this special queue. When remove() is called during this second phase of iteration, the iterator calls PriorityQueue.removeEq(lastRetElt), where lastRetElt is the last element returned from the special queue. removeEq is forced to use a linear search to find the correct element to remove, which makes it O(n). BUT it can check elements using == rather than .equals(), so its constant factor is lower than PriorityQueue.remove(Object).

    So, in other words, removing with an iterator is technically O(n), but in practice it should be quite a bit faster than remove(Object).

    weiyin : It doesn't specify the remove() operation time from an iterator. Removing the head will definitely take O(log(n)), but what about removing other elements?
    Michael Myers : Good question. Let me look at the source; it might use `remove(Object)`, which would make it linear.
    dfa : remove() uses indexOf(), so it is linear
    weiyin : Thanks for the detailed explanation!

0 comments:

Post a Comment