Add PriorityQueue<T> to Collections #14032

ebickle · 2015-01-30T21:46:20Z

See LATEST Proposal in corefxlab repo.

Second Proposal options

Proposal from https://github.com/dotnet/corefx/issues/574#issuecomment-307971397

Assumptions

Elements in priority queue are unique. If they are not, we would have to introduce 'handles' of items to enable their update/remove. Or the update/remove semantics would have to apply to first/all, which is weird.

Modeled after Queue<T> (MSDN link)

API

public class PriorityQueue<TElement, TPriority>
    : IEnumerable,
    IEnumerable<(TElement element, TPriority priority)>,
    IReadOnlyCollection<(TElement element, TPriority priority)>
    // ICollection not included on purpose
{
    public PriorityQueue();
    public PriorityQueue(IComparer<TPriority> comparer);

    public IComparer<TPriority> Comparer { get; }
    public int Count { get; }
    public bool IsEmpty { get; }

    public bool Contains(TElement element);

    // Peek & Dequeue
    public (TElement element, TPriority priority) Peek(); // Throws if empty
    public (TElement element, TPriority priority) Dequeue(); // Throws if empty
    public bool TryPeek(out TElement element, out TPriority priority); // Returns false if empty
    public bool TryDequeue(out TElement element, out TPriority priority); // Returns false if empty

    // Enqueue & Update
    public void Enqueue(TElement element, TPriority priority); // Throws if it is duplicate
    public void Update(TElement element, TPriority priority); // Throws if element does not exist
    public void EnqueueOrUpdate(TElement element, TPriority priority);
    public bool TryEnqueue(TElement element, TPriority priority); // Returns false if it is duplicate (does NOT update it)
    public bool TryUpdate(TElement element, TPriority priority); // Returns false if element does not exist (does NOT add it)
    
    public void Remove(TElement element); // Throws if element does not exist
    public bool TryRemove(TElement element); // Returns false if element does not exist

    public void Clear();

    public IEnumerator<(TElement element, TPriority priority)> GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator();

//
// Selector part
//
    public PriorityQueue(Func<TElement, TPriority> prioritySelector);
    public PriorityQueue(Func<TElement, TPriority> prioritySelector, IComparer<TPriority> comparer);

    public Func<TElement, TPriority> PrioritySelector { get; }

    public void Enqueue(TElement element);
    public void Update(TElement element);
}

Open questions:

Class name PriorityQueue vs. Heap
Introduce IHeap and constructor overload? (Should we wait for later?)
Introduce IPriorityQueue? (Should we wait for later - IDictionary example)
Use selector (of priority stored inside the value) or not (5 APIs difference)
Use tuples (TElement element, TPriority priority) vs. KeyValuePair<TPriority, TElement>
- Should Peek and Dequeue rather have out argument instead of tuple?
Is Peek and Dequeue throwing useful at all?

Original Proposal

Issue https://github.com/dotnet/corefx/issues/163 requested the addition of a priority queue to the core .NET collection data structures.

This post, while a duplicate, is intended to act the formal submission to the corefx API Review Process. The issue contents are the speclet for a new System.Collections.Generic.PriorityQueue type.

I will be contributing the PR, if approved.

Rationale and Usage

The .NET Base Class Libraries (BCL) currently lacks support for ordered producer-consumer collections. A common requirement of many software applications is the ability generate a list of items over time and process them in an order different from the order they were received in.

There are three generic data structures within the System.Collections hierarchy of namespaces that supported a sorted collection of items; System.Collections.Generic.SortedList, System.Collections.Generic.SortedSet, and System.Collections.Generic.SortedDictionary.

Of these, SortedSet and SortedDictionary are not appropriate for producer-consumer patterns that generate duplicate values. The complexity of SortedList is Θ(n) worst case for both Add and Remove.

A much more memory and time efficient data structure for ordered collections with producer-consumer usage patterns is a priority queue. Other than when capacity resizing is necessary, worse case insertion (enqueue) and remove top (dequeue) performance is Θ(log n) - far better than the existing options that exist in the BCL.

Priority queues have a wide degree of applicability across different classes of applications. The Wikipedia page on Priority Queues offers a list of many different well understand use cases. While highly specialized implementations may still require custom priority queue implementations, a standard implementation would cover a broad range of usage scenarios. Priority queues are particularly useful in scheduling the output of multiple producers, which is an important pattern in highly parallelized software.

It's worth noting that both the C++ standard library and Java offer priority queue functionality as part of their basic APIs.

Proposed API

namespace System.Collections.Generic
{
    /// <summary>
    /// Represents a collection of objects that are removed in a sorted order.
    /// </summary>
    /// <typeparam name="T">Specifies the type of elements in the queue.</typeparam>    
    [DebuggerDisplay("Count = {count}")]
    [DebuggerTypeProxy(typeof(System_PriorityQueueDebugView<>))]
    public class PriorityQueue<T> : IEnumerable<T>, ICollection, IEnumerable, IReadOnlyCollection<T>
    {
        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that uses a default comparer.
        /// </summary>
        public PriorityQueue();

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that has the specified initial capacity.
        /// </summary>
        /// <param name="capacity">The initial number of elements that the <see cref="PriorityQueue{T}"/> can contain.</param>
        /// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="capacity"/> is less than zero.</exception>
        public PriorityQueue(int capacity);

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that uses a specified comparer.
        /// </summary>
        /// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="comparer"/> is null.</exception>
        public PriorityQueue(IComparer<T> comparer);

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that contains elements copied from the specified collection and uses a default comparer.
        /// </summary>
        /// <param name="collection">The collection whose elements are copied to the new <see cref="PriorityQueue{T}"/>.</param>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="collection"/> is null.</exception>
        public PriorityQueue(IEnumerable<T> collection);

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that contains elements copied from the specified collection and uses a specified comparer.
        /// </summary>
        /// <param name="collection">The collection whose elements are copied to the new <see cref="PriorityQueue{T}"/>.</param>
        /// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
        /// <exception cref="T:System.ArgumentNullException">
        /// <paramref name="collection"/> is null. -or-
        /// <paramref name="comparer"/> is null.
        /// </exception>
        public PriorityQueue(IEnumerable<T> collection, IComparer<T> comparer);

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class that is empty,
        /// has the specified initial capacity, and uses a specified comparer.
        /// </summary>
        /// <param name="capacity">The initial number of elements that the <see cref="PriorityQueue{T}"/> can contain.</param>
        /// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
        /// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="capacity"/> is less than zero.</exception>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="comparer"/> is null.</exception>
        public PriorityQueue(int capacity, IComparer<T> comparer);

        /// <summary>
        /// Gets the <see cref="IComparer{T}"/> for the <see cref="PriorityQueue{T}"/>. 
        /// </summary>
        /// <value>
        /// The <see cref="T:System.Collections.Generic.IComparer{T}"/> that is used when
        /// comparing elements in the <see cref="PriorityQueue{T}"/>. 
        /// </value>
        public IComparer<T> Comparer 
        { 
            get;
        }

        /// <summary>
        /// Gets the number of elements contained in the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <value>The number of elements contained in the <see cref="PriorityQueue{T}"/>.</value>
        public int Count 
        { 
            get;
        }

        /// <summary>
        /// Adds an object to the into the <see cref="PriorityQueue{T}"/> by its priority.
        /// </summary>
        /// <param name="item">
        /// The object to add to the <see cref="PriorityQueue{T}"/>. 
        /// The value can be null for reference types.
        /// </param>
        public void Enqueue(T item);

        /// <summary>
        /// Removes and returns the object with the lowest priority in the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <returns>The object with the lowest priority that is removed from the <see cref="PriorityQueue{T}"/>.</returns>
        /// <exception cref="InvalidOperationException">The <see cref="PriorityQueue{T}"/> is empty.</exception>
        public T Dequeue();

        /// <summary>
        /// Returns the object with the lowest priority in the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <exception cref="InvalidOperationException">The <see cref="PriorityQueue{T}"/> is empty.</exception>
        public T Peek();

        /// <summary>
        /// Removes all elements from the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        public void Clear();

        /// <summary>
        /// Determines whether an element is in the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <param name="item">
        /// The object to add to the end of the <see cref="PriorityQueue{T}"/>. 
        /// The value can be null for reference types.
        /// </param>
        /// <returns>
        /// true if item is found in the <see cref="PriorityQueue{T}"/>;  otherwise, false.
        /// </returns>
        public bool Contains(T item);

        /// <summary>
        /// Copies the elements of the <see cref="PriorityQueue{T}"/> to an  <see cref="T:System.Array"/>, 
        /// starting at a particular <see cref="T:System.Array"/> index.
        /// </summary>
        /// <param name="array">
        /// The one-dimensional <see cref="T:System.Array">Array</see> that is the
        /// destination of the elements copied from the <see cref="PriorityQueue{T}"/>. 
        /// The <see cref="T:System.Array">Array</see> must have zero-based indexing.
        /// </param>
        /// <param name="arrayIndex">The zero-based index in <paramref name="array"/> at which copying begins.</param>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="array"/> is null.</exception>
        /// <exception cref="T:System.ArgumentOutOfRangeException">
        /// <paramref name="arrayIndex"/> is less than zero. -or- 
        /// <paramref name="arrayIndex"/> is equal to or greater than the length of the <paramref name="array"/>
        /// </exception>
        /// <exception cref="ArgumentException">
        /// The number of elements in the source <see cref="T:System.Collections.ICollection"/> is
        /// greater than the available space from <paramref name="index"/> to the end of the destination
        /// <paramref name="array"/>.
        /// </exception>
        public void CopyTo(T[] array, int arrayIndex);

        /// <summary>
        /// Copies the elements of the <see cref="T:System.Collections.ICollection"/> to an 
        /// <see cref="T:System.Array"/>, starting at a particular <see cref="T:System.Array"/> index.
        /// </summary>
        /// <param name="array">
        /// The one-dimensional <see cref="T:System.Array">Array</see> that is the
        /// destination of the elements copied from the <see cref="PriorityQueue{T}"/>. 
        /// The <see cref="T:System.Array">Array</see> must have zero-based indexing.
        /// </param>
        /// <param name="index">The zero-based index in <paramref name="array"/> at which copying begins.</param>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="array"/> is null.</exception>
        /// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="index"/> is less than zero.</exception>
        /// <exception cref="ArgumentException">
        /// <paramref name="array"/> is multidimensional. -or-
        /// <paramref name="array"/> does not have zero-based indexing. -or-
        /// <paramref name="index"/> is equal to or greater than the length of the <paramref name="array"/> -or- 
        /// The number of elements in the source <see cref="T:System.Collections.ICollection"/> is
        /// greater than the available space from <paramref name="index"/> to the end of the destination
        /// <paramref name="array"/>. -or- 
        /// The type of the source <see cref="T:System.Collections.ICollection"/> cannot be cast automatically 
        /// to the type of the destination <paramref name="array"/>.
        /// </exception>
        void ICollection.CopyTo(Array array, int index);

        /// <summary>
        /// Copies the elements stored in the <see cref="PriorityQueue{T}"/> to a new array.
        /// </summary>
        /// <returns>
        /// A new array containing a snapshot of elements copied from the <see cref="PriorityQueue{T}"/>.
        /// </returns>
        public T[] ToArray();

        /// <summary>
        /// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>
        /// </summary>
        /// <returns>An enumerator for the contents of the <see cref="PriorityQueue{T}"/>.</returns>
        public Enumerator GetEnumerator();

        /// <summary>
        /// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>
        /// </summary>
        /// <returns>An enumerator for the contents of the <see cref="PriorityQueue{T}"/>.</returns>
        IEnumerator<T> IEnumerable<T>.GetEnumerator();

        /// <summary>
        /// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <returns>An <see cref="T:System.Collections.IEnumerator"/> that can be used to iterate through the collection.</returns>
        IEnumerator IEnumerable.GetEnumerator();

        /// <summary>
        /// Sets the capacity to the actual number of elements in the <see cref="PriorityQueue{T}"/>, 
        /// if that number is less than than a threshold value.
        /// </summary>
        public void TrimExcess();

        /// <summary>
        /// Gets a value that indicates whether access to the <see cref="ICollection"/> is 
        /// synchronized with the SyncRoot.
        /// </summary>
        /// <value>true if access to the <see cref="T:System.Collections.ICollection"/> is synchronized
        /// with the SyncRoot; otherwise, false. For <see cref="PriorityQueue{T}"/>, this property always
        /// returns false.</value>
        bool ICollection.IsSynchronized
        {
            get;
        }

        /// <summary>
        /// Gets an object that can be used to synchronize access to the 
        /// <see cref="T:System.Collections.ICollection"/>.
        /// </summary>
        /// <value>
        /// An object that can be used to synchronize access to the 
        /// <see cref="T:System.Collections.ICollection"/>.
        /// </value>
        object ICollection.SyncRoot
        {
            get;
        }

        public struct Enumerator : IEnumerator<T>
        {
            public T Current { get; }
            object IEnumerator.Current { get; }
            public bool MoveNext();
            public void Reset();
            public void Dispose();
        }
    }
}

Details

Implementation data structure will be a binary heap. Items with a greater comparison value will be returned first. (descending order)
Time complexities:

Operation	Complexity	Notes
Construct	Θ(1)
Construct Using IEnumerable	Θ(n)
Enqueue	Θ(log n)
Dequeue	Θ(log n)
Peek	Θ(1)
Count	Θ(1)
Clear	Θ(N)
Contains	Θ(N)
CopyTo	Θ(N)	Uses Array.Copy, actual complexity may be lower
ToArray	Θ(N)	Uses Array.Copy, actual complexity may be lower
GetEnumerator	Θ(1)
Enumerator.MoveNext	Θ(1)

Additional constructor overloads that take the System.Comparison delegate were intentionally omitted in favor of a simplified API surface area. Callers can use Comparer.Create to convert a function or Lambda expression to an IComparer interface if necessary. This does require the caller to incur a one-time heap allocation.
Although System.Collections.Generic is not yet part of corefx, I propose that this class be added to corefxlab in the meantime. It can be moved to the primary corefx repository once System.Collections.Generic are added and there is consensus that its status should be elevated from experimental to an official API.
An IsEmpty property was not included, since there is no additional performance penalty calling Count. The majority of the collection data structures do not include IsEmpty.
The IsSynchronized and SyncRoot properties of ICollection were implemented explicitly as they are effectively obsolete. This also follows the pattern used for the other System.Collection.Generic data structures.
Dequeue and Peek throw an InvalidOperationException when the queue is empty to match the established behavior of System.Collections.Queue.
IProducerConsumerCollection was not implemented as its documentation states that it is only intended for thread-safe collections.

Open Questions

Is avoiding an additional heap allocation during calls to GetEnumerator when using foreach a strong enough rationale for including the nested public enumerator structure?
Should CopyTo, ToArray, and GetEnumerator return results in prioritized (sorted) order, or the internal order used by the data structure? My assumption is that the internal order should be returned, as it doesn't incur any additional performance penalties. However, this is a potential usability issue if a developer thinks of the class as a "sorted queue" rather a priority queue.
Does adding a type named PriorityQueue to System.Collections.Generic cause a potentially breaking change? The namespace is heavily used, and could cause a source compatibility problem for projects that include their own priority queue type.
Should items be dequeued in ascending or descending order, based on the output of IComparer? (my assumption is ascending order, to match the normal sorting convention of IComparer).
Should the collection be 'stable'? In other words, should two items with equal IComparison results be dequeued in the exact same order they are enqueued in? (my assumption is this isn't needed)

Updates

Fixed complexity of 'Construct Using IEnumerable' to Θ(n). Thanks @svick.
Added another option question regarding whether the priority queue should be ordered in ascending or descending order compared to the IComparer.
Removed NotSupportedException from explicit SyncRoot property to match behavior of other System.Collection.Generic types instead of using the newer pattern.
Made the public GetEnumerator method return a nested Enumerator struct instead of IEnumerable, similar to the existing System.Collections.Generic types. This is an optimization to avoid a heap (GC) allocation when using a foreach loop.
Removed ComVisible attribute.
Changed complexity of Clear to Θ(n). Thanks @mbeidler.

The text was updated successfully, but these errors were encountered:

svick · 2015-01-30T23:16:44Z

Operation Complexity

Construct Using IEnumerable Θ(log n)

I think this should be Θ(n). You need to at least iterate the input.

ghost · 2015-01-31T00:15:14Z

+1

anaisbetts · 2015-01-31T00:15:47Z

Rx has a highly production-tested priority queue class:

https://github.com/Reactive-Extensions/Rx.NET/blob/master/Rx.NET/Source/System.Reactive.Core/Reactive/Internal/PriorityQueue.cs

justinvp · 2015-01-31T20:53:01Z

Should a nested public enumerator structure be used to avoid an additional heap allocation during calls to GetEnumerator and when using foreach? My assumption is no, since enumerating over a queue is an uncommon operation.

I'd lean towards using the struct enumerator to be consistent with Queue<T> which uses a struct enumerator. Also, if a struct enumerator isn't used now, we won't be able to change PriorityQueue<T> to use one in the future.

benaadams · 2015-02-05T00:03:55Z

Also perhaps a method for batch inserts? Can always then sort and continue from previous insertion point rather than starting at the beginning if that would help?:

    public void Enqueue(List<T> items) {
         items.Sort(_comparer);
         ... insertions ...
    }

ebickle · 2015-02-07T04:51:08Z

I've tossed a copy of the initial implementation here. Test coverage is far from complete, but if anyone is curious please take a look and let me know what you think. I've tried to follow the existing coding conventions from the System.Collections classes as much as possible.

justinvp · 2015-02-07T06:50:11Z

Cool. Some initial feedback:

Queue<T>.Enumerator implements IEnumerator.Reset explicitly. Should PriorityQueue<T>.Enumerator do the same?
Queue<T>.Enumerator uses _index == -2 to indicate the enumerator has been disposed. PriorityQueue<T>.Enumerator has the same comment but has an extra _disposed field. Consider getting rid of the extra _disposed field and use _index == -2 to indicate it's been disposed to make the struct smaller and to be consistent with Queue<T>
I think the static _emptyArray field can be removed and usage replaced with Array.Empty<T>() instead.

justinvp · 2015-02-07T07:02:56Z

Also...

Other collections that take a comparer (e.g. Dictionary<TKey, TValue>, HashSet<T>, SortedList<TKey, TValue>, SortedDictionary<TKey, TValue>, SortedSet<T>, etc.) allow null to be passed in for the comparer, in which case Comparer<T>.Default is used.

justinvp · 2015-02-07T07:06:06Z

Also...

ToArray can be optimized by checking for _size == 0 before allocating the new array, in which case just return Array.Empty<T>().

ebickle · 2015-02-07T14:38:44Z

@justinvp Great feedback, thanks!

I implemented Enumerator.Reset explicitly since it's core, non-deprecated functionality of an enumerator. Whether or not it's exposed seems inconsistent across the collection types, and only a few use the explicit variant.
Removed the _disposed field in favor of _index, thanks! Tossed that in at the last minute that night and missed the obvious. Decided to keep the ObjectDisposedException for correctness with the newer collection types, even though the old System.Collections.Generic types don't use it.
Array.Empty is a F# feature, so can't use it here sadly!
Modified the comparer parameters to accept null, good find!
The ToArray optimization is tricky. Technically speaking arrays are mutable in C#, even when they have a length of zero. In reality you're right, the allocation isn't needed and it can be optimized. I'm leaning towards a more cautious implementation, in case there are side-effects I'm not thinking of. Semantically the caller will still expect that allocation, and it's a minor one.

akoeplinger · 2015-02-07T15:46:59Z

@ebickle

Array.Empty is a F# feature, so can't use it here sadly!

Not anymore: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Array.cs#L1060-L1069

ebickle · 2015-02-27T18:47:24Z

I've migrated the code over to ebickle/corefx in the issue-574 branch.

Implemented the Array.Empty() change and plugged everything into the regular build pipeline. One slight temporary kludge I had to introduce was having the System.Collections project depend on the System.Collections nuget package, as Comparer isn't open source yet.

Will be fixed once issue dotnet/corefx#966 is complete.

One key area I'm looking for feedback is how ToArray, CopyTo, and the enumerator should be handled. Currently they're optimized for performance, which means the target array is a heap and not sorted by the comparer.

There are three options:

Leave it as-is, and document that the returned array is not sorted. (it's a priority queue, not a sorted queue)
Modify the methods to sort the returned items/arrays. They will no longer be O(n) operations.
Remove the methods and enumerable support altogether. This is the "purist" option but removes the ability to quickly grab the remaining queued items when the priority queue is no longer needed.

Another thing I'd like feedback on is whether the queue should be stable for two items with the same priority (compare result of 0). Generally priority queues make no guarantees that two items with the same priority will be dequeued in the order they were enqueued, but I noticed that internal priority queue implementation used in System.Reactive.Core took some additional overhead to ensure this property. My preference would be to not do this, but I'm not completely sure which option is better in terms developers' expectations.

mbeidler · 2015-03-22T21:29:18Z

I happened upon this PR because I too was interested in adding a priority queue to .NET. Glad to see someone went to the effort of making this proposal :). After reviewing the code, I noticed the following:

When the IComparer ordering is not consistent with Equals, the behavior of this Contains implementation (which uses the IComparer) may be surprising to some users, as it is essentially contains an item with equal priority.
I didn't see code to shrink the array in Dequeue. Shrinking the heap array by half when quarter full is typical.
Should the Enqueue method accept null arguments?
I think the complexity of Clear should be Θ(n), since that is the complexity of System.Array.Clear, which it uses. https://msdn.microsoft.com/en-us/library/system.array.clear%28v=vs.110%29.aspx

svick · 2015-03-23T13:12:47Z

I didn't see code to shrink the array in Dequeue. Shrinking the heap array by half when quarter full is typical.

Queue<T> and Stack<T> also don't shrink their arrays (based on Reference Source, they're not in CoreFX yet).

ebickle · 2015-03-23T13:58:48Z

@mbeidler I'd considered adding some form of automatic array shrinking in dequeue, but as @svick pointed out it doesn't exist in the reference implementations of similar data structures. I'd be curious to hear from anyone on the .NET Core/BCL team whether there's any particular reason they chose that style of implementation.

Update: I checked List, Queue, Queue, and ArrayList - none of them shrink the size of the internal array on a remove/dequeue.

Enqueue should support nulls, and is documented as allowing them. Did you run into a bug? I can't remember how robust the unit tests area in the area yet.

mbeidler · 2015-04-01T07:40:36Z

Interesting, I noticed in the reference source linked by @svick that Queue<T> has an unused private constant named _ShrinkThreshold. Perhaps that behavior existed in a previous version.

Concerning the use of IComparer instead of Equals in the implementation of Contains, I wrote up the following unit test, which would fail currently: https://gist.github.com/mbeidler/9e9f566ba7356302c57e

ebickle · 2015-04-01T15:44:46Z

@mbeidler Good point. According to MSDN, IComparer / IComparable only guarantees that a value of zero has the same sort order.

However, it looks as though the same issue exists in the other collection classes. If I modify the code to operate on SortedList<Patient, Patient>, the test case still fails on ContainsKey. The implementation of SortedList<TKey, TValue>.ContainsKey calls Array.BinarySearch, which relies on IComparer to check for equality. The same holds true for SortedSet.

Arguably it's a bug in the existing collection classes as well. I'll dug through the rest of the collections classes and see if there are any other data structures that accept an IComparer but test for equality separately. You're right though, for a priority queue you would expect custom ordering behavior that's completely independent from equality.

ebickle · 2015-04-01T17:55:40Z

Committed a fix and the test case into my fork branch. The new implementation of Contains is based directly on the behavior of List.Contains. Since List doesn't accept an IEqualityComparer, the behavior is functionally equivalent.

When I get some time later today, I'll likely submit bug reports for the other built-in collections. The probably can't be fixed due to regression behavior, but at very least the documentation needs to be updated.

mbeidler · 2015-04-03T04:54:04Z

I think it makes sense for ContainsKey to use the IComparer<TKey> implementation, since that is what specifies the key. However, I think it would be more logical for ContainsValue to use Equals instead of IComparable<TValue> in its linear search; though in this case the scope is significantly reduced since a type's natural ordering is much less likely to be inconsistent with equals.

It appears that in the MSDN documentation for SortedList<TKey, TValue>, the remarks section for ContainsValue does indicate that the TValue's sort ordering is used in lieu of equality.

ebickle · 2015-05-10T14:46:30Z

@terrajobst How do you feel about the API proposal so far? Do you feel this a good fit for CoreFX?

ddevault · 2015-07-02T22:24:33Z

👍

terrajobst · 2015-09-11T00:17:48Z

Thanks for filing this. I believe we've enough data to make a formal review of this proposal, hence I labelled it as 'ready for api review'

andrewgmorris · 2015-11-25T21:08:21Z

As Dequeue and Peek are throwing methods the caller needs to check count before each call. Would it make sense to instead (or in addition) provide TryDequeue and TryPeek following the pattern of the concurrent collections? There are issues open to add non-throwing methods to existing generic collections so adding a new collection which doesn't have these methods seems counterproductive.

benaadams · 2015-11-25T21:11:35Z

@andrewgmorris related https://github.com/dotnet/corefx/issues/4316 "Add TryDequeue to Queue"

weshaggard · 2016-08-30T17:58:00Z

We had a basic review of this and we agree that we want a ProrityQueue in the framework. However, we need to get someone to help drive the design and implementation for it. Whoever grabs the issue can work @terrajobst on finalizing the API.

shmuelie · 2016-08-30T19:40:29Z

So what work is left on the API?

justinvp · 2016-08-30T20:00:49Z

This is missing from the API proposal above: PriorityQueue<T> should implement IReadOnlyCollection<T> to match Queue<T> (Queue<T> now implements IReadOnlyCollection<T> as of .NET 4.6).

BrannonKing · 2016-09-30T15:04:42Z

I don't know that array-based priority queues are best. Memory allocation in .NET is really fast. We don't have the same search-small-blocks issue that the old malloc dealt with. You are welcome to use my priority queue code from here: https://github.com/BrannonKing/Kts.Astar/tree/master/Kts.AStar

pgolebiowski · 2020-10-21T15:26:06Z

It would be helpful if you could elaborate more, or just say what you're trying to say. I'd need to disambiguate, there would be ping-pong, and that would turn into a lengthy discussion. Alternatively, we could arrange a call.

stephentoub · 2020-10-21T15:30:32Z

I'm saying we want to avoid any Enqueue operation requiring an allocation, whether on the part of the caller or the part of the implementation (ammortized internal allocation is fine, e.g. to expand an array used in the implementation). I'm trying to understand how that's possible with a node-based heap (e.g. if those node objects are exposed to the caller, that prohibits pooling by the implementation due to concerns around inappropriate reuse / aliasing). I want to be able to write:

pq.Enqueue(42, 84);

and have that not allocate. How do the implementations you refer to achieve that?

or just say what you're trying to say

I thought I was.

pgolebiowski · 2020-10-21T15:48:48Z

we want to avoid any Enqueue operation requiring an allocation [...] I want to be able to write: pq.Enqueue(42, 84); and have that not allocate.

Where does this desire come from? It's a nice to have side effect of a solution, not a requirement that 99.9% customers need to have satisfied. I don't see why you would choose this low-impact dimension to make design choices between solutions.

We're not making design choices based on optimizations for the 0.1% of customers if these negatively impact 12% of the customers in another dimension. "caring about no allocations" + "dealing with two value types" is an edge case.

I find the dimension of supported behavior/functionality far more important, especially when designing a general-purpose versatile data structure for a wide audience and a variety of use cases.

stephentoub · 2020-10-21T15:54:52Z

Where does this desire come from?

From wanting the core collection types to be usable in scenarios that care about performance. You say the node-based solution would support 100% of use cases: that is not the case if every enqueue allocates, just as List<T>, Dictionary<TKey, TValue>, HashSet<T>, and so on would become unusable in many situations if they allocated on every Add.

Why do you believe it's only "0.1%" that care about the allocation overheads of these methods? From where is that data coming?

"caring about no allocations" + "dealing with two value types" is an edge case

It is not. It's also not just "two value types". As I understand it, the proposed solution would require either a) an allocation on every enqueue regardless of the Ts involved, or b) would require the element type to derive from some known base type which in turn prohibits a large number of possible uses to avoid extra allocation.

TimLovellSmith · 2020-10-21T16:06:17Z

@eiriktsarpalis
So you don't forget any options, I think there is a feasible option 4 to add to options 1, 2, and 3, in your list which is a compromise:
4. an implementation that supports the 12% use case, while also nearly-optimizing for the other 88% by allowing updates to elements that are equatable, and only lazily building the lookup table required to do those updates the first time an update method is called (and updating it on subsequence updates+removals). Therefore incurring less cost for apps which don't use the functionality.

We might still decide that because there's extra performance available to the 88% or 12% from an implementation which doesn't need an updatable data structure, or is optimized for one in the first place, its better to provide options 2 and 3, than option 4. But I thought we should not forget another option exists.

[Or I suppose you could just see this as a better option 1 and update the description of 1 to say that bookkeeping is not forced, but lazy, and correct equatable behavior is only required when updates are used...]

pgolebiowski · 2020-10-21T16:07:09Z

@stephentoub That's exactly what I had in mind about saying simply what you want to say, thank you :)

Why do you believe it's only 0.1% that care about the allocation overheads of these methods? From where is that data coming?

From intuition, i.e. the same source based on which you believe it's more important to prioritize "no additional allocations" over "ability to conduct updates". At least for the update mechanism, we have the data that 11-12% customers do need to have this behavior supported. I don't think remotely close customers would care about "no additional allocations".

In either case, you are for some reason choosing to fixate on the memory dimension, forgetting about other dimensions, e.g. raw speed, which is another trade-off for your preferred approach. An array-based implementation providing "no additional allocations" would be slower than a node-based implementation. Again, I think it's arbitrary here to prioritize memory over speed.

Let's take a step back and focus on what the customers want. We have a design choice that may or may not make the data structure unusable for 12% of the customers. I think we would need to be very careful with providing reasons for why we would choose not to support these.

stephentoub · 2020-10-21T16:13:04Z

An array-based implementation providing "no additional allocations" would be slower than a node-based implementation.

Please share the two C# implementations you're using to perform that comparison and the benchmarks used to come to that conclusion. Theoretical papers are certainly valuable but they are only one small piece of the puzzle. The more important thing is when the rubber meets the road, factoring in the details of the given platform and given implementations, and you're able to validate on the specific platform with the specific implementation and typical/expected data sets / usage patterns. It could very well be that your assertion is correct. It also may not be. I'd like to see the implementations / data to understand better.

pgolebiowski · 2020-10-21T16:23:05Z

Please share the two C# implementations you're using to perform that comparison and the benchmarks used to come to that conclusion

This is a valid point, the paper I quote only compares and benchmarks implementations in C++. It conducts multiple benchmarks with different data sets and usage patterns. I am quite confident this would be transferable to C#, but if you believe this is something we need to double down, I think the best course of action would be for you to ask a colleague to conduct such a study.

eiriktsarpalis · 2020-10-21T16:39:34Z

@pgolebiowski I would be interested in better understanding the nature of your objection. The proposal advocates for two separate types, would that not cover your requirements?

eiriktsarpalis · 2020-10-21T17:10:19Z

an implementation that supports the 12% use case, while also nearly-optimizing for the other 88% by allowing updates to elements that are equatable, and only lazily building the lookup table required to do those updates the first time an update method is called (and updating it on subsequence updates+removals). Therefore incurring less cost for apps which don't use the functionality.

I would probably classify that as a performance optimization for option 1, however I do see a couple of issues with that particular approach:

Update now becomes O(n), which can result in unpredictable performance depending on usage patterns.
The lookup table is also needed for validating uniqueness. Enqueueing the same element twice before calling Update would be accepted and arguably bring the queue to an inconsistent state.

TimLovellSmith · 2020-10-21T17:25:06Z

@eiriktsarpalis Its only O(n) once, and O(1) afterwards, which is O(1) amortized. And you can postpone validating uniqueness until the first update. But maybe this is overly clever. Two classes is easier to explain.

eiriktsarpalis · 2020-10-23T18:21:30Z

I've spent the last few days prototyping two PriorityQueue implementations: a basic implementation without update support and an implementation that supports updates using element equality. I've named the former PriorityQueue and the latter, for lack of a better name, PrioritySet. My goal is to gauge API ergonomics and compare performance.

The implementations can be found in this repo. Both classes are implemented using array-based quad heaps. The updatable implementation also uses a dictionary that maps elements to internal heap indices.

Basic PriorityQueue

namespace System.Collections.Generic
{
    public class PriorityQueue<TElement, TPriority> : IReadOnlyCollection<(TElement Element, TPriority Priority)>
    {
        // Constructors
        public PriorityQueue();
        public PriorityQueue(int initialCapacity);
        public PriorityQueue(IComparer<TPriority>? comparer);
        public PriorityQueue(int initialCapacity, IComparer<TPriority>? comparer);
        public PriorityQueue(IEnumerable<(TElement Element, TPriority Priority)> values);
        public PriorityQueue(IEnumerable<(TElement Element, TPriority Priority)> values, IComparer<TPriority>? comparer);

        // Properties
        public int Count { get; }
        public IComparer<TPriority> Comparer { get; }

        // O(log(n)) push operation
        public void Enqueue(TElement element, TPriority priority);
        // O(1) peek operations
        public TElement Peek();
        public bool TryPeek(out TElement element, out TPriority priority);
        // O(log(n)) pop operations
        public TElement Dequeue();
        public bool TryDequeue(out TElement element, out TPriority priority);
        // Combined push/pop, generally more efficient than sequential Enqueue();Dequeue() calls.
        public TElement EnqueueDequeue(TElement element, TPriority priority);

        public void Clear();

        public Enumerator GetEnumerator();
        public struct Enumerator : IEnumerator<(TElement Element, TPriority Priority)>, IEnumerator;
    }
}

Here's a basic sample using the type

var queue = new PriorityQueue<string, int>();

queue.Enqueue("John", 1940);
queue.Enqueue("Paul", 1942);
queue.Enqueue("George", 1943);
queue.Enqueue("Ringo", 1940);

Assert.Equal("John", queue.Dequeue());
Assert.Equal("Ringo", queue.Dequeue());
Assert.Equal("Paul", queue.Dequeue());
Assert.Equal("George", queue.Dequeue());

Updatable PriorityQueue

namespace System.Collections.Generic
{
    public class PrioritySet<TElement, TPriority> : IReadOnlyCollection<(TElement Element, TPriority Priority)> where TElement : notnull
    {
        // Constructors
        public PrioritySet();
        public PrioritySet(int initialCapacity);
        public PrioritySet(IComparer<TPriority> comparer);
        public PrioritySet(int initialCapacity, IComparer<TPriority>? priorityComparer, IEqualityComparer<TElement>? elementComparer);
        public PrioritySet(IEnumerable<(TElement Element, TPriority Priority)> values);
        public PrioritySet(IEnumerable<(TElement Element, TPriority Priority)> values, IComparer<TPriority>? comparer, IEqualityComparer<TElement>? elementComparer);

        // Members shared with baseline PriorityQueue implementation
        public int Count { get; }
        public IComparer<TPriority> Comparer { get; }
        public void Enqueue(TElement element, TPriority priority);
        public TElement Peek();
        public bool TryPeek(out TElement element, out TPriority priority);
        public TElement Dequeue();
        public bool TryDequeue(out TElement element, out TPriority priority);
        public TElement EnqueueDequeue(TElement element, TPriority priority);

        // Update methods and friends
        public bool Contains(TElement element); // O(1)
        public bool TryRemove(TElement element); // O(log(n))
        public bool TryUpdate(TElement element, TPriority priority); // O(log(n))
        public void EnqueueOrUpdate(TElement element, TPriority priority); // O(log(n))

        public void Clear();
        public Enumerator GetEnumerator();
        public struct Enumerator : IEnumerator<(TElement Element, TPriority Priority)>, IEnumerator;
    }
}

Performance Comparison

I wrote a simple heapsort benchmark that compares the two implementations in their most basic application. I also included a sort benchmark that uses Linq for comparison:

BenchmarkDotNet=v0.12.1, OS=ubuntu 20.04
AMD EPYC 7452, 1 CPU, 4 logical and 2 physical cores
.NET Core SDK=5.0.100-rc.2.20479.15
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
  DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT

Method	Size	Mean	Error	StdDev	Ratio	RatioSD	Gen 0	Gen 1	Gen 2	Allocated
LinqSort	30	1.439 us	0.0072 us	0.0064 us	1.00	0.00	0.0095	-	-	672 B
PriorityQueue	30	1.450 us	0.0085 us	0.0079 us	1.01	0.01	-	-	-	-
PrioritySet	30	2.778 us	0.0217 us	0.0192 us	1.93	0.02	-	-	-	-

LinqSort	300	24.727 us	0.1032 us	0.0915 us	1.00	0.00	0.0305	-	-	3912 B
PriorityQueue	300	29.510 us	0.0995 us	0.0882 us	1.19	0.01	-	-	-	-
PrioritySet	300	47.715 us	0.4455 us	0.4168 us	1.93	0.02	-	-	-	-

LinqSort	3000	412.015 us	1.5495 us	1.3736 us	1.00	0.00	0.4883	-	-	36312 B
PriorityQueue	3000	491.722 us	4.1463 us	3.8785 us	1.19	0.01	-	-	-	-
PrioritySet	3000	677.959 us	3.1996 us	2.4981 us	1.64	0.01	-	-	-	-

LinqSort	30000	5,223.560 us	11.9077 us	9.9434 us	1.00	0.00	93.7500	93.7500	93.7500	360910 B
PriorityQueue	30000	5,688.625 us	53.0746 us	49.6460 us	1.09	0.01	-	-	-	2 B
PrioritySet	30000	8,124.306 us	39.9498 us	37.3691 us	1.55	0.01	-	-	-	4 B

As can be expected, the overhead of tracking element locations adds a significant performance hit, roughly 40-50% slower compared to the baseline implementation.

pgolebiowski · 2020-10-23T22:46:14Z

I appreciate all the effort, I see it took a lot of time and energy.

I really don't see though the reason for 2 almost identical data structures, where one is an inferior version of another.
Further, even if you would want to have such 2 versions of a priority queue, I don't see how the "superior" version is better than Priority queue proposal (v2.2) from 20 days ago.

miyu · 2020-10-25T21:30:13Z

tl;dr:

This proposal is exciting!! But it doesn't yet fit my high-perf use-cases.
>90% of my computational geometry/motion planning performance load is PQ enqueue/dequeue because that's the dominant N^m LogN in an algorithm.
I'm in favor of separate PQ implementations. I usually don't need priority updates, and 2x worse perf is unacceptable.
PrioritySet is a confusing name and isn't discoverable vs PriorityQueue
Storing priority twice (once in my element, once in the queue) feels expensive. Struct copies & memory usage.
If computing priority were expensive, I would simply store a tuple (priority: ComputePriority(element), element) into a PQ, and my priority-getter function would simply be tuple => tuple.priority.
Performance should be benchmarked per-operation or alternatively on real-world use-cases (e.g. optimized multi-start multi-end search on a graph)
Unordered enumeration behavior is unexpected. Prefer Queue-like Dequeue()-order semantics?
Consider supporting Clone operation & merge operation.
Basic operations must be 0-alloc in steady-state usage. I will pool these queues.
Consider supporting EnqueueMany which performs heapify to aid pooling.

I work in high-performance search (motion-planning) & computational geometry code (e.g. sweepline algorithms) that is relevant to both robotics and games, I use a lot of custom hand-rolled priority queues. The other common use-case I have is a Top-K query where updatable priority is not useful.

Some feedback regarding the two-implementations (yes vs no update support) debate.

Naming:

PrioritySet implies set semantics, but the queue doesn't implement ISet.
You're using the UpdatablePriorityQueue name which is easier to discover if I search PriorityQueue.

Performance:

Priority Queue performance is nearly always my performance bottleneck (>90%) in my geometry / planning algorithms
Consider passing a Func<TElement, TPriority> or Comparison to ctor rather than copying TPriority around (expensive!). If computing priority is expensive, I will insert (priority, element) into a PQ and pass a comparison which looks at my cached priority.
A significant number of my algorithms do not need PQ updates. I'd consider using a built-in PQ that does not support updates, but if something has a 2x perf cost to support a feature I don't need (updating) then it is useless to me.
For performance analysis / tradeoffs, it would be important to know the relative wall-time cost of enqueue/dequeue per operation @eiriktsarpalis -- "tracking is 2x slower" isn't enough to evaluate whether a PQ is useful.
I was glad to see your constructors perform Heapify. Consider a constructor which takes an IList, List, or Array to avoid enumeration allocations.
Consider exposing an EnqueueMany which performs Heapify if the PQ is initially empty, since in high-perf it is common to pool collections.
Consider making Clear not zero array if elements do not contain references.
Allocations in enqueue/dequeue are unacceptable. My algorithms are zero-alloc for performance reasons, with pooled thread-local collections.

APIs:

Cloning a priority queue is a trivial operation with your implementations and frequently useful.
- Related: Should enumerating a Priority Queue have queue-like semantics? I would expect a dequeue-order collection, similar to what Queue does. I expect new List(myPriorityQueue) to not mutate the priorityQueue, but to work as I just described.
As mentioned above, it is preferable to take in a Func<TElement, TPriority> rather than insert with priority. If computing priority is expensive, I can simply insert (priority, element) and provide a func tuple => tuple.priority
Merging two priority queues is sometimes useful.
It is strange that Peek returns a TItem but Enumeration & Enqueue have (TItem, TPriority).

That being said, for a significant number of my algorithms, my priority queue items contain their priorities, and storing that twice (once in the PQ, once in the items) sounds inefficient. This is especially the case if I am ordering by numerous keys (similar use-case to OrderBy.ThenBy.ThenBy). This API also cleans a lot of the inconsistencies where Insert takes a priority but Peek doesn't return it.

Finally, it is worth noting that I often insert indices of an array into a Priority Queue, rather than array elements themselves. This is supported by all APIs discussed so far, though. As an example, if I am processing the beginning/ends of intervals on a x-axis line, I might have priority queue events (x, isStartElseEnd, intervalId) and order by x then by isStartElseEnd. This is often because I have other data structures that map from an index to some computed data.

eiriktsarpalis · 2020-10-26T15:52:28Z

@pgolebiowski I took the liberty of including your proposed pairing heap implementation to the benchmarks, just so we can compare instances of all three approaches directly. Here are the results:

Method	Size	Mean	Error	StdDev	Median	Gen 0	Gen 1	Gen 2	Allocated
PriorityQueue	10	774.7 ns	3.30 ns	3.08 ns	773.2 ns	-	-	-	-
PrioritySet	10	1,643.0 ns	3.89 ns	3.45 ns	1,642.8 ns	-	-	-	-
PairingHeap	10	1,660.2 ns	14.11 ns	12.51 ns	1,657.2 ns	0.0134	-	-	960 B
PriorityQueue	50	6,413.0 ns	14.95 ns	13.99 ns	6,409.5 ns	-	-	-	-
PrioritySet	50	12,193.1 ns	35.41 ns	29.57 ns	12,188.3 ns	-	-	-	-
PairingHeap	50	13,955.8 ns	193.36 ns	180.87 ns	13,989.2 ns	0.0610	-	-	4800 B
PriorityQueue	150	27,402.5 ns	76.52 ns	71.58 ns	27,410.2 ns	-	-	-	-
PrioritySet	150	48,485.8 ns	160.22 ns	149.87 ns	48,476.3 ns	-	-	-	-
PairingHeap	150	56,951.2 ns	190.52 ns	168.89 ns	56,953.6 ns	0.1831	-	-	14400 B
PriorityQueue	500	124,933.7 ns	429.20 ns	380.48 ns	124,824.4 ns	-	-	-	-
PrioritySet	500	206,310.0 ns	433.97 ns	338.81 ns	206,319.0 ns	-	-	-	-
PairingHeap	500	229,423.9 ns	3,213.33 ns	2,848.53 ns	230,398.7 ns	0.4883	-	-	48000 B
PriorityQueue	1000	284,481.8 ns	475.91 ns	445.16 ns	284,445.6 ns	-	-	-	-
PrioritySet	1000	454,989.4 ns	3,712.11 ns	3,472.31 ns	455,354.0 ns	-	-	-	-
PairingHeap	1000	459,049.3 ns	1,706.28 ns	1,424.82 ns	459,364.9 ns	0.9766	-	-	96000 B
PriorityQueue	10000	3,788,802.4 ns	11,715.81 ns	10,958.98 ns	3,787,811.9 ns	-	-	-	1 B
PrioritySet	10000	5,963,100.4 ns	26,669.04 ns	22,269.86 ns	5,950,915.5 ns	-	-	-	2 B
PairingHeap	10000	6,789,719.0 ns	134,453.01 ns	265,397.13 ns	6,918,392.9 ns	7.8125	-	-	960002 B
PriorityQueue	1000000	595,059,170.7 ns	4,001,349.38 ns	3,547,092.00 ns	595,716,610.5 ns	-	-	-	4376 B
PrioritySet	1000000	1,592,037,780.9 ns	13,925,896.05 ns	12,344,944.12 ns	1,591,051,886.5 ns	-	-	-	288 B
PairingHeap	1000000	1,858,670,560.7 ns	36,405,433.20 ns	59,815,170.76 ns	1,838,721,629.0 ns	1000.0000	-	-	96000376 B

Key takeaways

The pairing heap implementation asymptotically performs much better than its array-backed counterparts. However it can be up to 2x slower for small heap sizes (< 50 elements), catches up at around 1000 elements, but is up to 2x faster for heaps of size 10^6.
As expected, the pairing heap produces a significant amount of heap allocations.
The "PrioritySet" implementation is consistently slow ~~the slowest of all three contenders~~, so we might not want to pursue that approach after all.

~~In light of the above I still believe there are valid trade-offs between the baseline array heap and the pairing heap approach~~.

EDIT: updated the results after a bugfix in my benchmarks, thanks @VisualMelon

VisualMelon · 2020-10-26T15:55:13Z

@eiriktsarpalis Your benchmark for the PairingHeap is, I think, wrong: the parameters to Add are the wrong way around. When you swap them around, it's a different story: https://gist.github.com/VisualMelon/00885fe50f7ab0f4ae5cd1307312109f

(I did exactly the same thing when I first implemented it)

Note that this doesn't mean the pairing heap is faster or slower, rather it seems to depend heavily on the distribution/order of the data supplied.

TimLovellSmith · 2020-10-26T16:28:18Z

@eiriktsarpalis re: the usefulness of PrioritySet...
We shouldn't expect updatable to be anything other than slower for heapsort, since it has no priority updates in the scenario. (Also for heapsort, its likely you even want to keep duplicates, a set is just not appropriate.)

The litmus test for seeing whether PrioritySet is useful should be benchmarking algorithms which use priority updates, versus a non-updating implementation of same algorithm, enqueueing the duplicate values and ignoring duplicates when dequeueing.

eiriktsarpalis · 2020-10-26T16:38:51Z

Thanks @VisualMelon, I updated my results and comments after your suggested fix.

rather it seems to depend heavily on the distribution/order of the data supplied.

I believe it might have benefited from the fact that the enqueued priorities were monotonic.

The litmus test for seeing whether PrioritySet is useful should be benchmarking algorithms which use priority updates, versus a non-updating implementation of same algorithm, enqueueing the duplicate values and ignoring duplicates when dequeueing.

@TimLovellSmith my goal here was to measure performance for the most common PriorityQueue application: rather than measure the performance of updates, I wanted to see the impact for the case where updates are not needed at all. It might however make sense to produce a separate benchmark that compares pairing heap with "PrioritySet" updates.

@miyu thanks for your detailed feedback, it is very much appreciated!

eiriktsarpalis · 2020-10-26T17:38:09Z

@TimLovellSmith I wrote a simple benchmark that uses updates:

Method	Size	Mean	Error	StdDev	Median	Gen 0	Gen 1	Gen 2	Allocated
PrioritySet	10	1.052 us	0.0106 us	0.0099 us	1.055 us	-	-	-	-
PairingHeap	10	1.055 us	0.0042 us	0.0035 us	1.055 us	0.0057	-	-	480 B
PrioritySet	50	7.394 us	0.0527 us	0.0493 us	7.380 us	-	-	-	-
PairingHeap	50	8.587 us	0.1678 us	0.1570 us	8.634 us	0.0305	-	-	2400 B
PrioritySet	150	27.522 us	0.0459 us	0.0359 us	27.523 us	-	-	-	-
PairingHeap	150	32.045 us	0.1076 us	0.1007 us	32.019 us	0.0610	-	-	7200 B
PrioritySet	500	109.097 us	0.6548 us	0.6125 us	109.162 us	-	-	-	-
PairingHeap	500	131.647 us	0.5401 us	0.4510 us	131.588 us	0.2441	-	-	24000 B
PrioritySet	1000	238.184 us	1.0282 us	0.9618 us	238.457 us	-	-	-	-
PairingHeap	1000	293.236 us	0.9396 us	0.8789 us	293.257 us	0.4883	-	-	48000 B
PrioritySet	10000	3,035.982 us	12.2952 us	10.8994 us	3,036.985 us	-	-	-	1 B
PairingHeap	10000	3,388.685 us	16.0675 us	38.1861 us	3,374.565 us	-	-	-	480002 B
PrioritySet	1000000	841,406.888 us	16,788.4775 us	15,703.9522 us	840,888.389 us	-	-	-	288 B
PairingHeap	1000000	989,966.501 us	19,722.6687 us	30,705.8191 us	996,075.410 us	-	-	-	48000448 B

ericsampson · 2020-10-26T21:24:18Z

On a separate note, has their been discussion/feedback on the lack of stability being an issue (or non-issue) for people's usecases?

eiriktsarpalis · 2020-10-27T10:35:41Z

has their been discussion/feedback on the lack of stability being an issue (or non-issue) for people's usecase

None of the implementations guarantee stability, however it should be pretty straightforward for users to obtain stability by augmenting the ordinal with a timestamp:

var pq = new PriorityQueue<string, (int priority, DateTime timestamp)>();

foreach (string element in elements)
{
    int priority = 42;
    pq.Enqueue(element, (priority, DateTime.Now));
}

eiriktsarpalis · 2020-10-28T18:30:04Z

To summarize some of my previous posts, I've been trying to identify what a popular .NET priority queue would look like, so I went through the following data:

Common priority queue usage patterns in .NET source code.
PriorityQueue implementations in core libraries of competing frameworks.
Benchmarks of various .NET priority queue prototypes.

Which yielded the following takeaways:

90% of priority queue use cases do not require priority updates.
Supporting priority updates results in a more complicated API contract (requiring either handles or element uniqueness).
In my benchmarks, implementations that support priority updates are 2-3x slower compared to ones that don't.

Next Steps

Going forward, I propose we take the following actions for .NET 6:

Introduce a System.Collections.Generic.PriorityQueue class that is simple, caters to the majority of our user requirements and is as efficient as possible. It'll use an array-backed quaternary heap and will not support priority updates. A prototype of the implementation can be found here. I will be creating a separate issue detailing the API proposal shortly.
We recognize the need for heaps that support efficient priority updates, so will keep working towards introducing a specialized class that addresses this requirement. We are evaluating a couple of prototypes [1,2] each with their own sets of trade-offs. My recommendation would be to introduce this type at a later stage, since more work is needed to finalize the design.

At this moment, I would like to thank the contributors to this thread, in particular @pgolebiowski and @TimLovellSmith. Your feedback has played a huge role in guiding our design process. I hope to keep receiving your input as we iron out the design for the updatable priority queue.

pgolebiowski · 2020-10-31T13:22:22Z

Going forward, I propose we take the following actions for .NET 6: [...]

Sounds good :)

Introduce a System.Collections.Generic.PriorityQueue class that is simple, caters to the majority of our user requirements and is as efficient as possible. It'll use an array-backed quaternary heap and will not support priority updates.

If we have the decision by the codebase owners that this direction is approved and desired, can I continue leading the API design for that bit and provide the final implementation?

At this moment, I would like to thank the contributors to this thread, in particular @pgolebiowski and @TimLovellSmith. Your feedback has played a huge role in guiding our design process. I hope to keep receiving your input as we iron out the design for the updatable priority queue.

It was quite a journey :D

eiriktsarpalis · 2020-11-18T13:46:25Z

The API for System.Collections.Generic.PriorityQueue<TElement, TPriority> has just been approved. I've created a separate issue to continue our conversation on a potential heap implementation that supports priority updates.

I'm going to close this issue, thank you all for your contributions!

xied75 · 2020-11-18T16:06:21Z

Maybe someone can do a write up on this journey! A whole 6 years for one API. :) Any chance to win a Guinness?

joshfree assigned terrajobst Feb 6, 2015

karelz unassigned terrajobst Sep 26, 2016

eiriktsarpalis mentioned this issue Oct 28, 2020

Developers can use a built in PriorityQueue type #43957

Closed

4 tasks

eiriktsarpalis mentioned this issue Nov 18, 2020

Add a priority queue that supports priority updates #44871

Open

eiriktsarpalis closed this as completed Nov 18, 2020

ghost locked as resolved and limited conversation to collaborators Jan 7, 2021

Add PriorityQueue<T> to Collections #14032

Add PriorityQueue<T> to Collections #14032

Comments

ebickle commented Jan 30, 2015

Second Proposal options

Assumptions

API

Original Proposal

Rationale and Usage

Proposed API

Details

Open Questions

Updates

svick commented Jan 30, 2015

ghost commented Jan 31, 2015

anaisbetts commented Jan 31, 2015

justinvp commented Jan 31, 2015

benaadams commented Feb 5, 2015

ebickle commented Feb 7, 2015

justinvp commented Feb 7, 2015

justinvp commented Feb 7, 2015

justinvp commented Feb 7, 2015

ebickle commented Feb 7, 2015

akoeplinger commented Feb 7, 2015

ebickle commented Feb 27, 2015

mbeidler commented Mar 22, 2015

svick commented Mar 23, 2015

ebickle commented Mar 23, 2015

mbeidler commented Apr 1, 2015

ebickle commented Apr 1, 2015

ebickle commented Apr 1, 2015

mbeidler commented Apr 3, 2015

ebickle commented May 10, 2015

ddevault commented Jul 2, 2015

terrajobst commented Sep 11, 2015

andrewgmorris commented Nov 25, 2015

benaadams commented Nov 25, 2015

weshaggard commented Aug 30, 2016

shmuelie commented Aug 30, 2016

justinvp commented Aug 30, 2016

BrannonKing commented Sep 30, 2016

pgolebiowski commented Oct 21, 2020 • edited Loading

stephentoub commented Oct 21, 2020 • edited Loading

pgolebiowski commented Oct 21, 2020 • edited Loading

stephentoub commented Oct 21, 2020 • edited Loading

TimLovellSmith commented Oct 21, 2020 • edited Loading

pgolebiowski commented Oct 21, 2020 • edited Loading

stephentoub commented Oct 21, 2020 • edited Loading

pgolebiowski commented Oct 21, 2020

eiriktsarpalis commented Oct 21, 2020

eiriktsarpalis commented Oct 21, 2020

TimLovellSmith commented Oct 21, 2020

eiriktsarpalis commented Oct 23, 2020 • edited Loading

Basic PriorityQueue

Updatable PriorityQueue

Performance Comparison

pgolebiowski commented Oct 23, 2020

miyu commented Oct 25, 2020 • edited Loading

eiriktsarpalis commented Oct 26, 2020 • edited Loading

Key takeaways

VisualMelon commented Oct 26, 2020 • edited Loading

TimLovellSmith commented Oct 26, 2020

eiriktsarpalis commented Oct 26, 2020 • edited Loading

eiriktsarpalis commented Oct 26, 2020

ericsampson commented Oct 26, 2020

eiriktsarpalis commented Oct 27, 2020 • edited Loading

eiriktsarpalis commented Oct 28, 2020 • edited Loading

Next Steps

pgolebiowski commented Oct 31, 2020

eiriktsarpalis commented Nov 18, 2020

xied75 commented Nov 18, 2020

pgolebiowski commented Oct 21, 2020 •

edited

Loading

stephentoub commented Oct 21, 2020 •

edited

Loading

pgolebiowski commented Oct 21, 2020 •

edited

Loading

stephentoub commented Oct 21, 2020 •

edited

Loading

TimLovellSmith commented Oct 21, 2020 •

edited

Loading

pgolebiowski commented Oct 21, 2020 •

edited

Loading

stephentoub commented Oct 21, 2020 •

edited

Loading

eiriktsarpalis commented Oct 23, 2020 •

edited

Loading

miyu commented Oct 25, 2020 •

edited

Loading

eiriktsarpalis commented Oct 26, 2020 •

edited

Loading

VisualMelon commented Oct 26, 2020 •

edited

Loading

eiriktsarpalis commented Oct 26, 2020 •

edited

Loading

eiriktsarpalis commented Oct 27, 2020 •

edited

Loading

eiriktsarpalis commented Oct 28, 2020 •

edited

Loading