-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement a generic topological sort algorithm #25169
Implement a generic topological sort algorithm #25169
Conversation
using System.Linq; | ||
using System.Text; | ||
using System.Threading.Tasks; | ||
using Xunit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: sort #Closed
for (int t = 0; t < length - 1; t++) | ||
{ | ||
var tmp = data[t]; | ||
int r = random.Next(t, length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, forgot to finish shuffle... #Closed
@@ -0,0 +1,204 @@ | |||
using System; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing copyright #Closed
toCount.AddRange(nodes); | ||
while (toCount.Count != 0) | ||
{ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will delete blank line #Closed
public void TestRandom(int seed) | ||
{ | ||
int numberOfNodes = 100; | ||
Random random = new Random(seed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure Random works the same between platforms/versions etc? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checked the docs. It seems to be specified as Donald E. Knuth's subtractive random number generator
#Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is required to by the specification of this method:
Providing an identical seed value to different Random objects causes each instance to produce identical sequences of random numbers. This is often done when testing apps that rely on random number generators.
Having said that, it doesn't matter very much. I doubt there exists an implementation and seed that would cause the cyclic part of the test to fail, which is the only part that can fail spuriously if a random number generator produces a very unlucky sequence. But since it is possible, I added plenty of documentation in the tests so a future failure could be diagnosed.
#Closed
|
||
// Initialize the ready set with those nodes that have no predecessors | ||
var ready = ArrayBuilder<TNode>.GetInstance(); | ||
foreach ((TNode node, int count) in predecessorCounts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we dependent on the order of keys in a hashtable?
A completely stable sort WRT nodes/successors is probably an overkill, but I wonder if we could be getting different results when sorting something like symbols and whether that is ok or we never do that.
#Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point. The sort should be deterministic. I will build a deterministically-ordered list of nodes and use that here. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As pointed out in #23391 (comment), the order of Dictionary<TKey, TValue>
is deterministic as long as items are only added to it and never removed. It is unclear if that guarantee applies to dictionaries that are cleared (via Clear()
). #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hashes of the stuff we are going to put into the dictionary is not deterministic (bound nodes, symbols..)
In many cases the hash would be, or derived from, object hashes. That can easily change between builds. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VSadov Dictionary<TKey, TValue>
enumeration order equals the original insertion order until an item is removed and then another item is added, where the hashes have no impact on this rule. As I mentioned in the comment, this appears to be an undocumented behavior of the collection, though I do not believe it will change. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like just going through buckets. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We replace items in the dictionary here. I don't want to rely on Dictionary order in any case.
In reply to: 171730132 [](ancestors = 171730132)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean is the Dictionary does not have any facility to remember the insertion order. It simply enumerates the internal bucket array where items are stored based on their truncated key hash values or whether there were collisions.
In our case it is fairly nondeterministic.
/// <param name="nodes">Any subset of the nodes that includes all nodes with no predecessors</param> | ||
/// <param name="successors">A function mapping a node to its set of successors</param> | ||
/// <returns>A list of all reachable nodes, in which each node always precedes its successors</returns> | ||
public static ImmutableArray<TNode> IterativeSort<TNode>(IEnumerable<TNode> nodes, Func<TNode, IEnumerable<TNode>> successors) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps want an overload with ImmutableArrays as inputs, but waiting until the actual use also seems ok. #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
var sorted = TopologicalSort.IterativeSort<int>(new[] { 4, 5 }, x => successors[x]); | ||
AssertTopologicallySorted(sorted, successors, "Test01"); | ||
Assert.Equal(6, sorted.Length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could do AssertEx.Equals(new[] {4, 5, 2, 3, 1, 0}, sorted);
#Closed
}; | ||
|
||
var sorted = TopologicalSort.IterativeSort<int>(new[] { 4, 5 }, x => successors[x]); | ||
AssertTopologicallySorted(sorted, successors, "Test01"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Passing the name of the test is odd. Isn't the stacktrace on assertion sufficient? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for this test it is. For others it isn't as there is a seed parameter. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
} | ||
|
||
// At this point all the predecessor counts should be zero, otherwise there was a cycle | ||
foreach ((TNode node, int count) in predecessorCounts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔ Does (_, int count)
work? We use var (_, count)
in the IDE layer but I'm not sure how that translates to explicit type situations.
❓ Why not check that predecessorCounts.Count == resultBuilder.Count
? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points. I need to get out of the habit of enumerating dictionaries in any case, as that is nondeterministic.
In reply to: 171706953 [](ancestors = 171706953)
{ | ||
var node = ready.Pop(); | ||
resultBuilder.Add(node); | ||
foreach (var succ in successors(node)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 I worry a bit about the number of IEnumerator<T>
allocations required to process the list (potentially 2*n for a graph with n nodes) #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We visit each node and edge a constant number of times. We call successors(node) once to count predecessors, and then we call it once on each node to process the outgoing edges. Presuming each call allocates a new enumerator, that is 2n enumerators. If that is a problem for some caller the caller can cache. However, there is no getting around the fact that the caller must have this data allocated in memory in some form or another.
In reply to: 171726056 [](ancestors = 171726056)
|
||
int[] randomSubset(int count, int limit) | ||
{ | ||
// We don't worry about duplicate values. That's all part of the test, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 I'm not sure this is correct. If you intend to support duplicate values in the result of successors
, you should add a dedicated test to that effect.
Specifically, it seems that duplicate values in successors
will bypass the de-duplication logic and result in a value claiming to have more predecessors than it actually does. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The predecessor count is a count of incoming edges, not preceding nodes. I'll add a test.
In reply to: 171728301 [](ancestors = 171728301)
@sharwell I believe all of your comments have been addressed. Any other comments? |
The implementation of pattern-matching will require nodes of the decision dag be generated and processed iteratively using a worklist-based algorithm, as a recursive implementation would overflow the compiler's execution stack for a large switch statement. To facilitate those algorithms, the nodes of the dag will have to be topologically sorted. There are topological sort algorithms in a few places in Roslyn, but they all use Tarjan's recursive algorithm. That works for things like the class inheritance hierarchy or the dependencies between projects because those graphs are of moderate size. However, a large switch statement in C# would produce a decision dag large enough that a recursive algorithm would overflow the execution stack of the compiler. This PR implements a generic topological sort algorithm that is not recursive. It needs to be generic because it will be used both in the production of the decision dag (in which it works with abstract representations of the decision state) and in processing the decision dag itself.
@cston Can you review this, please?
@dotnet/roslyn-compiler Which of you have your college algorithms class still fresh in your mind?