Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a generic topological sort algorithm #25169

Merged
merged 4 commits into from
Mar 2, 2018

Conversation

gafter
Copy link
Member

@gafter gafter commented Mar 1, 2018

The implementation of pattern-matching will require nodes of the decision dag be generated and processed iteratively using a worklist-based algorithm, as a recursive implementation would overflow the compiler's execution stack for a large switch statement. To facilitate those algorithms, the nodes of the dag will have to be topologically sorted. There are topological sort algorithms in a few places in Roslyn, but they all use Tarjan's recursive algorithm. That works for things like the class inheritance hierarchy or the dependencies between projects because those graphs are of moderate size. However, a large switch statement in C# would produce a decision dag large enough that a recursive algorithm would overflow the execution stack of the compiler. This PR implements a generic topological sort algorithm that is not recursive. It needs to be generic because it will be used both in the production of the decision dag (in which it works with abstract representations of the decision state) and in processing the decision dag itself.

@cston Can you review this, please?
@dotnet/roslyn-compiler Which of you have your college algorithms class still fresh in your mind?

@gafter gafter added this to the 16.0 milestone Mar 1, 2018
@gafter gafter self-assigned this Mar 1, 2018
@gafter gafter requested a review from a team March 1, 2018 21:35
@gafter gafter requested a review from a team as a code owner March 1, 2018 21:35
@gafter gafter changed the base branch from master to features/recursive-patterns March 1, 2018 21:35
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Xunit;
Copy link
Member

@jcouv jcouv Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: sort #Closed

for (int t = 0; t < length - 1; t++)
{
var tmp = data[t];
int r = random.Next(t, length);
Copy link
Member Author

@gafter gafter Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, forgot to finish shuffle... #Closed

@@ -0,0 +1,204 @@
using System;
Copy link
Member Author

@gafter gafter Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing copyright #Closed

toCount.AddRange(nodes);
while (toCount.Count != 0)
{

Copy link
Member Author

@gafter gafter Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will delete blank line #Closed

@sharwell sharwell self-requested a review March 1, 2018 22:03
public void TestRandom(int seed)
{
int numberOfNodes = 100;
Random random = new Random(seed);
Copy link
Member

@VSadov VSadov Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure Random works the same between platforms/versions etc? #Closed

Copy link
Member

@VSadov VSadov Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked the docs. It seems to be specified as Donald E. Knuth's subtractive random number generator #Closed

Copy link
Member Author

@gafter gafter Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is required to by the specification of this method:

Providing an identical seed value to different Random objects causes each instance to produce identical sequences of random numbers. This is often done when testing apps that rely on random number generators.

Having said that, it doesn't matter very much. I doubt there exists an implementation and seed that would cause the cyclic part of the test to fail, which is the only part that can fail spuriously if a random number generator produces a very unlucky sequence. But since it is possible, I added plenty of documentation in the tests so a future failure could be diagnosed.
#Closed


// Initialize the ready set with those nodes that have no predecessors
var ready = ArrayBuilder<TNode>.GetInstance();
foreach ((TNode node, int count) in predecessorCounts)
Copy link
Member

@VSadov VSadov Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we dependent on the order of keys in a hashtable?

A completely stable sort WRT nodes/successors is probably an overkill, but I wonder if we could be getting different results when sorting something like symbols and whether that is ok or we never do that.
#Closed

Copy link
Member Author

@gafter gafter Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point. The sort should be deterministic. I will build a deterministically-ordered list of nodes and use that here. #Closed

Copy link
Member

@sharwell sharwell Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As pointed out in #23391 (comment), the order of Dictionary<TKey, TValue> is deterministic as long as items are only added to it and never removed. It is unclear if that guarantee applies to dictionaries that are cleared (via Clear()). #Closed

Copy link
Member

@VSadov VSadov Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashes of the stuff we are going to put into the dictionary is not deterministic (bound nodes, symbols..)
In many cases the hash would be, or derived from, object hashes. That can easily change between builds. #Closed

Copy link
Member

@sharwell sharwell Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VSadov Dictionary<TKey, TValue> enumeration order equals the original insertion order until an item is removed and then another item is added, where the hashes have no impact on this rule. As I mentioned in the comment, this appears to be an undocumented behavior of the collection, though I do not believe it will change. #Closed

Copy link
Member

@VSadov VSadov Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We replace items in the dictionary here. I don't want to rely on Dictionary order in any case.


In reply to: 171730132 [](ancestors = 171730132)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is the Dictionary does not have any facility to remember the insertion order. It simply enumerates the internal bucket array where items are stored based on their truncated key hash values or whether there were collisions.
In our case it is fairly nondeterministic.

/// <param name="nodes">Any subset of the nodes that includes all nodes with no predecessors</param>
/// <param name="successors">A function mapping a node to its set of successors</param>
/// <returns>A list of all reachable nodes, in which each node always precedes its successors</returns>
public static ImmutableArray<TNode> IterativeSort<TNode>(IEnumerable<TNode> nodes, Func<TNode, IEnumerable<TNode>> successors)
Copy link
Member

@VSadov VSadov Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps want an overload with ImmutableArrays as inputs, but waiting until the actual use also seems ok. #ByDesign

Copy link
Member

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


var sorted = TopologicalSort.IterativeSort<int>(new[] { 4, 5 }, x => successors[x]);
AssertTopologicallySorted(sorted, successors, "Test01");
Assert.Equal(6, sorted.Length);
Copy link
Member

@jcouv jcouv Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do AssertEx.Equals(new[] {4, 5, 2, 3, 1, 0}, sorted); #Closed

};

var sorted = TopologicalSort.IterativeSort<int>(new[] { 4, 5 }, x => successors[x]);
AssertTopologicallySorted(sorted, successors, "Test01");
Copy link
Member

@jcouv jcouv Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Passing the name of the test is odd. Isn't the stacktrace on assertion sufficient? #Closed

Copy link
Member Author

@gafter gafter Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for this test it is. For others it isn't as there is a seed parameter. #Closed

Copy link
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}

// At this point all the predecessor counts should be zero, otherwise there was a cycle
foreach ((TNode node, int count) in predecessorCounts)
Copy link
Member

@sharwell sharwell Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❔ Does (_, int count) work? We use var (_, count) in the IDE layer but I'm not sure how that translates to explicit type situations.

❓ Why not check that predecessorCounts.Count == resultBuilder.Count? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points. I need to get out of the habit of enumerating dictionaries in any case, as that is nondeterministic.


In reply to: 171706953 [](ancestors = 171706953)

{
var node = ready.Pop();
resultBuilder.Add(node);
foreach (var succ in successors(node))
Copy link
Member

@sharwell sharwell Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 I worry a bit about the number of IEnumerator<T> allocations required to process the list (potentially 2*n for a graph with n nodes) #Closed

Copy link
Member Author

@gafter gafter Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We visit each node and edge a constant number of times. We call successors(node) once to count predecessors, and then we call it once on each node to process the outgoing edges. Presuming each call allocates a new enumerator, that is 2n enumerators. If that is a problem for some caller the caller can cache. However, there is no getting around the fact that the caller must have this data allocated in memory in some form or another.


In reply to: 171726056 [](ancestors = 171726056)


int[] randomSubset(int count, int limit)
{
// We don't worry about duplicate values. That's all part of the test,
Copy link
Member

@sharwell sharwell Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 I'm not sure this is correct. If you intend to support duplicate values in the result of successors, you should add a dedicated test to that effect.

Specifically, it seems that duplicate values in successors will bypass the de-duplication logic and result in a value claiming to have more predecessors than it actually does. #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The predecessor count is a count of incoming edges, not preceding nodes. I'll add a test.


In reply to: 171728301 [](ancestors = 171728301)

@gafter
Copy link
Member Author

gafter commented Mar 2, 2018

@sharwell I believe all of your comments have been addressed. Any other comments?

@gafter gafter merged commit 96fd284 into dotnet:features/recursive-patterns Mar 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants