Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constant propagation without iterations and stack based #1756

Merged
merged 14 commits into from
Jan 20, 2021

Conversation

vitek-karas
Copy link
Member

@vitek-karas vitek-karas commented Jan 14, 2021

Changes how the constant propagation works. Instead of iterating over all methods multiple times, use a stack to emulate recursive processing of methods as they are needed.

The doc in this change contains lot of details about the motivation and the algorithm used so please refer to that.

This essentially makes constant propagation and branch removal per-method and not global step. It's still left as a global step which walks all methods and processes them, but that part is now not required by the algorithm.

Interesting statistics (on a console hello world app):

  • Before this change we scanned over all methods 3 times - in total more than 217K method body scans). After the change we only scan over a total of 73K method bodies (total number of methods is 72K, so we scan most methods only once)
  • Since there are no loops in framework, the maximum number of times we scan any method is 2
  • The maximum depth of the processing stack is 16 (but it's a simple app)

The great savings in CPU are accompanied with a larger memory usage. The step now stores a dictionary of all methods (and one reference as a value, the values are almost all non-unique so the storage necessary for values is tiny). This will be necessary anyway for once we move this into MarkStep as it has to keep a cache of already processed methods.

Part of #1733

@vitek-karas vitek-karas marked this pull request as ready for review January 15, 2021 15:51
@vitek-karas vitek-karas changed the title [WIP] Constant propagation without iterations and stack based Constant propagation without iterations and stack based Jan 15, 2021
@vitek-karas
Copy link
Member Author

Should be ready for final review - PTAL.

src/linker/Linker.Steps/RemoveUnreachableBlocksStep.cs Outdated Show resolved Hide resolved
src/linker/Linker.Steps/RemoveUnreachableBlocksStep.cs Outdated Show resolved Hide resolved
src/linker/Linker.Steps/RemoveUnreachableBlocksStep.cs Outdated Show resolved Hide resolved
src/linker/Linker.Steps/RemoveUnreachableBlocksStep.cs Outdated Show resolved Hide resolved

// Note that stack version is not changing - we're just postponing work, not resolving anything.

constantResultInstruction = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the logic here? Could you add comment how does the code deal with possibly two different results for same method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the question. Method can only have one final result. If it's still in the queue (as is this case) it doesn't have a result yet. That's why it returns false here - telling the caller that no result is available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, why do you need the remove/addfirst then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To move it to the top of the stack. This is so that callees are processed before callers next time around. For example if both A->C,B and B->C... after processing A the stack would be B,C,A (first is top). After processing B without the shuffle it would remain B,C,A - and it would get stuck. So the remove/add moves C to the begining so that the stack is C,B,A - C is processed and then B can get processed and so on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it looks like this is enough if we don't handle recursion. I couldn't come with a case where this would break ;-)

// - ProcessedUnchangedSentinel - which means the method has been fully processed and nothing was changed on it - its value is unknown
// - NonConstSentinel - which means the method has been processed and the return value is not a const
// - Instruction instance - method has been processed and it has a constant return value (the value of the instruction)
Dictionary<MethodDefinition, object> processedMethods;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the perf gain is worth the complexity. Did you consider storing the instructions result in its own cache?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only value which we could in theory store in a different place is the reference to the queue. All the others are valid "final" results. The instruction means it's a const method and has a value (just like before). The NonConstSentinel means it doesn't have a const value - this is new but necessary. Before this change we relied on multiple iterations to react to changes. The cache only stored const values - if a method was not in the cache it could mean either "nonconst" or "don't know/yet". In this change we need to do things per-method, and so we need to store the negative "nonconst" results in the cache as well.
The only optimization here is the ProcessedUnchangedSentitel which means "processed, but didn't compute result". This is pretty important optimization. In the hello world app, there's a total off 75K methods, 67K end up in this ProcessedUnchanged state - meaning we did process them (good), but nothing needed to their value. So we avoided running the const analyzer on ~80% of the methods. I could remove this, but it feels like a good optimization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is that the code uses the cache for two different purpose

  1. as the final method -> Instructions? cache
  2. as temporary processing cache

and I'm just not sure it's worth dropping to object to facilitate that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense - split it into two dictionaries.

src/linker/Linker.Steps/RemoveUnreachableBlocksStep.cs Outdated Show resolved Hide resolved
// No such node was found -> we only have nodes in the loop now, so we have to break the loop.
// We do this by processing it with special flag which will make it ignore any unprocessed dependencies
// treating them as non-const. These should only be nodes in the loop.
treatUnprocessedAsNonConst = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be stack-based for nested loops?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think only the inner most loop will be "caught" by the detection here - all nodes having the same version. This will "break" one of the nodes in the inner most loop. Continuing processing should then completely process the inner most loop, at which point the outer loop will be detected (since it will become the inner most at that point) and so on...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new tested with nested loops to validate that it doesn't break the algorithm.

}

// No such node was found -> we only have nodes in the loop now, so we have to break the loop.
// We do this by processing it with special flag which will make it ignore any unprocessed dependencies
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment suggests a better name for the field, which part of the name the bool value treatUnprocessedAsNonConst does change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - I don't think I follow...
I guess the name could mention that it's for breaking loops - but it didn't felt right to me. We are using it to break loops, sure, but what it does is in its name - it will treat all unprocessed dependencies as non-const.

I'll change it to treatUnprocessedDependenciesAsNonConst.

Copy link
Contributor

@marek-safar marek-safar Jan 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me ask differently, what does treatUnprocessedDependenciesAsNonConst = false do ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's at least one unprocessed dependency, the processing of the current method will be stopped and we go back to the stack - in the code/comments I call this "backing off". The method will be retried at some point.

Don't store methods on stack in the same structure as methods which are already processed. Makes the processed methods structure more strongly typed.
Copy link
Member

@sbomer sbomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to note - I think this has worst-case performance of O(n^2) in the number of loop methods since we can iterate over the loop methods again whenever we encounter a new call to something already on the stack. But it shouldn't matter as long as the loop cases are rare and small.

I left a couple other comments about optimizing the loop cases, but feel free to disregard them (I think readability matters more as long as we are already assuming loops are uncommon).

// To fix this go over the stack and find the "oldest" node with the current version - the "oldest" node which
// is part of the loop:
var lastNodeWithCurrentVersion = stackNode;
for (var currentNode = stackNode; currentNode != null; currentNode = currentNode.Next) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will walk the whole stack . I'm not sure it matters since we expect loop cases are rare, but would lead to bad worst-case performance. A small optimization would be to walk the list backwards.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea - changed.

// Now go back over all nodes from the "oldest" one back to the top and find any nodes which are not of current version.
// For all of them, move them to the top of the stack.
var candidateNodeToMoveToTop = lastNodeWithCurrentVersion;
bool foundNodesWithNonCurrentVersion = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also compute foundNodesWithNonCurrentVersion in the loop above to avoid walking the loop nodes twice in case there are none.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it can all be done in one backward full walk of the stack - changed the implementation.

src/linker/Linker.Steps/RemoveUnreachableBlocksStep.cs Outdated Show resolved Hide resolved
@vitek-karas vitek-karas merged commit 279c07f into dotnet:master Jan 20, 2021
@vitek-karas vitek-karas deleted the ConstantPropStackProcessing branch January 20, 2021 12:39
agocke pushed a commit to dotnet/runtime that referenced this pull request Nov 16, 2022
…r#1756)

Changes how the constant propagation works. Instead of iterating over all methods multiple times, use a stack to emulate recursive processing of methods as they are needed.

The doc in this change contains lot of details about the motivation and the algorithm used so please refer to that.

This essentially makes constant propagation and branch removal per-method and not global step. It's still left as a global step which walks all methods and processes them, but that part is now not required by the algorithm.

Interesting statistics (on a console hello world app):
* Before this change we scanned over all methods 3 times - in total more than 217K method body scans). After the change we only scan over a total of 73K method bodies (total number of methods is 72K, so we scan most methods only once)
* Since there are no loops in framework, the maximum number of times we scan any method is 2
* The maximum depth of the processing stack is 16 (but it's a simple app)

The great savings in CPU are accompanied with a larger memory usage. The step now stores a dictionary of all methods (and one reference as a value, the values are almost all non-unique so the storage necessary for values is tiny). This will be necessary anyway for once we move this into `MarkStep` as it has to keep a cache of already processed methods.

Commit migrated from dotnet/linker@279c07f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants