updating domtrees dynamically, removing all unreachable blocks #33730
Hmm, it seems to take forever on test/compiler/irpasses.jl, will investigate.
Fixed. Turns out that was because I was verifying the domtree against output from the naive immediate dominator algorithm, which takes way too long on a big test case in test/compiler/irpasses.jl.
Amazing work (seriously, I'm quite impressed). However, before getting into the implementation, I think we have a more high-level problem: I don't think eliminating dead blocks in the |
Removing dead blocks in the constructor is actually enough to fix #29107, because the particular thing that was going wrong (phi nodes being replaced if they only had one possible value, which can be bad in dead loops) was only ever done while iterating through an According to #29107, invalid IR in dead code is explicitly allowed (I think this is also true for LLVM). Do we want to keep this convention? If so, once we really do update the domtree dynamically, passes will be able to check if a block is dead or not instantaneously. Or do we actually want never to have any dead code at all? |
Wait, the |
Ok, domtrees are updated dynamically now, and I fixed the bugs that cropped up when I turned DCE back on. Here's a summary of changes in this PR:
Some changes to the code:
Anything that modifies the CFG will have to keep the domtree up to date as well.
Here's a rough measurement of impact on performance: in one run, master took a total of 67.2 seconds to compile the sysimage, and with this PR, 70.1 seconds (104%). Note that master with the modification that |
Perhaps it’s obvious or stated somewhere I’m missing, but what is the benefit of this change?
As @vchuravy explained on Slack: cleaner IR, better readability, and better for tools like XLA.jl that operate on typed IR.
Just edited some commits to avoid changing some structs into mutable structs.
LGTM from my side, although I would like @Keno to sign off as well, since he has a deeper understanding of that part of the code.
length(D::DFSTree) = length(D.from_pre)

function DFS!(D::DFSTree, blocks::Vector{BasicBlock})
    copy!(D, DFSTree(length(blocks)))
Is this `empty!`/`resize!`?
It would be a `resize!` if it weren't for the fact that the `to_pre`, `to_post`, and `to_parent_pre` fields need to be reset to all zeros.
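To illustrate the point about `resize!`, here is a minimal sketch. `ToyDFSTree` and `reset!` are hypothetical names, not the PR's code; only the three field names come from the review thread, and the sketch assumes that a zero entry marks a block the traversal has not numbered yet.

```julia
# Hypothetical stand-in for the PR's DFSTree; only the field names are taken
# from the review thread above.
struct ToyDFSTree
    to_pre::Vector{Int}
    to_post::Vector{Int}
    to_parent_pre::Vector{Int}
end

ToyDFSTree(n::Int) = ToyDFSTree(zeros(Int, n), zeros(Int, n), zeros(Int, n))

# Resize every field to `n` entries and zero them, so that no numbering from
# a previous traversal survives.
function reset!(D::ToyDFSTree, n::Int)
    for v in (D.to_pre, D.to_post, D.to_parent_pre)
        resize!(v, n)
        fill!(v, 0)
    end
    return D
end

D = ToyDFSTree(3)
D.to_pre .= [1, 2, 3]                # pretend a traversal has numbered the blocks

# `resize!` on its own leaves the existing entries untouched.
stale = resize!(copy(D.to_pre), 3)

reset!(D, 4)                         # every field now has four zero entries
```

With `resize!` alone the old preorder numbers would still be present, so a later traversal could mistake them for already-visited blocks.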
Phi nodes are optimized away when there is only one predecessor, but this can cause problems in dead loops because forward references can be created, leading to issues with optimization passes that look at all code, dead or not. This fixes issue JuliaLang#29107 when DCE is turned on.
Rebased onto master, got things working with DCE turned on again with a small bugfix for #36684.
…ot counted This fixes JuliaLang#29253, which was caused by `simple_dce!` erroneously erasing SSA values that did not appear to be used, because these uses were only discovered in `just_fixup!` at the end of iterating over an `IncrementalCompact`.
PR JuliaLang#36684 changes `iterate(IncrementalCompact)` to return an extra index, but leaves its arguments unchanged. However, the PR decremented the index argument in a particular recursive call to `iterate`. This caused `iterate` not to recognise that it was done when `allow_cfg_transforms` was turned on.
…ons and deletions The DFS tree associated with a CFG now keeps track of postorder as well as preorder numbers. The DFS tree, as well as the state associated with the SNCA algorithm for finding (immediate) dominators is now stored in DomTree and reused for Dynamic SNCA.
… predecessors For now, just construct the domtree when we make an `IncrementalCompact` rather than try to update it (the domtree) incrementally.
…of CFG This is in anticipation of domtrees being added to CFGs.
…n needed Every time a CFG is created, its corresponding dominator tree is as well.
…namic domtree implementation
… edges This change only affects statements that we have yet to encounter after killing an edge, while iterating through `IncrementalCompact`. Statements in dead blocks that come before the point at which the edge is killed are killed in `kill_edge!`, when the edge is killed.
If a statement was `nothing`, `kill_edge!` would never move on from trying to kill it because the index wasn't incremented.
This is so we can add type declarations to fields in ir.jl that are domtrees, by breaking the dependency loop between domtree.jl (uses basic blocks but defines domtrees) and ir.jl (uses domtrees but defines basic blocks).
I've separated the freestanding bugfixes into #36888.
Replaced by #37882.
This all started with #29107, which is at least one of the things standing in the way of turning DCE back on globally (it was disabled in #29265). The problem is that when there is only one predecessor, replacing a phi node with its only possible value can result in a forward reference when this phi node is in an unreachable loop. This can cause problems with passes that process all IR, whether or not the code is dead.
We can remove dead loops by removing all unreachable blocks, instead of just removing blocks with no predecessors, which is what we currently do. To determine reachability, all that would be required is a depth-first traversal of the CFG, but as suggested by #29140, I have implemented an algorithm for updating the dominator tree of a CFG dynamically. I hear that this would be useful to have anyway. Just to be safe, I’ve also prevented a phi node from being replaced with its value if that value is defined after the phi node is.
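The difference between the two criteria can be seen on a toy CFG. The sketch below is not the compiler's `CFG` type: the adjacency-list layout and the names `reachable_blocks` and `has_no_preds` are assumptions made for illustration.

```julia
# Toy CFG: succs[b] lists the successors of block b; block 1 is the entry.
# Depth-first traversal from the entry, marking every block it reaches.
function reachable_blocks(succs::Vector{Vector{Int}}, entry::Int = 1)
    seen = falses(length(succs))
    stack = [entry]
    seen[entry] = true
    while !isempty(stack)
        b = pop!(stack)
        for s in succs[b]
            if !seen[s]
                seen[s] = true
                push!(stack, s)
            end
        end
    end
    return seen
end

# The weaker criterion: a non-entry block with no predecessors at all.
function has_no_preds(succs::Vector{Vector{Int}}, entry::Int = 1)
    npreds = zeros(Int, length(succs))
    for ss in succs, s in ss
        npreds[s] += 1
    end
    return [b != entry && npreds[b] == 0 for b in 1:length(succs)]
end

# 1 -> 2 is live. Blocks 3 and 4 jump to each other (a dead loop), and
# block 4 also jumps to block 2.
succs = [[2], Int[], [4], [3, 2]]

dead_by_reachability = findall(!, reachable_blocks(succs))
dead_by_preds = findall(has_no_preds(succs))
```

Here blocks 3 and 4 each have a predecessor, so the predecessor count finds nothing to remove, while the traversal from the entry identifies both as dead.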
The dynamic domtree algorithm basically works by figuring out if we can get away with doing nothing or recomputing only some of the immediate dominators when we add or delete an edge in the CFG, or if we have to recompute the whole domtree from scratch.
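As a rough illustration of the "do nothing" case for edge insertion, the sketch below computes immediate dominators naively and applies a standard criterion from the dynamic-dominators literature: inserting an edge u → v between reachable blocks leaves the tree unchanged when the nearest common ancestor of u and v in the dominator tree is v itself or the immediate dominator of v. This is not the PR's Dynamic SNCA implementation; the predecessor-list layout and all function names are assumptions.

```julia
# Naive dominator sets by fixed-point iteration. preds[b] lists the
# predecessors of block b; block 1 is the entry and every block is assumed
# to be reachable.
function dominators(preds::Vector{Vector{Int}})
    n = length(preds)
    dom = [b == 1 ? BitSet([1]) : BitSet(1:n) for b in 1:n]
    changed = true
    while changed
        changed = false
        for b in 2:n
            new = BitSet(1:n)
            for p in preds[b]
                intersect!(new, dom[p])
            end
            push!(new, b)
            if new != dom[b]
                dom[b] = new
                changed = true
            end
        end
    end
    return dom
end

# Immediate dominator of b: its strict dominator with the largest dominator
# set. The entry gets 0, meaning "no immediate dominator".
function idoms(preds::Vector{Vector{Int}})
    dom = dominators(preds)
    idom = zeros(Int, length(preds))
    for b in 2:length(preds)
        strict = [d for d in dom[b] if d != b]
        idom[b] = strict[argmax([length(dom[d]) for d in strict])]
    end
    return idom
end

# Nearest common ancestor of a and b in the dominator tree.
function nca(idom::Vector{Int}, a::Int, b::Int)
    ancestors = Set{Int}()
    while a != 0
        push!(ancestors, a)
        a = idom[a]
    end
    while !(b in ancestors)
        b = idom[b]
    end
    return b
end

# Cheap check: does inserting the edge u -> v leave the tree unchanged?
insertion_is_noop(idom, u, v) = nca(idom, u, v) in (idom[v], v)

# Diamond 1 -> {2, 3} -> 4, followed by 4 -> 5.
preds = [Int[], [1], [1], [2, 3], [4]]
idom = idoms(preds)

# Adding 2 -> 3 changes nothing; adding 2 -> 5 lets control bypass block 4.
with_2_to_3 = [Int[], [1], [1, 2], [2, 3], [4]]
with_2_to_5 = [Int[], [1], [1], [2, 3], [4, 2]]
```

Recomputing from scratch on `with_2_to_3` and `with_2_to_5` gives a way to cross-check the cheap test, in the same spirit as verifying against the naive algorithm mentioned earlier in the thread.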
As this PR stands, domtrees are not updated dynamically yet, even though the dynamic algorithm has been implemented. First, I tried just computing the domtree from scratch when we decide which blocks are dead, which happens when an `IncrementalCompact` is created. I tried to get an idea of the performance impact by comparing how long Base and Stdlibs took to compile, but the compile times were pretty noisy and I couldn’t confidently notice any difference. Next, I tried adding a `DomTree` to every `CFG` and computing it every time a `CFG` is constructed. This appears to lengthen compile times by about 5% compared with master, and 3% compared with master with DCE on. The only times a new `CFG` is made from an old one are when we finish iterating over an `IncrementalCompact`, and in `domsort_ssa!`. Presumably we would be able to recover some performance by updating the domtree dynamically in these places.

I’ve also addressed a bug wherein references to dead blocks that have been removed remain in phi nodes.