Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updating domtrees dynamically, removing all unreachable blocks #33730

Closed
wants to merge 16 commits into from

Conversation

yhls
Copy link
Contributor

@yhls yhls commented Oct 31, 2019

This all started with #29107, which (is at least one thing that) stands in the way of turning DCE back on globally (it was disabled in #29265). The problem is that when there is only one predecessor, replacing a phi node with its only possible value can result in a forward reference when this phi node is in an unreachable loop. This can cause problems with passes that process all IR, whether or not the code is dead.

We can remove dead loops by removing all unreachable blocks, instead of just removing blocks with no predecessors, which is what we currently do. To determine reachability, all that would be required is a depth-first traversal of the CFG, but as suggested by #29140, I have implemented an algorithm for updating the dominator tree of a CFG dynamically. I hear that this would be useful to have anyway. Just to be safe, I’ve also prevented a phi node from being replaced with its value if that value is defined after the phi node is.

The dynamic domtree algorithm basically works by figuring out if we can get away with doing nothing or recomputing only some of the immediate dominators when we add or delete an edge in the CFG, or if we have to recompute the whole domtree from scratch.

As this PR stands, domtrees are not updated dynamically yet, even though the dynamic algorithm has been implemented. First, I tried just computing the domtree from scratch when we decide which blocks are dead, which happens when an IncrementalCompact is created. I tried to get an idea of the performance impact by comparing how long Base and Stdlibs took to compile, but the compile times were pretty noisy and I couldn’t confidently notice any difference. Next, I tried adding a DomTree to every CFG and computing it every time a CFG is constructed. This appears to lengthen compile times by about 5% compared with master, and 3% compared with master with DCE on. The only times a new CFG is made from an old one are when we finish iterating over an IncrementalCompact, and in domsort_ssa!. Presumably we would be able to recover some performance by updating the domtree dynamically in these places.

I’ve also addressed a bug wherein references to dead blocks that have been removed remain in phi nodes.

@yhls yhls force-pushed the yhls/dynamicdomtree branch from 7a9608c to 1fd4d72 Compare October 31, 2019 16:51
@vchuravy vchuravy requested a review from Keno October 31, 2019 16:59
@vchuravy vchuravy added the compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) label Oct 31, 2019
@yhls yhls force-pushed the yhls/dynamicdomtree branch from 1fd4d72 to c9f383d Compare October 31, 2019 17:27
@yhls
Copy link
Contributor Author

yhls commented Oct 31, 2019

Hmm, it seems to take forever on test/compiler/irpasses.jl, will investigate.

@yhls yhls force-pushed the yhls/dynamicdomtree branch from c9f383d to cd53d28 Compare November 1, 2019 17:24
@yhls
Copy link
Contributor Author

yhls commented Nov 1, 2019

Fixed. Turns out that was because I was verifying the domtree against output from the naive immediate dominator algorithm, which takes way too long on a big test case in test/compiler/irpasses.jl.

base/compiler/ssair/ir.jl Outdated Show resolved Hide resolved
@Keno
Copy link
Member

Keno commented Nov 3, 2019

Amazing work (seriously, I'm quite impressed). However, before getting into the implementation, I think we have a more high-level problem: I don't think eliminating dead blocks in the IncrementalCompact constructor is enough. The issue that caused the original problem is that removing edges can turn instructions in unreachable blocks invalid and if the pass then goes to process them we can get into a bad state. Of course the dynamic domtree path is designed to address that, so we're almost there with this PR, but I don't think we can turn it on by default until we actually update it in IncrementalCompact whenever we remove an edge and kill all invalid blocks right then (or at least remove all contained instructions). Am I missing something?

@yhls
Copy link
Contributor Author

yhls commented Nov 5, 2019

Removing dead blocks in the constructor is actually enough to fix #29107, because the particular thing that was going wrong (phi nodes being replaced if they only had one possible value, which can be bad in dead loops) was only ever done while iterating through an IncrementalCompact, while it looks like edges are currently only ever removed outside of them, during inlining. But I agree that there could potentially be a case where something goes wrong if we remove an edge while iterating through an IncrementalCompact (although the kill_edge! for IncrementalCompact objects is currently unused).

According to #29107, invalid IR in dead code is explicitly allowed (I think this is also true for LLVM). Do we want to keep this convention? If so, once we really do update the domtree dynamically, passes will be able to check if a block is dead or not instantaneously. Or do we actually want never to have any dead code at all?

@yhls
Copy link
Contributor Author

yhls commented Nov 5, 2019

Wait, the kill_edge! for IncrementalCompact objects is used...

@yhls yhls force-pushed the yhls/dynamicdomtree branch 2 times, most recently from 7e4f0e0 to 3920b30 Compare November 25, 2019 20:54
@yhls yhls force-pushed the yhls/dynamicdomtree branch from 5479a4f to b3f91d6 Compare December 15, 2019 03:53
@vchuravy vchuravy requested a review from Keno December 16, 2019 21:20
@yhls
Copy link
Contributor Author

yhls commented Dec 17, 2019

Ok, domtrees are updated dynamically now, and I fixed the bugs that cropped up when I turned DCE back on. Here's a summary of changes in this PR:

  • DCE is turned back on (reverting Disable CFG transforms for now #29265), and all unreachable blocks are removed (not just ones with no predecessors), as determined when IncrementalCompacts are constructed
  • if kill_edge! makes blocks unreachable, all their statements are now killed immediately (set to nothing or ReturnNode())--this should prevent passes from seeing invalid statements in dead blocks
  • phi node values no longer get forwarded to their uses in the case that there's only one incoming edge, if this value is coming from after the phi node (this fixes Compiler bug uncovered by BaseBenchmarks #29107, although is just to be safe because removing all unreachable blocks also fixes it)
  • fixed Regression with newest Julia? #29253, an infinite loop in kill_edge!, block merging cycles in cfg_simplify!, missing used_ssas counts, and removed blocks being referenced in phi nodes

Some changes to the code:

  • every CFG now contains a DomTree, which must be kept up to date
  • DomTrees are no longer explicitly passed to functions like domsort_ssa!
  • construct_domtree now takes Vector{BasicBlock} instead of CFG
  • DFSTrees now keep track of postorder numbers as well as preorder numbers, and are stored in every DomTree

Anything that modifies the CFG will have to keep the domtree up to date as well. kill_edge!(IncrementalCompact, ...) does this already, but otherwise, cfg_insert_edge!, cfg_delete_edge!, domtree_insert_edge!, domtree_delete_edge!, rename_blocks!, and combine_blocks! can be used. If many edge insertions or deletions are being done, it may be faster (and simpler) just to reconstruct the domtree from scratch, because the algorithm resorts to reconstructing the whole domtree if the change affects the preorder number of the blocks. (This is why the domtree is recomputed from scratch in domsort_ssa! and batch_inline.)

@yhls
Copy link
Contributor Author

yhls commented Dec 17, 2019

Here's a rough measurement of impact on performance: in one run, master took a total of 67.2 seconds to compile the sysimage, and with this PR, 70.1 seconds (104%). Note that master with the modification that allow_cfg_transforms=true took 68.7 seconds (102%).

@vchuravy vchuravy marked this pull request as ready for review December 17, 2019 19:27
@vchuravy vchuravy changed the title WIP/RFC: updating domtrees dynamically, removing all unreachable blocks updating domtrees dynamically, removing all unreachable blocks Dec 17, 2019
@StefanKarpinski
Copy link
Member

Perhaps it’s obvious or stated somewhere I’m missing, but what is the benefit of this change?

@ViralBShah
Copy link
Member

As @vchuravy explained on slack: Cleaner IR, better readability, and better for tools like XLA.jl that operate on typed IR

@yhls yhls force-pushed the yhls/dynamicdomtree branch from b3f91d6 to a8913b5 Compare December 17, 2019 21:11
@yhls
Copy link
Contributor Author

yhls commented Dec 17, 2019

Just edited some commits to avoid changing some structs into mutable structs.

@yhls yhls force-pushed the yhls/dynamicdomtree branch from a8913b5 to 5a1114c Compare December 17, 2019 21:20
Copy link
Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from my side, although I would like @Keno to sign off as well, since he has a deeper understanding
of that part of the code.

length(D::DFSTree) = length(D.from_pre)

function DFS!(D::DFSTree, blocks::Vector{BasicBlock})
copy!(D, DFSTree(length(blocks)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this empty!/resize!?

Copy link
Contributor Author

@yhls yhls Jan 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a resize! if it weren't for the fact that the to_pre, to_post, and to_parent_pre fields need to be reset to all zeros.

@yhls yhls force-pushed the yhls/dynamicdomtree branch 2 times, most recently from b677df8 to b63d3f7 Compare March 11, 2020 18:33
@yhls yhls force-pushed the yhls/dynamicdomtree branch from b63d3f7 to a978294 Compare April 1, 2020 17:42
@vchuravy vchuravy mentioned this pull request Jul 28, 2020
3 tasks
yhls added 2 commits August 2, 2020 22:01
Phi nodes are optimized away when there is only one predecessor, but this can
cause problems in dead loops because forward references can be created, leading
to issues with optimization passes that look at all code, dead or not. This
fixes issue JuliaLang#29107 when DCE is turned on.
@yhls yhls force-pushed the yhls/dynamicdomtree branch from a978294 to f638a1a Compare August 3, 2020 06:02
@yhls
Copy link
Contributor Author

yhls commented Aug 3, 2020

Rebased onto master, got things working with DCE turned on again with a small bugfix for #36684

yhls added 14 commits August 3, 2020 00:51
…ot counted

This fixes JuliaLang#29253, which was caused by `simple_dce!` erroneously erasing SSA
values that did not appear to be used, because these uses were only discovered
in `just_fixup!` at the end of iterating over an `IncrementalCompact`.
PR JuliaLang#36684 changes `iterate(IncrementalCompact)` to return an extra index, but
leaves its arguments unchanged. However, the PR decremented the index argument
in a particular recursive call to `iterate`. This caused `iterate` not to
recognise that it was done when `allow_cfg_transforms` was turned on.
…ons and deletions

The DFS tree associated with a CFG now keeps track of postorder as well as
preorder numbers. The DFS tree, as well as the state associated with the SNCA
algorithm for finding (immediate) dominators is now stored in DomTree and
reused for Dynamic SNCA.
… predecessors

For now, just construct the domtree when we make an `IncrementalCompact` rather
than try to update it (the domtree) incrementally.
…of CFG

This is in anticipation of domtrees being added to CFGs.
…n needed

Every time a CFG is created, its corresponding dominator tree is as well.
… edges

This change only affects statements that we have yet to encounter after killing
an edge, while iterating through `IncrementalCompact`. Statements in dead
blocks that come before the point at which the edge is killed are killed in
`kill_edge!`, when the edge is killed.
If a statement was `nothing`, `kill_edge!` would never move on from trying to
kill it because the index wasn't incremented.
This is so we can add type declarations to fields in ir.jl that are domtrees,
by breaking the dependency loop between domtree.jl (uses basic blocks but
defines domtrees) and ir.jl (uses domtrees but defined basic blocks).
@yhls yhls force-pushed the yhls/dynamicdomtree branch from f638a1a to 6a250c4 Compare August 3, 2020 08:22
@yhls
Copy link
Contributor Author

yhls commented Aug 3, 2020

I've separated the freestanding bugfixes into #36888

@vchuravy
Copy link
Member

@Keno is this ok to be merged with 6a250c4 removed?

@vchuravy
Copy link
Member

vchuravy commented Oct 5, 2020

Replaced by #37882

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants