Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimizer: enhance SROA, handle partially-initialized allocations #42834

Merged
merged 2 commits into from
Nov 7, 2021

Conversation

aviatesk
Copy link
Member

During adding more test cases for our SROA pass, I found our SROA doesn't
handle allocation sites with uninitialized fields at all.
This commit is built on top of #42833 and tries to handle such "unsafe" allocations,
if there are safe setfield! definitions.

For example, this commit allows the allocation r = Ref{Int}() to be
eliminated in the following example (adapted from https://hackmd.io/bZz8k6SHQQuNUW-Vs7rqfw?view):

julia> code_typed() do
           r = Ref{Int}()
           r[] = 42
           b = sin(r[])
           return b
       end |> only

This commit comes with a plenty of basic test cases for our SROA pass also.


Two test cases depend on #42831 as well.

Base automatically changed from avi/optopt to master October 28, 2021 16:31
@aviatesk aviatesk changed the base branch from master to avi/partialtype October 28, 2021 16:33
break
for use in du.uses
def = find_def_for_use(ir, domtree, allblocks, du, use)[1]
(def == 0 || def == idx) && @goto skip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If def == 0 here, doesn't that mean that this field is truly unitialized? maybe I am misunderstanding

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def is the index where the object field is defined, so it could be:

  • object allocation site: def >= 1
  • setfield! call: def >= 1
  • PhiNode (i.e. the object field can be defined in different control flow, and different possibilities are merged at PhiNode): def == 0

So def == 0 case here corresponds to something like:

let src = code_typed((Bool,)) do cond
        r = Ref{Any}()
        if cond
            r[] = 42
        else
            r[] = 32
        end
        return r[] # ::PhiNode(32, 42)
    end |> only |> first
    @test_broken !any(src.code) do @nospecialize x
        Meta.isexpr(x, :new)
    end
end

Since traversing such PhiNode could be complicated (I think?), I just leave it as TODO.

Maybe this is worth to be left as comment ?

preserve_uses = IdDict{Int, Vector{Any}}((idx=>Any[] for idx in IdSet{Int}(defuse.ccall_preserve_uses)))
# Everything accounted for. Go field by field and perform idf
for (fidx, du) in pairs(fielddefuse)
for fidx in 1:ndefuse
du = fielddefuse[fidx]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit nitpicky, but is there any difference between the original and this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no difference. If fielddefuse is type-unstable and it can contain arbitrary type objects (like user-given expressions fielddefuse = x.args where x::Expr), then iterators like pairs(fielddefuse) may produce tuple object whose type is very dynamic (which may cause lots of dispatch with almost no benefit), but for this case we don't need to care that since fielddefuse::Vector{SSADefUse}.

@ianatol
Copy link
Member

ianatol commented Oct 28, 2021

lgtm --- basically move doing idf back so as to include structs with potentially uninitialized fields, then handle those structs by making sure that any field we think could be uninitialized was defined somewhere

@aviatesk aviatesk force-pushed the avi/partialtype branch 5 times, most recently from 16b1154 to 3c6dfeb Compare November 1, 2021 06:25
Base automatically changed from avi/partialtype to master November 1, 2021 10:49
@oscardssmith
Copy link
Member

oscardssmith commented Nov 1, 2021

Will this break the ability to use Ref(x)[] with benchmarking to prevent @btime from turning expressions into constants?

@aviatesk
Copy link
Member Author

aviatesk commented Nov 1, 2021

Maybe ? If the definition is available in the analysis scope where Ref(x)[] is defined (the analysis scope after inlining)

@aviatesk
Copy link
Member Author

aviatesk commented Nov 1, 2021

lgtm --- basically move doing idf back so as to include structs with potentially uninitialized fields, then handle those structs by making sure that any field we think could be uninitialized was defined somewhere

Yea, the core change is here:

for use in du.uses
def = find_def_for_use(ir, domtree, allblocks, du, use)[1]
(def == 0 || def == idx) && @goto skip
end

Previously we give up anytime when allocation contains uninitialized field (fidx + 1 > length(defexpr.args)), but this PR tries to check if there is a "definitive" definition there (!(def == 0 || def == idx)) and then tries scalar replace.

@oscardssmith
Copy link
Member

Can you post results of @btime (1+1) versus @btime 1+Ref(1)[] with this PR? If this breaks the ability to hide info from profiling tools I think we need to be careful about merging it.

@aviatesk
Copy link
Member Author

aviatesk commented Nov 1, 2021

ah, Ref(1)[] should be replaced with 1 even without this PR.

@aviatesk
Copy link
Member Author

aviatesk commented Nov 1, 2021

On this PR:

julia> @btime (1+1)
  0.032 ns (0 allocations: 0 bytes)
2

julia> @btime 1+Ref(1)[]
  1.235 ns (0 allocations: 0 bytes)
2

The essential difference is that 1 + 1 is constant folded but 1 + Ref(1)[] isn't:

julia> code_typed() do
           a = 1 + Ref(1)[]
           b = 1 + 1
           a, b
       end
1-element Vector{Any}:
 CodeInfo(
1%1 = Base.add_int(1, 1)::Int64%2 = Core.tuple(%1, 2)::Tuple{Int64, Int64}
└──      return %2
) => Tuple{Int64, Int64}

@aviatesk
Copy link
Member Author

aviatesk commented Nov 1, 2021

@nanosoldier runbenchmarks("array" || "foldl" || "sort" || "union", vs = ":master")

@vtjnash
Copy link
Member

vtjnash commented Nov 1, 2021

It seems odd those aren't the same. We should fix that. The benchmarks are supposed to interpolate that Ref object to avoid the optimizer removing it.

@aviatesk
Copy link
Member Author

aviatesk commented Nov 1, 2021

Hm I don't see the oddness. Whether or not Ref(1)[] doesn't get optimized into 1, 1 + 1 will be fully constant-folded anyway and will run strictly faster than 1 + Ref(1)[].

@vchuravy
Copy link
Member

vchuravy commented Nov 1, 2021

Can you post results of @btime (1+1) versus @btime 1+Ref(1)[] with this PR? If this breaks the ability to hide info from profiling tools I think we need to be careful about merging it.

@oscardssmith the correct invocation is @btime 1+ $(Ref(1))[] (yes I know horrible.)

@vtjnash
Copy link
Member

vtjnash commented Nov 1, 2021

I don't think that is the key difference, since:

julia> code_llvm() do; 1+Ref(1)[]; end
;  @ REPL[16]:1 within `#14`
define i64 @"julia_#14_630"() #0 {
top:
  ret i64 2
}

But maybe something with the effect-free computation interacting with no-inline?

@vchuravy
Copy link
Member

vchuravy commented Nov 1, 2021

The essential difference is that 1 + 1 is constant folded but 1 + Ref(1)[] isn't:

This is what @Roger-luo observed in Yao. Constant propagation occurs during type-inference, but optimizations can form constant expressions, which will then be optimized by LLVM. Which is okay in the normal case, but cause us to write a const-propagation pass on SSAIR for YaoCompiler.

@aviatesk
Copy link
Member Author

aviatesk commented Nov 1, 2021

But maybe something with the effect-free computation interacting with no-inline?

I think the actual difference in performance stems from:

julia> f() = 1+1; g() = 1+Ref(1)[]; f(); g();

julia> Core.Compiler.invoke_api(only(methods(f)).specializations[1].cache) # with use constant calling convention
2

julia> Core.Compiler.invoke_api(only(methods(g)).specializations[1].cache) # just a usual function call
-1

So the question might be what Valentin pointed out: constant-folding at optimization time (rather, re-inference to maximize general optimization opportunities?). Or introduce mutation-analysis into inference and constant-fold more.

@vchuravy
Copy link
Member

vchuravy commented Nov 1, 2021

So the question might be what Valentin pointed out: constant-folding at optimization time (rather, re-inference to maximize general optimization opportunities?). Or introduce mutation-analysis into inference and constant-fold more.

I don't think re-running inference is possible (opens up a whole bag of convergence questions). Doing constant-folding at optimization time is likely beneficial and cheap, and can enable other optimizations.

Or introduce mutation-analysis into inference and constant-fold more.

That might be the way to go, but I am cautious over the cost-benefit here. I would rather enable this as SSAIR optimization so that we can do it after inlining as well.

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk aviatesk added merge me PR is reviewed. Merge when all tests are passing compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) labels Nov 6, 2021
@aviatesk
Copy link
Member Author

aviatesk commented Nov 6, 2021

@nanosoldier runbenchmarks("array" || "foldl" || "sort" || "union", vs = ":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

During adding more test cases for our SROA pass, I found our SROA doesn't
handle allocation sites with uninitialized fields at all.
This commit is based on #42833 and tries to handle such "unsafe" allocations,
if there are safe `setfield!` definitions.

For example, this commit allows the allocation `r = Ref{Int}()` to be
eliminated in the following example (adapted from <https://hackmd.io/bZz8k6SHQQuNUW-Vs7rqfw?view>):
```julia
julia> code_typed() do
           r = Ref{Int}()
           r[] = 42
           b = sin(r[])
           return b
       end |> only
```

This commit comes with a plenty of basic test cases for our SROA pass also.
@aviatesk aviatesk merged commit 310de1c into master Nov 7, 2021
@aviatesk aviatesk deleted the avi/moresroa branch November 7, 2021 13:39
aviatesk added a commit to aviatesk/EscapeAnalysis.jl that referenced this pull request Nov 7, 2021
@DilumAluthge DilumAluthge removed the merge me PR is reviewed. Merge when all tests are passing label Nov 7, 2021
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
…liaLang#42834)

During adding more test cases for our SROA pass, I found our SROA doesn't
handle allocation sites with uninitialized fields at all.
This commit is based on JuliaLang#42833 and tries to handle such "unsafe" allocations,
if there are safe `setfield!` definitions.

For example, this commit allows the allocation `r = Ref{Int}()` to be
eliminated in the following example (adapted from <https://hackmd.io/bZz8k6SHQQuNUW-Vs7rqfw?view>):
```julia
julia> code_typed() do
           r = Ref{Int}()
           r[] = 42
           b = sin(r[])
           return b
       end |> only
```

This commit comes with a plenty of basic test cases for our SROA pass also.
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
…liaLang#42834)

During adding more test cases for our SROA pass, I found our SROA doesn't
handle allocation sites with uninitialized fields at all.
This commit is based on JuliaLang#42833 and tries to handle such "unsafe" allocations,
if there are safe `setfield!` definitions.

For example, this commit allows the allocation `r = Ref{Int}()` to be
eliminated in the following example (adapted from <https://hackmd.io/bZz8k6SHQQuNUW-Vs7rqfw?view>):
```julia
julia> code_typed() do
           r = Ref{Int}()
           r[] = 42
           b = sin(r[])
           return b
       end |> only
```

This commit comes with a plenty of basic test cases for our SROA pass also.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants