-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add LLVM level allocation optimization pass #22684
Conversation
57a519f
to
cbfa09e
Compare
@nanosoldier |
cbfa09e
to
c9c47ce
Compare
@nanosoldier |
308a53f
to
7a852a4
Compare
@nanosoldier |
Stealing my thunder, eh? That's ok :). I'll try to review this tomorrow. |
src/llvm-alloc-opt.cpp
Outdated
if (auto call = dyn_cast<CallInst>(I)) { | ||
if (ptr_from_objref && ptr_from_objref == call->getCalledFunction()) | ||
return true; | ||
// Only use in argument counts, uses in operand bundle doesn't since it cannot escape. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah then "uses ... don't" or "use ... doesn't"
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
src/llvm-alloc-opt.cpp
Outdated
bool ignore_tag = true; | ||
auto orig = it.first; | ||
if (optimize && checkUses(orig, 0, ignore_tag)) { | ||
// The allocation does not escape or be used in a phi node so none of the derived |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or get used ?
Does this mean we can finally deprecate |
FWIW, the main part of this is just ~4hrs of work after I suddenly got interested in checking if the gcframe placement pass is late enough for us to do this. So nothing much would be lost if you already have a better version working ;-p (Plus I've been implementing Jameson's ideas all along )
Yes, that should be the case this PR is the best at doing. There are still rare cases where the |
@@ -2156,25 +2156,10 @@ static Value *emit_allocobj(jl_codectx_t &ctx, size_t static_size, Value *jt) | |||
{ | |||
JL_FEAT_REQUIRE(ctx, dynamic_alloc); | |||
JL_FEAT_REQUIRE(ctx, runtime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably postpone these checks to the actual lowering (but we don't have a cgctx
there).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can be passed as a paramter. What would be the correct behavior if the test fail though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
abort compilation. currently JL_FEAT_REQUIRE
just calls jl_error
, and we use the cgctx
for current function name & line number, so we'd need to do something else anyway (use DebugLoc, I assume). better leave that to a different PR, for now maybe just:
if (!JL_FEAT_TEST(ctx, dynamic_alloc)) jl_error(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, I don't think we can throw an error in the llvm pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I hadn't considered that, as it worked with _dump_function
but seems to mess up the JIT indeed. Then again, I only use that feature through _dump_function
...
Maybe a "used features mask" to be checked after lowering? Let's just leave it at what it is now and put that in a different PR.
gc_alloc_args.push_back(T_prjlvalue); | ||
jl_alloc_obj_func = Function::Create(FunctionType::get(T_prjlvalue, gc_alloc_args, false), | ||
Function::ExternalLinkage, | ||
"julia.gc_alloc_obj"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, this function should get a noalias
attribute.
Couple of suggestions:
|
16eeea4
to
aae4455
Compare
|
4e3af21
to
70f88e0
Compare
@nanosoldier |
70f88e0
to
340ae86
Compare
Why is the lowering not relatively trivial? |
Or if it is, GC root lowering looks at all the calls and rewrites most of them anyway (because of the cc convention), so it seems like a fine place to do it. |
The concern is that this pass might not know whether it's the last one (in IPO pipelines the pass manager will automatically rerun parts of the pipeline). |
7ad6a9c
to
f71401a
Compare
Moved the intrinsic lowering to gcframe lowering pass and updated the test. |
Travis failure looks unrelated and is happenning everywhere.... |
More comments? |
I'll try to take another pass through this tonight or tomorrow morning. |
} | ||
} | ||
|
||
bool AllocOpt::isSafepoint(Instruction *inst) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull this out into a helper for both this and the GC lowering code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the lowering pass actually need this function? It has similar logic but does different things for different branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't need this function, but I'd like to at least share the "Known functions emitted in codegen that are not safepoints" part, so we don't have to make changes there in multiple places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that when the lowering pass recognize more functions as non safepoint, we don't necessarily want to update this pass to include those. Here I included the same list just to be safe. In principle, this needs a list of functions that codegen assums to be not safepoint. There can be functions that are never safepoint but as long as neither codegen or llvm can insert a call into a unsafe use chain it doesn't need to be treated as non safepoint here.
src/llvm-alloc-opt.cpp
Outdated
{ | ||
if (!alloc_obj) | ||
return false; | ||
std::map<CallInst*,size_t> allocs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::map is generally discouraged in LLVM passes because of nondeterministic iteration order. Can you use a data structure with deterministic iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that I actually don't need to lookup anything in it so I'll just use a vector instead...
src/llvm-alloc-opt.cpp
Outdated
if (ptr_from_objref && ptr_from_objref == call->getCalledFunction()) | ||
return true; | ||
auto opno = use->getOperandNo(); | ||
// Uses in `jl_roots` operand bundle are not counted as escaping, everything else do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: everything else is
.
}; | ||
// Both `orig_i` and `new_i` should be pointer of the same type | ||
// but possibly different address spaces. `new_i` is always in addrspace 0. | ||
auto replace_inst = [&] (Instruction *user) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can technically mutate instructions in place, but I'm fine with this implementation as well.
f71401a
to
cd8f33e
Compare
Updated comment and switched to a vector instead for recording the allocations. |
cd8f33e
to
05cefe8
Compare
This can obtain escape information with much higher precision than what we can currently do in typeinf. However, it does not replace the alloc_elim_pass! in type inference either since this cannot handle objects with reference fields. Fix #20452
05cefe8
to
b1a188c
Compare
I'd like to not make any other changes about the organization, especially since I still don't think it's a good idea to move the allocation that late in the pipeline. It disables the llvm constant folding of write barrier on 5.0.... There are two FreeBSD timeout and I have no way to debug it so I'll just wait for someone else to figure out what's wrong there or fix it or merge this PR. |
And for anyone who want to use this on master, the hanging test on FreeBSD is |
JULIA_TEST_MAX_RSS is set so it is moved to worker one and will only run if everything else finishes. |
👌 |
Restarted one passed so I'm merging this now.... |
got some compliation warnning from master e1a604e /usr/home/iblis/git/julia/src/llvm-alloc-opt.cpp:639:53: warning: braces around scalar initializer [-Wbraced-scalar-init]
{ConstantInt::get(T_size, -1)});
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/home/iblis/git/julia/src/llvm-late-gc-lowering.cpp:1173:50: warning: braces around scalar initializer [-Wbraced-scalar-init]
{ConstantInt::get(T_size, -1)});
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
1 warning generated. with
|
This is awesome! From the nanosoldier run, it looks like we don't have any benchmarks that meaningfully improve with this? We should try to add some. |
Regression on this will probably show up in future benchmarks as we have more and more use of We are in general very good at avoiding performance pitfails in the benchmarks.... |
Test might be added later.
Just to give a taste of it though
This takes advantage of the LLVM optimizations to get more precise esape info but it doesn't replace allocation elimination in typeinf, which can also split allocations and do more fancy transformations.
The code generated is not 100% optimum (issue from LLVM optimization order) though that might be better when we can run this later in the pipeline. The current placement for 5.0 is arbitrary....
@nanosoldier
runbenchmarks(ALL, vs = ":master")