Implement portable tailcall helpers #341
This implements tailcall-via-help support for all platforms supported by the runtime. In this new mechanism the JIT asks the runtime for help whenever it realizes it will need a helper to perform a tailcall, i.e. when it sees an explicit tail. prefixed call that it cannot make into a fast jump-based tailcall.
The runtime creates two important IL stubs to help the JIT in performing the necessary tailcalls. One IL stub is used to store the args for the tailcall, while the other is used to dispatch the actual tailcall itself. The JIT will then transform the call from
return tail. F(a1, ..., an);
to
IL_STUB_StoreTailCallArgs(a1, ..., an);
T result;
IL_STUB_DispatchTailCalls(..., &result);
return result;
The dispatcher is written in such a way that it is able to dispatch multiple tailcalls in a row when tailcalled functions also perform tailcalls. To do this, the JIT helps the dispatcher detect if the caller's caller is also a dispatcher. When this is the case the dispatcher returns to let the previous dispatcher perform the tailcall with the currently stored args. This allows the frame to unwind and ensures that sequences of tailcalls do not grow the stack more than by a constant factor.
Due to this unwinding the args cannot be stored on the stack and are instead stored in TLS. The GC is made specially aware of this buffer as the args can be anything, including interior pointers.
The control-flow when performing the new tailcalls is nonstandard, so this also changes the debugger to support proper stepping into/over/out of tailcalled functions when they go through the new dispatcher.
x86's tailcalling mechanism does not change.
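To make the store-and-dispatch scheme described above more concrete, here is a minimal C++ sketch of the same idea as a trampoline. It is not the runtime's implementation: `StoreTailCallArgs`, `DispatchTailCalls`, the `t_*` thread-local variables, `DepthProbe`, `IsEven`, and `IsOdd` are all hypothetical names, the argument buffer holds a single integer, and a plain flag stands in for the real "is my caller's caller a dispatcher" check, which in the PR is based on return addresses.

```cpp
#include <cassert>
#include <cstdint>

using Target = bool (*)(std::uint64_t);

// Hypothetical stand-ins for the runtime's per-thread state.
thread_local std::uint64_t t_storedArg = 0;     // the "TLS argument buffer"
thread_local bool t_argsStored = false;
thread_local Target t_pendingTarget = nullptr;  // call left for an outer dispatcher
thread_local bool t_dispatcherActive = false;   // crude "caller's caller is a dispatcher" test
thread_local int t_depth = 0;                   // instrumentation only
thread_local int t_maxDepth = 0;

// Records how deep the tailcalling functions nest, to show the stack stays flat.
struct DepthProbe {
    DepthProbe() { if (++t_depth > t_maxDepth) t_maxDepth = t_depth; }
    ~DepthProbe() { --t_depth; }
};

// Plays the role of the arg-storing stub: args go to TLS, not to the stack.
void StoreTailCallArgs(std::uint64_t arg) {
    assert(!t_argsStored);
    t_storedArg = arg;
    t_argsStored = true;
}

// Plays the role of the dispatcher stub.
void DispatchTailCalls(Target target, bool* result) {
    if (t_dispatcherActive) {
        // An outer dispatcher exists: record the target and return so the
        // caller's frame can unwind before the call is actually made.
        // (Simplification: a real check must distinguish this from a tailcall
        // made below an ordinary, non-tail call.)
        t_pendingTarget = target;
        return;
    }
    t_dispatcherActive = true;
    t_pendingTarget = target;
    while (t_pendingTarget != nullptr) {
        Target next = t_pendingTarget;
        t_pendingTarget = nullptr;
        assert(t_argsStored);
        std::uint64_t arg = t_storedArg;
        t_argsStored = false;
        *result = next(arg);  // overwritten if the callee requested another tailcall
    }
    t_dispatcherActive = false;
}

bool IsOdd(std::uint64_t n);

bool IsEven(std::uint64_t n) {
    DepthProbe probe;
    if (n == 0) return true;
    // return tail. IsOdd(n - 1);
    StoreTailCallArgs(n - 1);
    bool result = false;
    DispatchTailCalls(&IsOdd, &result);
    return result;
}

bool IsOdd(std::uint64_t n) {
    DepthProbe probe;
    if (n == 0) return false;
    // return tail. IsEven(n - 1);
    StoreTailCallArgs(n - 1);
    bool result = false;
    DispatchTailCalls(&IsEven, &result);
    return result;
}
```

Calling `IsEven(1000000)` in this sketch runs a million mutually recursive tailcalls while `t_maxDepth` never exceeds 2, which mirrors the constant-stack-growth property the description claims. The placeholder value a function returns after deferring its tailcall is ignored, because the outer dispatcher overwrites `*result` on the next loop iteration.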
Thank you for porting!
I have mostly nits and asks for more asserts. Overall to me the change looks good. Thank you for the work!
#endif

-    JITDUMP("fgMorphTailCallViaHelper (before):\n");
+    JITDUMP("fgMorphTailCallViaHelpers (before):\n");
This method in general makes interestingly scary changes to the argument list. I think in general this method should have many asserts. Any time the arg list changes there should be an assert that the rest of the arglist is unaffected. Also there should be entry and exit asserts checking that the arglist is valid.
Do you have suggestions on how to check those conditions?
Didn't think that far ahead, sorry. I spoke with @BruceForstall, and neither of us has a clear idea of how to do this today. Would you mind adding a comment along the lines of "TODO: assert check this arg is the correct arg type to remove"?
We can address this post-merge.
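As a possible starting point for the asserts requested in this thread, the following C++ sketch checks that removing one argument leaves the rest of the list unaffected. It does not use the JIT's real argument-list types: `Arg`, `ArgList`, `RemoveArgChecked`, and the `kind` strings are hypothetical and exist only to illustrate the shape of the check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of one call argument.
struct Arg {
    std::string kind;  // e.g. "this", "stubAddr", "user"
    int id;            // identity of the underlying node
};

using ArgList = std::vector<Arg>;

// Removes the argument at `index`, asserting on entry that it is the kind we
// expect to remove, and on exit that every other argument is unchanged and
// still in its original relative order.
void RemoveArgChecked(ArgList& args, std::size_t index, const std::string& expectedKind) {
    // Entry checks: the slot exists and holds the expected kind of argument.
    assert(index < args.size());
    assert(args[index].kind == expectedKind);

    const ArgList before = args;  // snapshot used only for verification
    args.erase(args.begin() + static_cast<std::ptrdiff_t>(index));

    // Exit checks: exactly one argument is gone, the others are untouched.
    assert(args.size() + 1 == before.size());
    for (std::size_t i = 0; i < args.size(); ++i) {
        const Arg& old = before[i < index ? i : i + 1];
        assert(args[i].kind == old.kind);
        assert(args[i].id == old.id);
    }
}
```

The snapshot costs a copy per removal, so in a real compiler a check like this would presumably be limited to debug builds.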
Before this goes in we should do a round of asm diffs.
Thanks for the review @jashook. I still have this as WIP as I believe there are some things left to do, e.g. you have pointed some of them out in the reviews, and the runtime lookup done in morph needs to be investigated carefully. I don't really have the expertise to sign off on that change as it stands.
Does the design document https://github.com/dotnet/runtime/blob/master/docs/design/features/tailcalls-with-helpers.md need to be updated to match this implementation? (Either with this change, or after it is merged?)
@BruceForstall Yes, I have an updated document I will add soon.
I've only reviewed the JIT sources so far, and I mostly have questions or requests for additional comments.
Make STRESS_GENERIC_VARN more compatible with methods that make explicit tail calls. Don't add gc checks for explicit tail calls, and remove the code in morph that blocks tail calls if gc checks are active. Update the tailcall test to disable STRESS_UNSAFE_BUFFER_CHECKS. This can be reverted when #341 is merged. Fixes #1752.
@jakobbotsch I'd like to help get this PR to a point we can merge it. You mentioned in a few comments above that you were going to do some experiments and make some changes. Do you have anything in progress? Can you add an updated document? Let me know what you think is the best way to get this ready for a merge. We don't want this work to bit-rot. I have cycles to help.
@erozenfeld Sounds great. Unfortunately I have not had time to experiment further with this. I have just added the updated design doc so you can take a look at that for a description of how it works. Also, I am happy to answer any questions you might have.
Essentially the missing part is dealing with tailcalls to generic functions which require generic contexts (or maybe even other nonstandard arguments). My idea was to pass a callable function pointer to the IL stubs. Such a function pointer can be obtained by doing something like My idea was to repurpose There is some more discussion in the original thread, starting at dotnet/coreclr#26418 (comment). Once again, let me know if you have any questions about the design. |
@jakobbotsch Can you please rebase the changes? I plan to work on getting us closer to merge this week.
@erozenfeld I addressed some feedback and resolved the conflicts. I merged instead of a rebase to make it a bit easier to check the merge resolutions, we can squash it later.
Congratulations! Many, many thanks to @jakobbotsch for the implementation (and design updating) work, @janvorli for the original design, @jashook for mentoring, and @erozenfeld for doing the hard work to get it over the line. It's really great to see this get in!
Also thanks to @CarolEidt, @AndyAyersMS and @jkotas for code reviews and feedback along the way! I opened two follow-up issues: #35551 (Jit: move runtime lookup expansion to lower) and #35423 (Add support for R2R tail calls with helpers in crossgen/crossgen2).
Thanks to all for the help, and a special thanks to @erozenfeld for pushing this past the finishing line! I also opened #35559 about moving the dispatcher to System.Private.CoreLib once we have function pointer support.
Awesome, I am very happy to see this going in!
compFastTailCalls is true if COMPlus_FastTailCalls is not 0. It was introduced in dotnet#341 to help with testing: setting COMPlus_FastTailCalls to 0 forces all tail-prefixed calls to be dispatched via helpers. This fixes a subtle bug with checking compFastTailCalls: it needs to be checked after `fgInitArgInfo` has been called in `fgCanFastTailCall`. `fgInitArgInfo` adds the stub address for VSD calls and `fgMorphTailCallViaHelpers` then removes it. This change also factors out the logic for checking whether the call has byref parameters that must be copied by the caller.
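The ordering constraint in this commit message can be illustrated with a small C++ model. This is a sketch under assumptions, not the JIT's code: `Call`, `InitArgInfo`, `CanFastTailCall`, `MorphTailCallViaHelpers`, and `kStubAddrArg` are hypothetical simplifications, and placing the stub address at the front of the list is an assumption made only for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified call node.
struct Call {
    bool isVirtualStubDispatch = false;
    bool argInfoInitialized = false;
    std::vector<std::string> args;
};

const std::string kStubAddrArg = "vsd-stub-address";

// Models arg-info initialization: for VSD calls it adds the stub address.
void InitArgInfo(Call& call) {
    if (call.argInfoInitialized) return;  // idempotent
    if (call.isVirtualStubDispatch) {
        call.args.insert(call.args.begin(), kStubAddrArg);
    }
    call.argInfoInitialized = true;
}

// Models the fixed ordering: arg info is initialized BEFORE the config flag
// is consulted, so the helper path always sees a fully set-up argument list.
bool CanFastTailCall(Call& call, bool fastTailCallsEnabled) {
    InitArgInfo(call);
    if (!fastTailCallsEnabled) {
        return false;  // testing knob: force dispatch via helpers
    }
    return true;  // the real legality checks are omitted in this sketch
}

// Models the helper path: it expects the stub address to be present and removes it.
void MorphTailCallViaHelpers(Call& call) {
    assert(call.argInfoInitialized);
    if (call.isVirtualStubDispatch) {
        auto it = std::find(call.args.begin(), call.args.end(), kStubAddrArg);
        assert(it != call.args.end());
        call.args.erase(it);
    }
}
```

If `CanFastTailCall` returned on the flag before calling `InitArgInfo`, the asserts in `MorphTailCallViaHelpers` would fire in this model, which is one plausible reading of the bug the commit describes.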
The old-style helper tail calls required the jit to copy implicit by-ref args. After dotnet#341 we are using old-style helper tail calls only for x86, which doesn't have implicit by-ref parameters. So the check is no longer necessary. This is a no-diffs cleanup change.
@jakobbotsch If you still have slides from your presentation about this work, can you send them to me?
@erozenfeld Unfortunately I seem to have kept it on my Microsoft OneDrive which I do not have access to anymore.
@erozenfeld there is the video from that presentation available if that's sufficient for you.
This implements tailcall-via-help support for all platforms supported by
the runtime. In this new mechanism the JIT asks the runtime for help
whenever it realizes it will need a helper to perform a tailcall, i.e.
when it sees an explicit tail. prefixed call that it cannot make into a
fast jump-based tailcall.
The runtime creates two important IL stubs to help the JIT in performing
the necessary tailcalls. One IL stub is used to store the args for the
tailcall while the other is used to dispatch the actual tailcall
itself. The JIT will then transform the call from
return tail. F(a1, ..., an);
to
IL_STUB_StoreTailCallArgs(a1, ..., an);
T result;
IL_STUB_DispatchTailCalls(..., &result);
return result;
The dispatcher is written in such a way that it is able to dispatch
multiple tailcalls in a row when tailcalled functions also perform
tailcalls. To do this, the JIT helps the dispatcher detect if the
caller's caller is also a dispatcher. When this is the case the
dispatcher returns to let the previous dispatcher perform the tailcall
with the currently stored args. This allows the frame to unwind and
ensures that sequences of tailcalls do not grow the stack more than by a
constant factor.
Due to this unwinding the args cannot be stored on the stack and are
instead stored in TLS. The GC is made specially aware of this buffer as
the args can be anything, including interior pointers.
The control-flow when performing the new tailcalls is nonstandard, so
this also changes the debugger to support proper stepping into/over/out
of tailcalled functions when they go through the new dispatcher.
x86's tailcalling mechanism does not change.
Port of dotnet/coreclr#26418.
WIP:
- `ReturnAddress` intrinsics for detecting previous frames
- `tail. calli`
- `tail. calli`
- Exceptions (we already have coverage in https://github.com/dotnet/coreclr/tree/master/tests/src/JIT/Methodical/Invoke/SEH)
- `ldvirtftn`/`ldftn`
- `GT_RUNTIMELOOKUP` so it can be used from morph

FYI @dotnet/jit-contrib @janvorli