Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement portable tailcall helpers #341

Merged
merged 42 commits into from
Apr 28, 2020

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Nov 27, 2019

This implements tailcall-via-help support for all platforms supported by
the runtime. In this new mechanism the JIT asks the runtime for help
whenever it realizes it will need a helper to perform a tailcall, i.e.
when it sees an explicit tail. prefixed call that it cannot make into a
fast jump-based tailcall.

The runtime creates two important IL stubs to help the JIT in performing
the necessary tailcalls. One IL stub is used to store the args for the
tailcall while the other is used to dispatch the actual tailcall
itself. The JIT will then transform the call from

return tail. F(a1, ..., an);

to

IL_STUB_StoreTailCallArgs(a1, ..., an);
T result;
IL_STUB_DispatchTailCalls(..., &result);
return result;

The dispatcher is written in such a way that it is able to dispatch
multiple tailcalls in a row when tailcalled functions also perform
tailcalls. To do this, the JIT helps the dispatcher detect if the
caller's caller is also a dispatcher. When this is the case the
dispatcher returns to let the previous dispatcher perform the tailcall
with the currently stored args. This allows the frame to unwind and
ensures that sequences of tailcalls do not grow the stack more than by a
constant factor.

Due to this unwinding the args cannot be stored on the stack and are
instead stored in TLS. The GC is made specially aware of this buffer as
the args can be anything, including interior pointers.

The control-flow when performing the new tailcalls is nonstandard, so
this also changes the debugger to support proper stepping into/over/out
of tailcalled functions when they go through the new dispatcher.

x86's tailcalling mechanism does not change.

Port of dotnet/coreclr#26418.

WIP:

  • Stubs
    • EH clause
  • Initial JIT work
  • ReturnAddress intrinsics for detecting previous frames
    • XArch
    • ARM
    • Compatibility with return address hijacking
  • Void returning methods
  • Instance methods
    • Classes
    • Structs
  • Retbuf
  • Structs returned in registers
  • GC support
  • Virtual methods
    • Non-VSD
    • VSD
  • Generics
    • Virtual
    • Non-virtual
  • tail. calli
    • Evaluation order (target address must be evaluated last)
  • Stop relying on fast tailcall support (makes x86 support difficult/downright impossible)
  • ByRef support (validate GC tracking)
  • Initial bringup of other platforms
    • x64 Linux
    • x64 Windows
    • ARM32 Linux
    • ARM64 Linux
  • Localloc
  • Debugging
  • Use old mechanism on x86 (as it is fast there)
  • SuperPMI changes for interface
  • Better test coverage of tailcall via helpers
  • Change to use similar code paths as ldvirtftn/ldftn
    • Expand GT_RUNTIMELOOKUP so it can be used from morph
  • Varargs
  • Revise documentation
  • Tailcall IL stub sharing
  • R2R
  • Investigate unloadability
  • Investigate tailcalls to functions with non-standard args
  • Improve arg buffer allocation policy

FYI @dotnet/jit-contrib @janvorli

This implements tailcall-via-help support for all platforms supported by
the runtime. In this new mechanism the JIT asks the runtime for help
whenever it realizes it will need a helper to perform a tailcall, i.e.
when it sees an explicit tail. prefixed call that it cannot make into a
fast jump-based tailcall.

The runtime created two important IL stubs to help the JIT in performing
the necessary tailcalls. One IL stub is used to store the args for the
tailcall, while the other is used to dispatch the actual tailcall
itself. The JIT will then transform the call from

return tail. F(a1, ..., an);

to

IL_STUB_StoreTailCallArgs(a1, ..., an);
T result;
IL_STUB_DispatchTailCalls(..., &result);
return result;

The dispatcher is written in such a way that it is able to dispatch
multiple tailcalls in a row when tailcalled functions also perform
tailcalls. To do this, the JIT helps the dispatcher detect if the
caller's caller is also a dispatcher. When this is the case the
dispatcher returns to let the previous dispatcher perform the tailcall
with the currently stored args. This allows the frame to unwind and
ensures that sequences of tailcalls do not grow the stack more than by a
constant factor.

Due to this unwinding the args cannot be stored on the stack and are
instead stored in TLS. The GC is made specially of this buffer as the
args can be anything, including interior pointers.

The control-flow when performing the new tailcalls is nonstandard, so
this also changes the debugger to support proper stepping into/over/out
of tailcalled functions when they go through the new dispatcher.

x86's tailcalling mechanism does not change.
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 27, 2019
@jashook
Copy link
Contributor

jashook commented Nov 27, 2019

Thank you for porting!

Copy link
Contributor

@jashook jashook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mostly nits and asks for more asserts. Overall to me the change looks good. Thank you for the work!

src/coreclr/src/debug/ee/controller.cpp Show resolved Hide resolved
src/coreclr/src/jit/gentree.h Show resolved Hide resolved
src/coreclr/src/jit/lower.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
#endif

JITDUMP("fgMorphTailCallViaHelper (before):\n");
JITDUMP("fgMorphTailCallViaHelpers (before):\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method in general makes interestingly scary changes to the argument list. I think in general this method should have many asserts. Any time the arg list changes there should be an assert that the rest of the arglist is unaffected. Also there should be entry and exit asserts checking that the arglist is valid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have suggestions on how to check those conditions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't think that far ahead sorry, I spoke with @BruceForstall and he and I didn't have a clear idea on how to do this today. Would you mind adding TODO: assert check this arg is the correct arg type to remove

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can address this post-merge.

src/coreclr/src/jit/morph.cpp Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Show resolved Hide resolved
@jashook
Copy link
Contributor

jashook commented Nov 27, 2019

Before this goes in we should do a round of asm diffs.

@jakobbotsch
Copy link
Member Author

Thanks for the review @jashook. I still have this as WIP as I believe there are some things left to do, eg. you have pointed some of them out in the reviews, and the runtime lookup done in morph needs to be investigated carefully. I don't really have the expertise to sign off on that change as it stands.

src/coreclr/src/vm/dllimport.h Outdated Show resolved Hide resolved
src/coreclr/src/vm/jitinterface.cpp Outdated Show resolved Hide resolved
src/coreclr/src/vm/stubgen.h Show resolved Hide resolved
src/coreclr/src/vm/tailcallhelp.cpp Outdated Show resolved Hide resolved
@BruceForstall
Copy link
Member

Does the design document https://github.com/dotnet/runtime/blob/master/docs/design/features/tailcalls-with-helpers.md need to be updated to match this implementation? (Either with this change, or after it is merged?)

@jakobbotsch
Copy link
Member Author

@BruceForstall Yes, I have an updated document I will add soon.

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only reviewed the JIT sources so far, and I mostly have questions or requests for additional comments.

src/coreclr/src/jit/compiler.h Outdated Show resolved Hide resolved
src/coreclr/src/jit/compiler.h Show resolved Hide resolved
src/coreclr/src/jit/emit.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/gentree.h Show resolved Hide resolved
src/coreclr/src/jit/importer.cpp Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
AndyAyersMS added a commit that referenced this pull request Jan 15, 2020
Make STRESS_GENERIC_VARN more compatible with methods that make explicit tail
calls. Don't add gc checks for explicit tail calls, and remove the code in
morph that blocks tail calls if gc checks are active.

Update the tailcall test to to disable STRESS_UNSAFE_BUFFER_CHECKS. This can
be reverted when #341 is merged.

Fixes #1752.
@erozenfeld
Copy link
Member

@jakobbotsch I'd like to help get this PR to a point we can merge it. You mentioned in a few comments above that you were going to do some experiments and make some changes. Do you have anything in progress? Can you add an updated document? Let me know what you think is the best way to get this ready for a merge. We don't want this work to bit-rot. I have cycles to help.

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Feb 5, 2020

@erozenfeld Sounds great. Unfortunately I have not had time to experiment further with this.
Also, as you can see, there is some unaddressed feedback from the team.

I have just added the updated design doc so you can take a look at that for a description of how it works. Also, I am happy to answer any questions you might have.

Let me know what you think is the best way to get this ready for a merge.

Essentially the missing part is dealing with tailcalls to generic functions which require generic contexts (or maybe even other nonstandard arguments). My idea was to pass a callable function pointer to the IL stubs. Such a function pointer can be obtained by doing something like ldftn or ldvirtftn, which will create instantiating stubs when a generic context parameter is required.
The problem is that we are in the JIT, more specifically in morph when we figure out that such a callable function pointer will be needed. At this point it is hard to create the complex trees required for runtime lookups which the JIT normally does directly during import (of eg. ldvirtftn).

My idea was to repurpose GT_RUNTIMELOOKUP nodes to hold information about runtime lookups. Then, change import and tailcall morph to generate such nodes when needed, eg. for ldvirtftn or for tailcalls requiring the same. Finally, expand these nodes later in the JIT, eg. during rationalization or lowering. This might have some impact on CSE which is what I wanted to experiment with.

There is some more discussion in the original thread, starting at dotnet/coreclr#26418 (comment). Once again, let me know if you have any questions about the design.

@erozenfeld
Copy link
Member

@jakobbotsch Can you please rebase the changes? I plan to work on getting us closer to merge this week.

@AaronRobinsonMSFT AaronRobinsonMSFT removed their request for review March 3, 2020 02:48
@jakobbotsch
Copy link
Member Author

@erozenfeld I addressed some feedback and resolved the conflicts. I merged instead of a rebase to make it a bit easier to check the merge resolutions, we can squash it later.

@erozenfeld erozenfeld merged commit e1ffadd into dotnet:master Apr 28, 2020
@BruceForstall
Copy link
Member

Congratulations!

Many, many thanks to @jakobbotsch for the implementation (and design updating) work, @janvorli for the original design, @jashook for mentoring, and @erozenfeld for doing the hard work to get it over the line. It's really great to see this get in!

@erozenfeld
Copy link
Member

Also thanks to @CarolEidt , @AndyAyersMS and @jkotas for code reviews and feedback along the way!

I opened two follow-up issues: #35551 (Jit: move runtime lookup expansion to lower) and #35423 (Add support for R2R tail calls with helpers in crossgen/crossgen2).

@jakobbotsch jakobbotsch deleted the portable-tailcall-helpers branch April 28, 2020 08:27
@jakobbotsch
Copy link
Member Author

Thanks to all for the help, and a special thanks to @erozenfeld for pushing this past the finishing line!

I also opened #35559 about moving the dispatcher to System.Private.CoreLib once we have function pointer support.

@janvorli
Copy link
Member

Awesome, I am very happy to see this going in!

jkotas added a commit that referenced this pull request Apr 28, 2020
erozenfeld added a commit to erozenfeld/runtime that referenced this pull request Apr 29, 2020
compFastTailCalls is true if COMPlus_FastTailCalls is not 0.
It was introduced in dotnet#341 to help with testing:
setting COMPlus_FastTailCalls to 0 forces all tail-prefixed
calls to be dispatched via helpers.

This fixes a subtle bug with checking compFastTailCalls:
it needs to be checked after `fgInitargInfo` has been called in
`fgCanFastTailCall`. `fgInitArgInfo` adds the stub address for VSD calls
and `fgMorphTailCallViaHelpers` then removes it.

This change also factors out the logic for checking whether the call
has byref parameters that must be copied by the caller.
erozenfeld added a commit that referenced this pull request Apr 30, 2020
compFastTailCalls is true if COMPlus_FastTailCalls is not 0.
It was introduced in #341 to help with testing:
setting COMPlus_FastTailCalls to 0 forces all tail-prefixed
calls to be dispatched via helpers.

This fixes a subtle bug with checking compFastTailCalls:
it needs to be checked after `fgInitargInfo` has been called in
`fgCanFastTailCall`. `fgInitArgInfo` adds the stub address for VSD calls
and `fgMorphTailCallViaHelpers` then removes it.

This change also factors out the logic for checking whether the call
has byref parameters that must be copied by the caller.
erozenfeld added a commit to erozenfeld/runtime that referenced this pull request May 1, 2020
The old-style helper tail calls required the jit to copy implicit by-ref
args. After dotnet#341 we are using old-style helper tail calls only for x86,
which doesn't have implicit by-ref parameters. So the check is no longer
necessary.

This is a no-diffs cleanup change.
erozenfeld added a commit that referenced this pull request May 1, 2020
The old-style helper tail calls required the jit to copy implicit by-ref
args. After #341 we are using old-style helper tail calls only for x86,
which doesn't have implicit by-ref parameters. So the check is no longer
necessary.

This is a no-diffs cleanup change.
@erozenfeld
Copy link
Member

@jakobbotsch If you still have slides from your presentation about this work, can you send them to me?

@jakobbotsch
Copy link
Member Author

@erozenfeld Unfortunately I seem to have kept it on my Microsoft OneDrive which I do not have access to anymore.

@janvorli
Copy link
Member

janvorli commented May 5, 2020

@erozenfeld there is the video from that presentation available if that's sufficient for you.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants