Feature Request: Templated JIT compiler #9582

Open
Scott-Young-6746 opened this issue May 15, 2020 · 11 comments

Comments

@Scott-Young-6746

Scott-Young-6746 commented May 15, 2020

TR is an optimizing compiler that can perform numerous optimizations (with supporting data structures) and uses a sophisticated intermediate language.

This means that its threshold for compilation must be chosen to avoid over-eager compilation. The supporting data structures also increase the memory footprint of a compilation.

A templated JIT compiler would allow compilation to begin earlier and would require less memory to perform compilations, making it a suitable compiler for memory-constrained environments.

If the templated JIT compiler emits profiling instructions and does not prevent TR from compiling, it could also be used to improve start-up time.
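As a rough sketch of that idea (the structure and helper below are hypothetical, not OpenJ9's actual invocation counters), the emitted prologue would only need a few extra instructions: bump a per-method counter and, once a threshold is reached, queue the method for a TR compilation without blocking the current unoptimized body:

    #include <cstdint>

    // Hypothetical per-method profiling data; OpenJ9's real counters live elsewhere.
    struct MethodProfile {
        int32_t invocationCount;
        int32_t recompileThreshold;
    };

    // C++ equivalent of the handful of instructions a templated JIT prologue might emit.
    inline void profiledEntry(MethodProfile *profile) {
        if (++profile->invocationCount == profile->recompileThreshold) {
            /* queueForTRCompilation(...); -- hypothetical asynchronous hand-off to TR */
        }
    }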

A templated JIT compiler that uses a stack-allocated local variable array and computation stack would be relatively quick at compiling bytecodes and relatively easy to develop.

Any templated JIT will need to reuse as much as it can from TR's infrastructure to improve maintainability.

A prototype might not initially need to support on-stack replacement or full-speed debugging, but it should be designed so that such features are easy to add in the future.

A templated JIT will likely not need to save its code for reuse in ahead-of-time compilation, due to the unoptimized nature of its output. More importantly, it should not overwrite any code that is loaded from ahead-of-time compilation caches, as those caches will likely contain more efficient code.

The stack might look something like the following on X86-64:

    /*
     * MJIT stack frames must conform to TR stack frame shapes. However, as long as our frames can still be walked, we are free to
     * allocate additional data on the stack before the fields required for walking. The following is the general shape of the
     * initial MJIT stack frame:
     * +-----------------------------+
     * |             ...             |
     * | Parameters passed by caller | <-- The space for saving params passed in registers is also here
     * |             ...             |
     * +-----------------------------+
     * |        Return Address       | <-- RSP points here before we allocate space for templated JIT
     * +-----------------------------+ <-- Caller/Callee stack boundary. (caller is templated JITed)
     * |             ...             |
     * |     Preserved Registers     |
     * |             ...             |
     * +-----------------------------+
     * |             ...             |
     * |       Local Variables       |
     * |             ...             | <-- R10 points here at the end of the prologue, R14 always points here.
     * +-----------------------------+ 
     * |             ...             |
     * |      Computation Stack      | <-- sub from R10 to allocate space on computation stack
     * |             ...             |
     * +-----------------------------+ <-- End of the stack frame for leaf methods
     * |             ...             |
     * | Parameters passed to callee |
     * |             ...             |
     * +-----------------------------+
     * |        Return Address       | <-- Caller is templated JITed
     * +-----------------------------+ <-- End of the stack frame for non-leaf methods
     */
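To make the "one template per bytecode" idea concrete, here is a minimal sketch (not OpenJ9 code) that maps a few bytecodes onto x86-64 templates operating on the frame shape above, with R14 as the local-array base and R10 as the computation-stack top. The bytecode subset, 8-byte slot size, and local offsets are illustrative assumptions; a real implementation would copy pre-assembled machine code bytes and patch immediates rather than emit assembly text.

    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <vector>

    // Illustrative subset of bytecodes.
    enum class Bytecode : uint8_t { iload_0, iload_1, iadd, ireturn };

    // One pre-written template per bytecode (AT&T syntax text for readability).
    // Assumptions: 8-byte computation-stack slots, local 0 at 0(%r14), local 1 at 8(%r14).
    static std::string templateFor(Bytecode bc) {
        switch (bc) {
        case Bytecode::iload_0:   // push local 0 onto the computation stack
            return "  movl 0(%r14), %eax\n"
                   "  subq $8, %r10\n"
                   "  movl %eax, 0(%r10)\n";
        case Bytecode::iload_1:   // push local 1
            return "  movl 8(%r14), %eax\n"
                   "  subq $8, %r10\n"
                   "  movl %eax, 0(%r10)\n";
        case Bytecode::iadd:      // pop two ints, push their sum
            return "  movl 0(%r10), %eax\n"
                   "  addq $8, %r10\n"
                   "  addl %eax, 0(%r10)\n";
        case Bytecode::ireturn:   // pop the result into the integer return register
            return "  movl 0(%r10), %eax\n"
                   "  addq $8, %r10\n";
        }
        return "";
    }

    int main() {
        // static int add(int a, int b) { return a + b; } expressed as bytecodes.
        std::vector<Bytecode> method = {Bytecode::iload_0, Bytecode::iload_1,
                                        Bytecode::iadd, Bytecode::ireturn};
        for (Bytecode bc : method)
            std::cout << templateFor(bc);   // a real JIT appends bytes to the code cache instead
    }

Because compilation is a single pass that pastes templates together, both compile time and the compiler's working memory stay small, which is the main point of the proposal.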
@andrewcraik
Contributor

andrewcraik commented May 19, 2020

This lacks details of how the microjit would be integrated into OpenJ9, how its data structures would interoperate (or not) with OpenJ9, how it would interact with the shared class cache, interpreter, GC etc. What GC modes does it support? What is the delivery plan? I think this needs quite a lot of expansion and probably discussion on a community call to present the proposal.

@Scott-Young-6746
Author

This lacks details of how the microjit would be integrated into OpenJ9, how its data structures would interoperate (or not) with OpenJ9, how it would interact with the shared class cache, interpreter, GC etc. What GC modes does it support? What is the delivery plan? I think this needs quite a lot of expansion and probably discussion on a community call to present the proposal.

That's a good idea! Can you point me to a schedule for community calls?

@DanHeidinga
Member

Calls are Wednesday @ 11am ET. The agenda is posted to the #planning channel usually the day before or the morning of the call. Link to join slack

@Scott-Young-6746
Author

Calls are Wednesday @ 11am ET. The agenda is posted to the #planning channel usually the day before or the morning of the call. Link to join slack

Thank you!

@mpirvu
Contributor

mpirvu commented May 20, 2020

I don't foresee any problems regarding the interaction between MicroJIT and AOT. If a method has AOT code in SCC, we would load that AOT body because relocations are probably faster than generating code with MicroJIT and the code quality is expected to be better. Upgrades for the AOT code will be done with the JIT compiler as usual. Also, I don't envision a solution where MicroJIT generates relocatable code.
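A minimal sketch of that ordering (the names below are hypothetical, not the actual compilation-control API):

    // Prefer an existing AOT body from the shared class cache, fall back to a quick
    // MicroJIT body, and otherwise leave the method to the interpreter until TR picks it up.
    enum class CompileAction { LoadAOTBody, MicroJITCompile, Interpret };

    CompileAction chooseInitialAction(bool aotBodyInSCC, bool microJITSupportsMethod) {
        if (aotBodyInSCC)
            return CompileAction::LoadAOTBody;     // relocation is cheaper and the code is better
        if (microJITSupportsMethod)
            return CompileAction::MicroJITCompile; // unoptimized, but fast to produce
        return CompileAction::Interpret;           // TR will upgrade it once it gets hot
    }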

With respect to OSR, since MicroJIT is not doing any optimization, do we have to do anything special?

@DanHeidinga
Member

With respect to OSR, since MicroJIT is not doing any optimization, do we have to do anything special?

I had a similar question about HCR (hot code replace) given this is a template JIT which mimics the bytecode action. Does it fetch the data - J9Method *'s, J9Class *'s, etc. - from the constant pool every time the code is run? Does it hardcode those values into the compiled code?

@mpirvu
Contributor

mpirvu commented May 20, 2020

Does it fetch the data - J9Method *'s, J9Class *'s, etc. - from the constant pool every time the code is run?

I would assume yes. In my mind MicroJIT can be viewed as a faster interpreter.

@Scott-Young-6746
Author

Scott-Young-6746 commented May 20, 2020

Does it fetch the data - J9Method *'s, J9Class *'s, etc. - from the constant pool every time the code is run?

I would assume yes. In my mind MicroJIT can be viewed as a faster interpreter.

When implementing getstatic and putstatic, Younes gave me a handle on how to resolve static fields at compile time. We've been hard-coding those values for now, but because of how we support it, we only support static variables that can be resolved at compile time.

For future support of things that cannot be resolved at compile time, we would need some mechanism that looks up the J9Method *'s, J9Class *'s, etc., at run time.
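To illustrate the contrast (the types and helper below are hypothetical stand-ins, not the real J9 constant pool API):

    #include <cstdint>

    // Stand-in for a constant pool entry describing a static field.
    struct StaticFieldRef {
        int32_t *resolvedAddress;   // null until the owning class is resolved/initialized
    };

    // Stand-in for a call into the VM's resolution machinery.
    static void resolveStaticField(StaticFieldRef *cpEntry) {
        static int32_t dummyStatic = 0;
        cpEntry->resolvedAddress = &dummyStatic;
    }

    // Strategy 1: what the current templates effectively do -- the address was resolved
    // once at compile time and baked into the emitted code, which is why only
    // compile-time-resolvable statics are supported today.
    inline int32_t getstaticHardcoded(int32_t *resolvedAtCompileTime) {
        return *resolvedAtCompileTime;
    }

    // Strategy 2: what a run-time-resolving template would have to do, either on every
    // execution or on the first execution with self-patching or a side table.
    inline int32_t getstaticRuntime(StaticFieldRef *cpEntry) {
        if (cpEntry->resolvedAddress == nullptr)
            resolveStaticField(cpEntry);
        return *cpEntry->resolvedAddress;
    }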

@andrewcraik
Contributor

In general, resolution in compiled code doesn't make a lot of sense to me - it would make more sense to jump to the interpreter and potentially recompile the method (even with the template compiler). TR has gone to great lengths to support unresolved code, but those efforts have in general been big time sinks, defect-prone, and unlikely to yield a lot of performance. Just giving up and recompiling greatly simplifies a lot of things IMO.

@andrewcraik
Contributor

With respect to OSR, since MicroJIT is not doing any optimization, do we have to do anything special?

The compiled code would need to support generating the required metadata, populating the transition buffer etc. Doing this should be easier for the template compiler than for TR but it will need some work to support. I think this kind of support could allow the template compiler to give up if things get too complicated on a given code path which might make handling of unresolved or uninitialized stuff easier.
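For concreteness, the per-transition-point metadata might look something like the sketch below (illustrative structures, not the actual OpenJ9 OSR data structures). Because MicroJIT keeps locals and the computation stack at fixed frame offsets, recording this mapping at each yield point should be mostly mechanical, and the runtime would walk it to fill the transition buffer with the interpreter-visible state:

    #include <cstdint>
    #include <vector>

    // Where one interpreter-visible slot lives in the MicroJIT frame.
    struct OSRSlotMapping {
        uint16_t bytecodeSlot;   // local index or operand-stack depth
        int32_t  frameOffset;    // offset from R14 in the MicroJIT frame
    };

    // One record per point where a transition back to the interpreter may occur.
    struct OSRPoint {
        uint32_t bytecodeIndex;              // where interpretation should resume
        std::vector<OSRSlotMapping> locals;  // live locals to copy into the transition buffer
        std::vector<OSRSlotMapping> stack;   // operand-stack slots to copy
    };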

@Scott-Young-6746
Author

With respect to OSR, since MicroJIT is not doing any optimization, do we have to do anything special?

The compiled code would need to support generating the required metadata, populating the transition buffer etc. Doing this should be easier for the template compiler than for TR but it will need some work to support. I think this kind of support could allow the template compiler to give up if things get too complicated on a given code path which might make handling of unresolved or uninitialized stuff easier.

I'm only familiar with OSR in a theoretical sense. What kind of metadata does it require?

In general, resolution in compiled code doesn't make a lot of sense to me - it would make more sense to jump to the interpreter and potentially recompile the method (even with the template compiler). TR has gone to great lengths to support unresolved code, but those efforts have in general been big time sinks, defect-prone, and unlikely to yield a lot of performance. Just giving up and recompiling greatly simplifies a lot of things IMO.

I believe our current exception for a failed MicroJIT compilation causes the extra2 field to be set with a "do not compile" flag. I'll have to look into creating another failed-compilation result that allows re-attempts.
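A sketch of that distinction (the enum and function below are hypothetical, not existing OpenJ9 code):

    // Separate "never try MicroJIT on this method again" from "this attempt failed,
    // but the method may be retried later or handed straight to TR".
    enum class MicroJITFailure {
        Permanent,   // unsupported bytecode or method shape
        Retryable    // e.g. an unresolved reference on the compiled path
    };

    static void recordMicroJITFailure(MicroJITFailure kind /*, J9Method *method */) {
        if (kind == MicroJITFailure::Permanent) {
            // set the "do not compile" flag (the extra2 flag mentioned above)
        } else {
            // leave counters/flags intact so the method stays eligible for a later attempt
        }
    }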
