Clean up generation of load and store sequences for Power #5630
Comments
Could you further explain the need for […]
@fjeremic The reason that nodes need to be tracked here is that some of the nodes' registers may end up being used in the […]
Assuming that node [2] is only used here, if node [3] ends up in […] Previously, we handled this by keeping track of a "base node" and an "index node" in […]
@aviansie-ben check […]
@fjeremic I see how that can resolve the problem, but I'm not sure I like that as a solution. Artificially increasing the reference count and then scheduling it to be decremented later seems like a bit of a hack to me and requires quite a bit of additional bookkeeping, although I have to admit it may be better from the evaluator's point of view to not have to worry about stuff like this. That being said, I don't think the benefits of a solution like that really apply once an interface like […]
Will we be enforcing this by removing such constructors/factory methods on […]
Sorry, I should have been more clear about this in my original proposal. The eventual goal is to remove these APIs on […]
Awesome. I support your efforts then! I'll take a look once the dust settles on your eventual PR, and if all looks well we can migrate the Z codegen to a similar solution, as I do agree the artificial mucking with reference counts is not the greatest solution.
@gita-omr @zl-wang Just a quick FYI on this, since it's aiming to fix and prevent a family of volatility-related bugs I've found in the last week or so where the […]
@aviansie-ben could you please add @zl-wang and myself as reviewers? We would certainly like to take a look and discuss.
@gita-omr There is no PR associated with this at the moment. This is simply an issue for discussing high-level design of the new API for performing these sorts of optimizations in a safe manner. I hope to have a WIP PR open by the end of the day on Monday, and I'll certainly add you as reviewers on that PR once it's open.
I've now opened #5652 as a WIP PR for the OMR side of this work. These changes still need more testing and there are still many areas in OpenJ9 that need to be updated to use the new API.
@aviansie-ben would it be possible to add examples of how the new interface would be used in (1) the basic case and (2) the case in #5630 (comment)?
Nvm, found examples in the PR.
Currently, the generation of load and store sequences is deeply entangled with `TR::MemoryReference` and has code copied in numerous different locations. Specifically, we are currently able to generate a memory reference for an arbitrary load/store node from any evaluator and then emit instructions manually for performing the load/store. For instance, loads on Power are implemented like the following (simplified for convenience):

[code listing omitted]

However, code for emitting loads in particular is copied all over the codegen, as it is common to optimize when a child is a single-reference load by changing the opcode used to perform the load (e.g. an `ibyteswap` of an `iload` can be done in a single `lwbrx` instruction). This has resulted in a number of bugs and design issues that have proliferated uncontrolled throughout the Power codegen:

- […] `lwsync` instruction for volatile loads, which can result in nondeterministic and unreproducible bugs (I've found at least a dozen locations that have this problem so far)
- […] `TR::MemoryReference` to keep track of nodes whose reference counts need to be decremented, unnecessarily coupling that class to tree evaluation
- […] `TR::MemoryReference` to be tightly coupled to how unresolved symbol references work in OpenJ9
- […] `TR::MemoryReference`, making that class unnecessarily complicated when it should be quite simple in theory

To address these issues, I'd like to propose adding a new extensible class called `LoadStoreHandler` to the Power codegen that will handle these operations in a safe and controlled manner (this is a slightly simplified version to eliminate some unnecessary implementation details regarding 64-bit loads/stores on 32-bit systems):

[code listing omitted]

The idea here is to provide a simple interface for the evaluators that is much harder to use incorrectly and hides unnecessary details from the user. For most use cases, a simple call to `generateLoadNodeSequence` or `generateStoreNodeSequence` can be used to encompass the entire operation of loading/storing with a particular opcode. This makes it completely impossible to forget to emit memory barriers when required, as the fact that this is even happening at all is hidden from the user entirely.

For more advanced use cases, `generateComputeAddressSequence` can be used to compute the effective address of a load/store into a register. This register can later be passed to `generateLoadAddressSequence` or `generateStoreAddressSequence` to actually produce the load/store and corresponding barriers. While potentially more error-prone, this is required for cases where special code (e.g. write barrier handling) must run between when the child nodes are evaluated and when the memory operation is actually performed.

In order to make extension of these methods by downstream projects easier without needing to copy the OMR definitions of the methods on `LoadStoreHandler`, the OMR implementations would delegate to some methods from another extensible class that defines internal implementation details and isn't meant to be used directly in tree evaluators:

[code listing omitted]

Doing this has a couple of notable benefits. For starters, generation of the sequences for loads and stores is centralized into one function which uses an arbitrary `TR::MemoryReference` without needing to expose that detail to the evaluators. This allows both the address-based and node-based methods to be overridden by a downstream project simultaneously without requiring such code to be duplicated, and allows downstream projects' implementations of these methods to delegate to the OMR versions for simple cases without needing to copy code.

Furthermore, the introduction of `NodeMemoryReference` as an abstract representation of a memory reference alongside a number of nodes whose registers are being used in the memory reference allows `TR::MemoryReference` to no longer have to worry about keeping track of nodes itself.

For now, these functions would all be implemented by delegating to the existing methods on `TR::MemoryReference` that we already use in order to avoid breaking changes for the time being. However, significant amounts of code could be moved out of `TR::MemoryReference` after an interface like this is introduced without breaking evaluators.
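
As a rough illustration of the shape of this proposal (not the actual OMR code), the two-layer design could be sketched as below. Only the `LoadStoreHandler` method names come from the description above. The `Node` struct, the `InstrList` alias, the `LoadStoreHandlerImpl` name, every signature, and the uniform `op rT, off(rB)` pseudo-assembly are assumptions made up for this example. The barrier choice is also reduced to a single `lwsync` after a volatile load.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the real OMR types.
struct Node {
    bool isVolatile;  // whether the access has volatile semantics
    int baseReg;      // register holding the base address
    int offset;       // constant displacement from the base
};

// Pseudo-assembly in a uniform "op rT, off(rB)" form, one string per instruction.
using InstrList = std::vector<std::string>;

// Internal layer (assumed name): the single place that emits a load and its barrier.
struct LoadStoreHandlerImpl {
    static void generateLoadSequence(InstrList &out, const std::string &op, int trgReg,
                                     int baseReg, int offset, bool needsBarrier) {
        out.push_back(op + " r" + std::to_string(trgReg) + ", " +
                      std::to_string(offset) + "(r" + std::to_string(baseReg) + ")");
        if (needsBarrier)
            out.push_back("lwsync");  // simplified: real barrier selection is more involved
    }
};

// Evaluator-facing layer: callers never decide whether a barrier is needed.
struct LoadStoreHandler {
    // Simple path: the whole load, including any barrier, in one call.
    static void generateLoadNodeSequence(InstrList &out, int trgReg, const Node &node,
                                         const std::string &op) {
        LoadStoreHandlerImpl::generateLoadSequence(out, op, trgReg, node.baseReg,
                                                   node.offset, node.isVolatile);
    }

    // Advanced path, step 1: materialize the effective address in addrReg.
    static int generateComputeAddressSequence(InstrList &out, int addrReg, const Node &node) {
        out.push_back("addi r" + std::to_string(addrReg) + ", r" +
                      std::to_string(node.baseReg) + ", " + std::to_string(node.offset));
        return addrReg;
    }

    // Advanced path, step 2: load through the precomputed address, barrier included.
    static void generateLoadAddressSequence(InstrList &out, int trgReg, const Node &node,
                                            const std::string &op, int addrReg) {
        assert(addrReg >= 0);
        LoadStoreHandlerImpl::generateLoadSequence(out, op, trgReg, addrReg, 0,
                                                   node.isVolatile);
    }
};
```

In this sketch, calling `LoadStoreHandler::generateLoadNodeSequence(out, 3, Node{true, 4, 8}, "lwbrx")` appends the load followed by `lwsync`, while the same call on a non-volatile node appends only the load. Because both paths funnel into `LoadStoreHandlerImpl::generateLoadSequence`, a downstream override of that one function would affect the node-based and address-based entry points together.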