Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix metronome GC to work within safepoint VM access #18535

Closed
wants to merge 3 commits into from

Conversation

gacholio
Copy link
Contributor

Allow the metronome GC to function correctly when the calling thread already has safepoint exclusive VM access.

Fixes: #18482

@gacholio
Copy link
Contributor Author

This may not be quite sufficient to solve all the issues (in particular, if safepoint is disabled, I'm not sure this code will work - the GC call will be from inside a normal exclusive VM access).

Testing both states does not work because it's not possible to distinguish between the failing case here or the normal metronome case when release exclusive (the exclusive VM access state is the same in both those cases).

The VM default to safepoint active, and the only reason to disable it would be to work around an HCR bug, so I don't think it's particularly important that the non-safepoint case works in the short term). @tajila Contradict me if I'm wrong.

If we want to fix both cases, I would introduce a new flag on the calling thread that indicates that exclusive is already held and test that instead of the safepoint state.

Allow the metronome GC to function correctly when the calling thread
already has safepoint exclusive VM access.

Fixes: eclipse-openj9#18482

Signed-off-by: Graham Chapman <[email protected]>
*/
_vmResponsesRequiredForExclusiveVMAccess = 0x7fffffff;
_jniResponsesRequiredForExclusiveVMAccess = 0x7fffffff;
--(mainGCThread->omrVMThread->exclusiveCount);
Copy link
Contributor

@amicic amicic Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to remove the line from before the 'if' check?

I was under impression that you wanted to move the decrement inside the if, but I can see this decrement was just brought in by rleaseExclusiveVMAccessMetronome that is inlined

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why there's a double decrement in this case, but as you say, it's just inlining the other function and excluding the actual exclusive release if exclusive was held in the caller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think the use of the NoMutex version of the release VM access call is likely incorrect, but it's what the old code was doing.

@gacholio
Copy link
Contributor Author

While I'm technically on vacation now, I have no problem implementing the more correct solution of the extra VM field (or bit an existing flags field) indicating that exclusive was acquired before calling the GC. That would almost certainly be done before the end of the week.

@amicic
Copy link
Contributor

amicic commented Nov 30, 2023

I'm surprised that checking safePointState is safe without holding exclusiveAccessMutex.

Also, I'm not convinced these checks should be on GC side.

@gacholio
Copy link
Contributor Author

Also, I'm not convinced these checks should be on GC side.

The line between VM and GC code for metronome is awfully blurry (in the old code, the GC increments the exclusive counter, and the VM code decrements it, which is one of the reasons I inlined the release function).

@gacholio
Copy link
Contributor Author

I'm surprised that checking safePointState is safe without holding exclusiveAccessMutex.

I take your point, though if exclusive is held, it's necessarily on the running thread (though metronome plays fast and loose with that assumption) - the change to use the global "exclusive is held" flag would fix any confusion here, I think.

@gacholio
Copy link
Contributor Author

gacholio commented Nov 30, 2023

Also, I'm not convinced these checks should be on GC side.

The line between VM and GC code for metronome is awfully blurry (in the old code, the GC increments the exclusive counter, and the VM code decrements it, which is one of the reasons I inlined the release function).

For the most part, the checks could move into the called code in the VM (since the GC checks almost exclusively wrap a single VM call). The asymmetry of the code can be fixed later, if at all.

@amicic
Copy link
Contributor

amicic commented Nov 30, 2023

The line between VM and GC code for metronome is awfully blurry (in the old code, the GC increments the exclusive counter, and the VM code decrements it, which is one of the reasons I inlined the release function).

Not sure about other asymmetry, but exclusiveCounter updates are symmetric:

There is a GC inc/dec pair in MetronomeDelegate::waitForExclusiveVMAccess/releaseExclusiveVMAccess
and there is a VM pair in waitForExclusiveVMAccessMetronome(Temp)/releaseExclusiveVMAccessMetronome

The latter/VM ones are the right ones (regular acquire/release exclusive path have them, too).

The former/GC ones are unclear if really necessary. Perhaps for debugging purposes. It may also have to do something with a case when RAS calls explicit GC and already holds exclusive, in which case GC does not end up calling the VM wait/release.

@amicic
Copy link
Contributor

amicic commented Nov 30, 2023

Testing both states does not work because it's not possible to distinguish between the failing case here or the normal metronome case when release exclusive (the exclusive VM access state is the same in both those cases).

Does this comment relate to this attempt: master...gacholio:openj9:check ?

I feel that semantically this is the right approach, but I guess because we set exclusive state directly to EXCLUSIVE from the request we have ambiguity problems.... If we would to set the state to PENDING from the request and to EXCLUSIVE later after the wait is complete, and check specifically against PENDING in the sync code, perhaps that would help?

@gacholio
Copy link
Contributor Author

Does this comment relate to this attempt: master...gacholio:openj9:check ?

Yes.

I feel that semantically this is the right approach, but I guess because we set exclusive state directly to EXCLUSIVE from the request we have ambiguity problems.... If we would to set the state to PENDING from the request and to EXCLUSIVE later after the wait is complete, and check specifically against PENDING in the sync code, perhaps that would help?

I'd rather not introduce more complexity here.

I'll move the checks from the GC to the VM and then probably change the use the more generic "holding exclusive for HCR" flag in the VM.

@gacholio
Copy link
Contributor Author

gacholio commented Nov 30, 2023

Not sure about other asymmetry

I'm referring to the fact that the GC increments the counter, but the decrement is in called VM code. I will not change that in this PR (in the future, we should consider inlining releaseExclusiveVMAccessMetronome for clarity).

@gacholio gacholio marked this pull request as draft November 30, 2023 15:28
Signed-off-by: Graham Chapman <[email protected]>
@gacholio gacholio force-pushed the delegate branch 2 times, most recently from b12aaa9 to bced9c0 Compare November 30, 2023 16:31
Signed-off-by: Graham Chapman <[email protected]>
@gacholio
Copy link
Contributor Author

Abandoning in favour of #18540

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JDK21 cmdLineTester_jvmtitests_hcr_openi9_none_SCC_7_FAILED rc018 testReflectRedefineAtSameTime
2 participants