runtime: forEachP not done and stopTheWorld not stopped on openbsd/ppc64 #63384
Comments
Change https://go.dev/cl/532855 mentions this issue: |
I wonder if the problem is that
In the ppc64 implementation that we use, if There is no synchronization between the call to There is no corresponding issue with the futex implementation, because that doesn't use I checked GCC and it executes an |
I forgot to say: using an atomic store in |
I agree that something seems strange here. It is odd that both AIX and openbsd fail; surely the memory holding the pointer meets the ISA requirements for lowering isync to lwsync? Note, the ISA claims |
@4a6f656c Is there a way for you to see whether https://go.dev/cl/533118 fixes the problem? Thanks. |
Change https://go.dev/cl/533118 mentions this issue: |
After further investigation, it's possible that the use of LWSYNC instead of ISYNC is the problem. I see that the code was changed from ISYNC to LWSYNC in https://go-review.googlesource.com/c/go/+/95175, which predates the AIX issue. Perhaps the ISA description that change was based on was unclear and was misunderstood.
As far as I can tell, we need some kind of memory synchronization on the failure path of I could certainly be mistaken, but where? |
It seems odd that I don't think |
The question is, is this the correct usage for CAS? I was always under the impression that CAS is used when all threads were accessing the data with CAS. But I don't see any documentation that officially states that. |
@pmur It seems clear that the code reading If that is not the case—and I don't think it's my call, it's up to @golang/runtime—then we need to document that clearly in the docs for @laboger To be clear, all the threads are accessing the data in the |
I was actually referring to noteclear since that appears to be where it is not doing an atomic store, and that is where they did the fix for AIX. |
@laboger I see. I think that |
After discussion with the runtime team we think that it's clearer if compare-and-swap requires memory consistency even in the failure case. So I'm going to submit my CL. I'm going to optimistically assume that it fixes this problem. |
By the way, I'll continue to use LWSYNC for now. If y'all think this should change to ISYNC, by all means go ahead. Thanks. |
Opened #63506 for the similar issue on arm and mips. |
Change https://go.dev/cl/534517 mentions this issue: |
The issue is pretty difficult to trigger, however I've managed to get 50+ runs of |
Thanks for trying it. |
In CL 163624 we added an atomic store in noteclear on AIX only. In the discussion on issue #63384 we think we figured out that the real problem was in the implementation of compare-and-swap on ppc64. That is fixed by CL 533118, so the atomic store is no longer required.

For #30189
For #63384

Change-Id: I60f4f2fac75106f2bee51a8d9663259dcde2029c
Reviewed-on: https://go-review.googlesource.com/c/go/+/534517
Auto-Submit: Ian Lance Taylor <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
Reviewed-by: Joel Sing <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
This CL changes ppc64 atomic compare-and-swap (cas). Before this CL, if the cas failed--if the value in memory was not the value expected by the cas call--the atomic function would not synchronize memory.

In the note code in runtime/lock_sema.go, used on BSD systems, notesleep and notetsleep first try a cas on the key. If that cas fails, something has already called notewakeup, and the sleep completes. However, because the cas did not synchronize memory on failure, this meant that notesleep/notetsleep could return to a core that was unable to see the memory changes that the notewakeup was reporting.

Fixes golang#30189
Fixes golang#63384

Change-Id: I9b921de5c1c09b10a37df6b3206b9003c3f32986
Reviewed-on: https://go-review.googlesource.com/c/go/+/533118
Run-TryBot: Ian Lance Taylor <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
Reviewed-by: Paul Murphy <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Lynn Boger <[email protected]>
Auto-Submit: Ian Lance Taylor <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
The openbsd/ppc64 port occasionally fails with errors like the following:
fatal error: forEachP: not done
https://build.golang.org/log/20e882c562bc987948d25c42db4c8d1437a455b2
Or:
fatal error: stopTheWorld: not stopped (stopwait != 0)
https://build.golang.org/log/186564e55652471a794fd506989d98781bec30eb
This appears to be identical to issue #30189, which affected aix/ppc64.
That issue was resolved via https://go-review.googlesource.com/c/go/+/163624; presumably the same change is needed here.