runtime: lock ordering problem between trace and wbufSpans #56554
Comments
I still plan to look into this.
Oh no. This is real. You can't actually have a write barrier while holding |
It seems the write barrier is at https://cs.opensource.google/go/go/+/release-branch.go1.20:src/runtime/trace.go;l=535 (note: I used the 1.20 release branch to point to the old code). That code is gone as of your CL https://golang.org/cl/496296, so it is probably fixed.
Thanks for digging it up. I tried the 1.20 release branch but couldn't find it; probably a misclick, and I ended up on the wrong branch. Anyway, yes! Indeed, it looks like this problem is solved on tip.
Found new dashboard test flakes for:
2023-10-25 19:47 linux-amd64-staticlockranking go@a57c5736 runtime/trace.TestTraceFutileWakeup (log)
I think this is fixed as of the new tracer, and a bunch of the recent staticlockranking fixes for it.
Found new dashboard test flakes for:
2024-01-23 00:14 linux-amd64-staticlockranking go@4605ce2d os/signal (log)
Found new dashboard test flakes for:
2024-01-25 09:18 gotip-linux-amd64-staticlockranking go@cad66291 os/signal.TestSignalTrace (log)
I think we just need to be more careful about write barriers deep down in this code. I don't think that should be too hard to do. I'll look into it.
Change https://go.dev/cl/560216 mentions this issue: |
Change https://go.dev/cl/559957 mentions this issue: |
Currently the trace map is cleared with an assignment, but this ends up invoking write barriers. Theoretically, write barriers could try to write a trace event and eventually try to acquire the same lock. The static lock ranking expresses this constraint.

This change replaces the assignment with a call to memclrNoHeapPointer to clear the map, removing the write barriers.

Note that technically this problem is purely theoretical. The way the trace maps are used today is such that reset is only ever called when the tracer is no longer writing events that could emit data into a map. Furthermore, reset is never called from an event-writing context.

Therefore another way to resolve this is to simply not hold the trace map lock over the reset operation. However, this makes the trace map implementation less robust because it needs to be used in a very specific way. Furthermore, the rest of the trace map code avoids write barriers already since its internal structures are all notinheap, so it's actually more consistent to just avoid write barriers in the reset method.

Fixes #56554.

Change-Id: Icd86472e75e25161b2c10c1c8aaae2c2fed4f67f
Reviewed-on: https://go-review.googlesource.com/c/go/+/560216
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
(cherry picked from commit 829f2ce)
Reviewed-on: https://go-review.googlesource.com/c/go/+/559957
Auto-Submit: Michael Knyszek <[email protected]>
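The hazard described in this commit message can be modeled outside the runtime. The following is only a toy sketch, not the runtime's code: `rank`, `checker`, `toyMap`, `barrierHook`, and both reset methods are hypothetical names invented for illustration, and the "write barrier" is simulated by an explicit function call. It assumes a simplified lock-rank rule (a lock may only be acquired if its rank is strictly higher than the most recently held one) and shows that a reset which triggers the barrier while holding the map lock gets flagged, while a barrier-free reset does not.

```go
package main

import "fmt"

// rank is a hypothetical lock rank. In this toy model, locks must be
// acquired in strictly increasing rank order.
type rank int

// rankToyMap is a made-up rank for the toy map's lock.
const rankToyMap rank = 1

// checker records the ranks held by a single thread of execution and
// collects ordering violations instead of crashing.
type checker struct {
	held       []rank
	violations []string
}

// acquire notes a violation if the new rank is not strictly higher than
// the most recently acquired rank, then records the lock as held.
func (c *checker) acquire(r rank) {
	if n := len(c.held); n > 0 && c.held[n-1] >= r {
		c.violations = append(c.violations,
			fmt.Sprintf("acquiring rank %d while holding rank %d", r, c.held[n-1]))
	}
	c.held = append(c.held, r)
}

// release drops the most recently acquired lock.
func (c *checker) release() {
	c.held = c.held[:len(c.held)-1]
}

// toyMap stands in for a lock-protected map with pointer-free storage.
type toyMap struct {
	c       *checker
	entries [4]int
}

// barrierHook simulates a write barrier that ends up emitting a trace
// event and therefore tries to take the map lock again.
func (m *toyMap) barrierHook() {
	m.c.acquire(rankToyMap)
	m.c.release()
}

// resetWithBarrier models clearing by assignment where the store may
// run a write barrier while the map lock is held.
func (m *toyMap) resetWithBarrier() {
	m.c.acquire(rankToyMap)
	m.barrierHook() // simulated barrier fires under the lock
	m.entries = [4]int{}
	m.c.release()
}

// resetNoBarrier models a barrier-free clear: the memory is zeroed
// directly and no hook runs under the lock.
func (m *toyMap) resetNoBarrier() {
	m.c.acquire(rankToyMap)
	for i := range m.entries {
		m.entries[i] = 0
	}
	m.c.release()
}

func main() {
	c := &checker{}
	m := &toyMap{c: c}

	m.resetNoBarrier()
	fmt.Println("violations after barrier-free reset:", len(c.violations))

	m.resetWithBarrier()
	fmt.Println("violations after reset with barrier:", len(c.violations))
}
```

The alternative mentioned in the commit message, not holding the lock across the reset, would also avoid the violation in this model, but it would rely on callers invoking reset only when nothing else can touch the map.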
https://build.golang.org/log/af083d40156b011db2e0cadd6040692cf579967c:
(attn @golang/runtime)