runtime: mysterious bit set of a high order bit causes segfaults and bounds check failures #64290
Comments
CC @mknyszek
Interesting, it's segfaulting on the type pointer. That pointer doesn't look like it's nil, either, which makes this very odd. So far it looks like this has only failed on linux-ppc64. Is that right?
@mknyszek That's the only failure I've observed, but it might be worth checking with greplogs.
Hey @golang/ppc64, is there something special about where the type metadata (from the binary) is placed in the address space? i.e. does
@findleyr Thanks.
I'd err on the side of bogus. I think mapped memory usually falls below
There was another failure, this time unrelated to typePointersOfUnchecked, with a similar segfault on a similar address. This makes me think it's more general memory corruption. https://build.golang.org/log/cbb742b677c36b578951d0a9b5589c1dc223a1c1
The high bit that seems to be set is the 60th bit? Why does that sound familiar...
It looks like #64294 is related. There's a weird integer found in a slice that also seems to have a high bit set (134217730 is 0x8000002 in hex). After letting this soak for a bit, I'm becoming convinced this is a ppc64-specific issue with high bits being set. Maybe a subtle miscompilation somewhere?
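To double-check the arithmetic above, here is a small Go sketch (the helper `setBits` is hypothetical, written only for illustration) that lists which bits are set in the reported value using `math/bits`. 134217730 has exactly two bits set, bits 1 and 27 (counting from 0), so clearing the top one leaves 2, which is plausibly the small value that was meant to be stored.

```go
package main

import (
	"fmt"
	"math/bits"
)

// setBits returns the positions (0 = least significant) of every 1 bit in v.
func setBits(v uint64) []int {
	var pos []int
	for v != 0 {
		i := bits.TrailingZeros64(v)
		pos = append(pos, i)
		v &^= 1 << uint(i) // clear the lowest set bit
	}
	return pos
}

func main() {
	const suspicious uint64 = 134217730 // value reported in #64294

	// prints "134217730 = 0x8000002, set bits [1 27]"
	fmt.Printf("%d = %#x, set bits %v\n", suspicious, suspicious, setBits(suspicious))

	// Clear the highest set bit to see what the value would be without it.
	hi := 63 - bits.LeadingZeros64(suspicious)
	fmt.Println(suspicious &^ (1 << uint(hi))) // prints "2"
}
```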
Found new dashboard test flakes for:
2023-11-01 07:10 linux-ppc64-sid-buildlet go@1a58fd0f test.Test (log)
2023-11-20 18:34 linux-ppc64-sid-buildlet tools@bd215c0c go@e1dc209b github.com/yuin/goldmark/util [build] (log)
2023-11-20 18:59 linux-ppc64-sid-buildlet go@ddb38c3f cmd/go.TestScript (log)
All the newly attached failures seem to have a random bit set. Two of them are more obvious than others, but even the one that looks most like "oh maybe that's just garbage" does end up at the upper end of a valid index by flipping the top bit. Perhaps that's just confirmation bias, but it doesn't not fit the pattern. :P
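The "flip one bit and see whether it lands in range" check can be made mechanical. A minimal sketch, assuming the corrupted value is a 64-bit integer and the valid bound is known: the hypothetical helper `singleBitFlipCandidates` tries every single-bit flip and keeps only results below the bound. The length 16 used in `main` is an illustrative assumption, not a number taken from any of the logs.

```go
package main

import "fmt"

// singleBitFlipCandidates returns every value that differs from got by exactly
// one bit and falls in [0, limit). If got is a valid index corrupted by a single
// flipped bit, the intended value is among the results.
func singleBitFlipCandidates(got, limit uint64) []uint64 {
	var out []uint64
	for i := 0; i < 64; i++ {
		c := got ^ (1 << uint(i)) // flip bit i
		if c < limit {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// Hypothetical example: an index of 0x8000002 checked against a length of 16.
	fmt.Println(singleBitFlipCandidates(0x8000002, 16)) // prints "[2]"
}
```

An empty result suggests the value is not a single-bit corruption of an in-range index; several results mean the check alone cannot pick the original.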
You are definitely not incorrect. |
If this is a generic ppc64 bug, I'd expect to see sporadic failures on the openbsd and aix builders. Both compile with GOPPC64=power8 (but run on P9).
@pmur Your comment makes me realize this has only appeared on linux-ppc64-buildlet. Perhaps a long shot, but how hard would it be to determine whether the failures were only happening on one particular machine going forward? IIRC there are only 5 such machines. What's the chance of faulty hardware? (This would be easy to check and rule out on LUCI, but we haven't set up the ppc64 machines there yet, only the ppc64le machines.)
That is easy, fortunately: all 5 instances are running in a resource-restricted container within the same VM.
This problem could be related to the big-endian (BE) memcombine issue that was fixed in https://go.dev/cl/546355.
It looks like there haven't been more of these failures since. I think it's reasonable to conclude this is related to the big-endian memcombine issue, so I'm gonna close this optimistically. Please comment if you disagree.
I agree this was the likely culprit: the failures started around the time of the initial memcombine change, only happened on BE, and have not happened since the memcombine fix went in.
When I instrumented the memcombine fix, I only observed one code difference, in some darwin debug code. I did not see any in the failing tests.
Found new dashboard test flakes for:
2024-02-29 21:45 x_tools-gotip-linux-ppc64-power10 tools@5bf7d005 go@b09ac10b x/tools/gopls/internal/cmd.TestImports (log)
2024-02-29 21:45 x_tools-gotip-linux-ppc64-power10 tools@5bf7d005 go@b09ac10b x/tools/gopls/internal/filecache.TestIPC (log)
2024-02-29 21:45 x_tools-gotip-linux-ppc64-power10 tools@5bf7d005 go@b09ac10b x/tools/gopls/internal/test/integration/workspace.TestOldGoNotification_UnsupportedVersion (log)
The two recent failures are not related. The LUCI container seems to accumulate zombie processes; I thought I had updated it to start the container with
@pmur Am I understanding right that there's nothing else to do here, then? Closing optimistically, thanks.
New segfault found in the gopls tests.
2023-11-20 18:35 linux-ppc64-sid-buildlet tools@8966034e go@ddb38c3f x/tools/gopls/internal/regtest/marker (log)
— watchflakes
Originally posted by @gopherbot in #63736 (comment)