runtime: mysterious bit set of a high order bit causes segfaults and bounds check failures #64290

Closed
findleyr opened this issue Nov 20, 2023 · 23 comments
Labels
arch-ppc64x compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@findleyr
Member

findleyr commented Nov 20, 2023

#!watchflakes
post <- goarch == "ppc64" && ((log ~ `SIGSEGV` && log ~ `0x0x8000`) || (log ~ `runtime error: index out of range`))

New segfault found in the gopls tests.

2023-11-20 18:35 linux-ppc64-sid-buildlet tools@8966034e go@ddb38c3f x/tools/gopls/internal/regtest/marker (log)
SIGSEGV: segmentation violation
PC=0x28e6c m=9 sigcode=3 addr=0x800000000b53940

goroutine 0 gp=0xc00070fa00 m=9 mp=0xc000744008 [idle]:
runtime.(*mspan).typePointersOfUnchecked(0xc00b8b0000?, 0xc00b959d70?)
	/workdir/go/src/runtime/mbitmap_allocheaders.go:202 +0x4c fp=0xc000fade38 sp=0xc000faddf8 pc=0x28e6c
runtime.scanobject(0xc00b8b0000, 0xc00004fc48)
	/workdir/go/src/runtime/mgcmark.go:1446 +0xcc fp=0xc000faded0 sp=0xc000fade38 pc=0x360fc
runtime.gcDrainN(0xc00004fc48, 0x10000)
	/workdir/go/src/runtime/mgcmark.go:1331 +0x1f0 fp=0xc000fadf10 sp=0xc000faded0 pc=0x35e30
...
r18  0xc00004ea08	r19  0xc000050a30
r20  0xa8	r21  0xc000744008
r22  0xc000faa3a0	r23  0xc000fadeb8
r24  0x5a	r25  0xf0
r26  0xc0129eaa1b	r27  0xc0129eaa17
r28  0x0	r29  0xc000050a28
r30  0xc00070fa00	r31  0x360fc
pc   0x28e6c	ctr  0x7fff9bdf04d0
link 0x360fc	xer  0x20000000
ccr  0x44428084	trap 0x380

watchflakes

Originally posted by @gopherbot in #63736 (comment)

@findleyr
Member Author

CC @mknyszek

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 20, 2023
@mknyszek
Contributor

Interesting, it's segfaulting on the type pointer. That pointer doesn't look like it's nil, either, which makes this very odd.

So far it looks like this has only failed on linux-ppc64? Is that right?

@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 20, 2023
@mknyszek mknyszek added this to the Go1.22 milestone Nov 20, 2023
@mknyszek mknyszek self-assigned this Nov 20, 2023
@findleyr
Member Author

@mknyszek that's the only failure I've observed, but it might be worth checking with greplogs.

@mknyszek
Contributor

Hey @golang/ppc64, is there something special about where the type metadata (from the binary) is placed in the address space? i.e. does 0x800000000b53940 look like it could plausibly be a type pointer, or does it look completely bogus?

@mknyszek
Contributor

@findleyr Thanks. greplogs shows nothing else as well.

@pmur
Contributor

pmur commented Nov 20, 2023

I'd err on the side of bogus. I think mapped memory usually falls below 0x8000000000000000 on PPC64. ppc64/linux doesn't support PIC, so all static data/text should occupy fairly low addresses starting at 0x10000.

@mknyszek
Contributor

There was another failure, this time unrelated to typePointersOfUnchecked, with a similar segfault on a similar address. This makes me think it's more general memory corruption.

https://build.golang.org/log/cbb742b677c36b578951d0a9b5589c1dc223a1c1

@mknyszek
Contributor

@mknyszek mknyszek changed the title runtime: segfault in typePointersOfUnchecked runtime: segfault at very high address on ppc64 Nov 21, 2023
@mknyszek
Contributor

The high bit that seems to be set is the 60th bit? Why does that sound familiar...

@mknyszek
Contributor

It looks like #64294 is related. There's a weird integer found in a slice that also seems to have some high bit set. (134217730 is 0x8000002 in hex.)

After letting this soak for a bit, I'm starting to become convinced this is a ppc64-specific issue with high bits being set. Maybe a subtle miscompilation somewhere?

@mknyszek mknyszek removed their assignment Nov 29, 2023
@mknyszek mknyszek changed the title runtime: segfault at very high address on ppc64 runtime: mysterious bit set of a high order bit causes segfaults and bounds check failures Nov 29, 2023
@gopherbot
Contributor

Found new dashboard test flakes for:

#!watchflakes
post <- goarch == "ppc64" && ((log ~ `SIGSEGV` && log ~ `0x0x8000`) || (log ~ `runtime error: index out of range`))
2023-11-01 07:10 linux-ppc64-sid-buildlet go@1a58fd0f test.Test (log)
--- FAIL: Test (0.11s)
    --- FAIL: Test/rangegen.go (154.13s)
        testdir_test.go:142: exit status 1
            # command-line-arguments
            panic: runtime error: index out of range [755141] with length 243821

            goroutine 1 [running]:
            panic({0x2b8220?, 0xc000014360?})
            	runtime/panic.go:772 +0x174 fp=0xc0000ac4e0 sp=0xc0000ac420 pc=0x4a534
            runtime.goPanicIndex(0xb85c5, 0x3b86d)
            	runtime/panic.go:114 +0x98 fp=0xc0000ac530 sp=0xc0000ac4e0 pc=0x492d8
            cmd/link/internal/loader.(*Loader).resolve(0xc000096e00?, 0xc00009c200?, {0xc0?, 0xf2000?})
            	cmd/link/internal/loader/loader.go:648 +0x27c fp=0xc0000ac570 sp=0xc0000ac530 pc=0x13392c
            cmd/link/internal/loader.Reloc.Sym({0x7fff5c58ffeb?, 0xc00009c200?, 0xc000096e00?})
            	cmd/link/internal/loader/loader.go:61 +0x64 fp=0xc0000ac5a8 sp=0xc0000ac570 pc=0x131754
            cmd/link/internal/ld.(*deadcodePass).flood(0xc0000ac8e8)
            	cmd/link/internal/ld/deadcode.go:247 +0xb0c fp=0xc0000ac810 sp=0xc0000ac5a8 pc=0x19d7cc
            cmd/link/internal/ld.deadcode(0xc00009a200)
            	cmd/link/internal/ld/deadcode.go:433 +0x80 fp=0xc0000ac990 sp=0xc0000ac810 pc=0x19ea40
            cmd/link/internal/ld.Main(_, {0x10, 0x20, 0x1, 0x1, 0x41, 0x1c00000, {0x0, 0x0, 0x0}, ...})
            	cmd/link/internal/ld/main.go:353 +0x13ec fp=0xc0000acc40 sp=0xc0000ac990 pc=0x1e34bc
            main.main()
            	cmd/link/main.go:72 +0x106c fp=0xc0000adf38 sp=0xc0000acc40 pc=0x26152c
            runtime.main()
            	runtime/proc.go:269 +0x2f0 fp=0xc0000adfc0 sp=0xc0000adf38 pc=0x4dcc0
            runtime.goexit({})
            	runtime/asm_ppc64x.s:993 +0x4 fp=0xc0000adfc0 sp=0xc0000adfc0 pc=0x81244
2023-11-20 18:34 linux-ppc64-sid-buildlet tools@bd215c0c go@e1dc209b github.com/yuin/goldmark/util [build] (log)
../../../../pkg/mod/github.com/yuin/[email protected]/util/html5entities.go:18:53: internal compiler error: 'init': panic during regalloc while compiling init:

runtime error: index out of range [134432465] with length 262144

goroutine 37 [running]:
cmd/compile/internal/ssa.Compile.func1()
	/workdir/go/src/cmd/compile/internal/ssa/compile.go:49 +0x7c
panic({0x939de0, 0xc003e42000})
	/workdir/go/src/runtime/panic.go:884 +0x240
cmd/compile/internal/ssa.(*sparseMapPos).set(...)
...
	/workdir/go/src/cmd/compile/internal/gc/compile.go:171 +0x4c
cmd/compile/internal/gc.compileFunctions.func3.1()
	/workdir/go/src/cmd/compile/internal/gc/compile.go:153 +0x44
created by cmd/compile/internal/gc.compileFunctions.func3
	/workdir/go/src/cmd/compile/internal/gc/compile.go:152 +0x1f4



Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
2023-11-20 18:59 linux-ppc64-sid-buildlet go@ddb38c3f cmd/go.TestScript (log)
vcs-test.golang.org rerouted to http://127.0.0.1:42483
https://vcs-test.golang.org rerouted to https://127.0.0.1:42695
go test proxy running at GOPROXY=http://127.0.0.1:38937/mod
--- FAIL: TestScript (0.07s)
    --- FAIL: TestScript/test_chatty_parallel_fail (1.10s)
        script_test.go:132: 2023-11-20T19:16:43Z
        script_test.go:134: $WORK=/workdir/tmp/cmd-go-test-4014964717/tmpdir3221805523/test_chatty_parallel_fail2229050004
        script_test.go:156: 
            # Run parallel chatty tests.
            # Check that multiple parallel outputs continue running. (0.658s)
...
            panic: runtime error: index out of range [4195651] with length 4175

            goroutine 1 gp=0xc0000041a0 m=0 mp=0x4cba80 [running]:
            panic({0x2b87e0?, 0xc0007101c8?})
            	runtime/panic.go:779 +0x174 fp=0xc00007a490 sp=0xc00007a3d0 pc=0x4af14
            runtime.goPanicIndexU(0x400543, 0x104f)
            	runtime/panic.go:120 +0x94 fp=0xc00007a4e0 sp=0xc00007a490 pc=0x49d54
            cmd/link/internal/loader.Bitmap.Has(...)
            	cmd/link/internal/loader/loader.go:131
            cmd/link/internal/loader.(*Loader).AttrReachable(...)
...
            	/workdir/go/src/runtime/proc.go:402 +0x114 fp=0xc00005e708 sp=0xc00005e6d8 pc=0x53644
            runtime.gcBgMarkWorker()
            	/workdir/go/src/runtime/mgc.go:1310 +0xf8 fp=0xc00005e7c0 sp=0xc00005e708 pc=0x31c28
            runtime.goexit({})
            	/workdir/go/src/runtime/asm_ppc64x.s:1018 +0x4 fp=0xc00005e7c0 sp=0xc00005e7c0 pc=0x8b1b4
            created by runtime.gcBgMarkStartWorkers in goroutine 6
            	/workdir/go/src/runtime/mgc.go:1234 +0x30
            [exit status 2]
            > stderr 'ios/arm64 requires external \(cgo\) linking, but cgo is not enabled'
        script_test.go:156: FAIL: testdata/script/test_android_issue62123.txt:14: stderr 'ios/arm64 requires external \(cgo\) linking, but cgo is not enabled': no match for `(?m)ios/arm64 requires external \(cgo\) linking, but cgo is not enabled` in stderr

watchflakes

@mknyszek
Contributor

All the newly attached failures seem to have a random bit set. Two of them are more obvious than others, but even the one that looks most like "oh maybe that's just garbage" does end up at the upper end of a valid index by flipping the top bit. Perhaps that's just confirmation bias, but it doesn't not fit the pattern. :P

@randall77
Contributor

You are definitely not incorrect.

@pmur
Contributor

pmur commented Nov 30, 2023

If this is a generic ppc64 bug, I'd expect to see sporadic failures on the openbsd and aix builders. Both compile with GOPPC64=power8 (but run on P9).

@mknyszek
Contributor

mknyszek commented Nov 30, 2023

@pmur Your comment makes me realize this has only appeared on linux-ppc64-buildlet. Perhaps a long-shot, but how hard would it be to determine if the failures were only happening on one particular machine going forward? IIRC there are only 5 such machines. What's the chance of faulty hardware?

(This would be easy to check and rule out on LUCI, but we haven't set up the ppc64 machines there yet, only the ppc64le machines.)

@pmur
Contributor

pmur commented Nov 30, 2023

That is easy, fortunately. All 5 instances are running on a resource restricted container within the same VM.

@laboger
Contributor

laboger commented Dec 4, 2023

It's possible this problem could be related to the BE memcombine issue that was fixed here https://go.dev/cl/546355.

@mknyszek
Contributor

It looks like there haven't been more of these failures since? I think it's reasonable this is related to the big endian memcombine issue. I'm gonna close this optimistically. Please comment if you disagree.

@laboger
Contributor

laboger commented Dec 13, 2023

I agree this was the likely culprit: the time frame for when it started seems consistent with the initial memcombine change, it only happened on BE, and it has not happened since the memcombine fix went in.

@pmur
Contributor

pmur commented Dec 13, 2023

When I instrumented the memcombine fix, I only observed one code difference, in some darwin debug code. I did not see any in the failing tests.

@gopherbot
Contributor

Found new dashboard test flakes for:

#!watchflakes
post <- goarch == "ppc64" && ((log ~ `SIGSEGV` && log ~ `0x0x8000`) || (log ~ `runtime error: index out of range`))
2024-02-29 21:45 x_tools-gotip-linux-ppc64-power10 tools@5bf7d005 go@b09ac10b x/tools/gopls/internal/cmd.TestImports (log)
=== RUN   TestImports
=== PAUSE TestImports
=== CONT  TestImports
    integration_test.go:531: gopls imports a.go: exited with code 2, want success: true (gopls imports a.go: exit=2 stdout=<<>> stderr=<<Log: Loading packages...
        Info: Error loading packages: err: fork/exec /home/swarming/.swarming/w/ir/x/w/goroot/bin/go: resource temporarily unavailable: stderr: 
        Error: Error loading workspace folders (expected 1, got 0)
        failed to load view for file://.: err: fork/exec /home/swarming/.swarming/w/ir/x/w/goroot/bin/go: resource temporarily unavailable: stderr: 
        
        Log: Loading packages...
        Info: Error loading packages: err: fork/exec /home/swarming/.swarming/w/ir/x/w/goroot/bin/go: resource temporarily unavailable: stderr: 
...
        	fmt.Println()
        }
        >>, want <<package a
        
        import "fmt"
        func _() {
        	fmt.Println()
        }
        >>
--- FAIL: TestImports (2.29s)
2024-02-29 21:45 x_tools-gotip-linux-ppc64-power10 tools@5bf7d005 go@b09ac10b x/tools/gopls/internal/filecache.TestIPC (log)
=== RUN   TestIPC
    filecache_test.go:157: fork/exec /home/swarming/.swarming/w/ir/x/t/go-build3585518083/b527/filecache.test: resource temporarily unavailable
--- FAIL: TestIPC (0.00s)
2024-02-29 21:45 x_tools-gotip-linux-ppc64-power10 tools@5bf7d005 go@b09ac10b x/tools/gopls/internal/test/integration/workspace.TestOldGoNotification_UnsupportedVersion (log)
=== RUN   TestOldGoNotification_UnsupportedVersion
    workspace_test.go:1285: err: fork/exec /home/swarming/.swarming/w/ir/x/w/goroot/bin/go: resource temporarily unavailable: stderr: 
--- FAIL: TestOldGoNotification_UnsupportedVersion (0.00s)

watchflakes

@gopherbot gopherbot reopened this Mar 4, 2024
@pmur
Contributor

pmur commented Mar 5, 2024

The two recent failures are not related. The LUCI container seems to collect zombie processes. I thought I had updated it to start the container with --init, but I must have missed the ppc64 builder.

@mknyszek
Contributor

mknyszek commented Mar 6, 2024

@pmur Am I understanding right that there's nothing else to do here then? Closing optimistically, thanks.
