runtime: fatal error: stack growth after fork on s390x #58785

Closed
Vishwanatha-HD opened this issue Feb 28, 2023 · 8 comments
Labels
arch-s390x Issues solely affecting the s390x architecture. compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.

@Vishwanatha-HD
Contributor

Vishwanatha-HD commented Feb 28, 2023

What version of Go are you using (go version)?

go version - "devel go1.21-f36dc54e9c" on linux s390x architecture.

Does this issue reproduce with the latest release?

Yes. The issue is easily reproducible.

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="s390x"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="s390x"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/vishwa/Workspaces/golang/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/vishwa/Workspaces/golang/go/pkg/tool/linux_s390x"
GOVCS=""
GOVERSION="devel go1.21-f36dc54e9c Thu Feb 23 06:44:23 2023 +0000"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/vishwa/Workspaces/golang/go/src/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -march=z196 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3776423934=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Race detection is not working properly on linux/s390x. To evaluate it, I ran the "race.bash" script from the "go/src" directory. The script fails, and the error below is one of the failures I am seeing.

What did you expect to see?

I expected "race.bash" to run without any errors.

What did you see instead?

The stack trace of the error is below:

fatal error: stack growth after fork

runtime stack:
runtime.throw({0x12967aa, 0x17})
        /home/vishwa/Workspaces/golang/go/src/runtime/panic.go:1075 +0x58 fp=0x3ff6f3e8d28 sp=0x3ff6f3e8d00 pc=0x1054228
runtime.newstack()
        /home/vishwa/Workspaces/golang/go/src/runtime/stack.go:968 +0xdf8 fp=0x3ff6f3e8ec8 sp=0x3ff6f3e8d28 pc=0x1076bc8
runtime.morestack()
        /home/vishwa/Workspaces/golang/go/src/runtime/asm_s390x.s:342 +0x8e fp=0x3ff6f3e8ed0 sp=0x3ff6f3e8ec8 pc=0x1093f4e

goroutine 55 [running]:
runtime.checkptrAlignment(0xc0002262b0, 0x125c5e0, 0x1)
        /home/vishwa/Workspaces/golang/go/src/runtime/checkptr.go:9 +0xd4 fp=0xc000121240 sp=0xc000121240 pc=0x1018b54
syscall.forkAndExecInChild1(0xc00022a0d8, {0xc000206210, 0x3, 0x3}, {0xc00022e1c0, 0x38, 0x38}, 0x0, 0x0, 0xc000121820, ...)
        /home/vishwa/Workspaces/golang/go/src/syscall/exec_linux.go:489 +0xeba fp=0xc0001214e0 sp=0xc000121240 pc=0x10aa47a
syscall.forkAndExecInChild(0xc00022a0d8, {0xc000206210, 0x3, 0x3}, {0xc00022e1c0, 0x38, 0x38}, 0x0, 0x0, 0xc000121820, ...)
        /home/vishwa/Workspaces/golang/go/src/syscall/exec_linux.go:130 +0x84 fp=0xc0001215b0 sp=0xc0001214e0 pc=0x10a9424
syscall.forkExec({0xc00022a090, 0x15}, {0xc00020c120, 0x2, 0x2}, 0xc000121820)
        /home/vishwa/Workspaces/golang/go/src/syscall/exec_unix.go:209 +0x49a fp=0xc000121728 sp=0xc0001215b0 pc=0x10abfca
syscall.StartProcess(...)
        /home/vishwa/Workspaces/golang/go/src/syscall/exec_unix.go:255
os.startProcess({0xc00022a090, 0x15}, {0xc00020c120, 0x2, 0x2}, 0xc000121b30)
        /home/vishwa/Workspaces/golang/go/src/os/exec_posix.go:54 +0x5e8 fp=0xc000121868 sp=0xc000121728 pc=0x10e3398
os.StartProcess({0xc00022a090, 0x15}, {0xc00020c120, 0x2, 0x2}, 0xc000121b30)
        /home/vishwa/Workspaces/golang/go/src/os/exec.go:109 +0x7e fp=0xc0001218d0 sp=0xc000121868 pc=0x10e2cae
os/exec.(*Cmd).Start(0xc000214160)
        /home/vishwa/Workspaces/golang/go/src/os/exec/exec.go:693 +0xad4 fp=0xc000121b78 sp=0xc0001218d0 pc=0x1171de4
os/exec.(*Cmd).Run(0xc000214160)
        /home/vishwa/Workspaces/golang/go/src/os/exec/exec.go:587 +0x42 fp=0xc000121bb0 sp=0xc000121b78 pc=0x1171262
syscall_test.testAmbientCaps(0xc00023e4e0, 0x0)
        /home/vishwa/Workspaces/golang/go/src/syscall/exec_linux_test.go:633 +0x1462 fp=0xc000121e80 sp=0xc000121bb0 pc=0x11fd752
syscall_test.TestAmbientCaps(0xc00023e4e0)
        /home/vishwa/Workspaces/golang/go/src/syscall/exec_linux_test.go:526 +0x46 fp=0xc000121e98 sp=0xc000121e80 pc=0x11fc256
testing.tRunner(0xc00023e4e0, 0x12a0ee0)
        /home/vishwa/Workspaces/golang/go/src/testing/testing.go:1579 +0x244 fp=0xc000121fb0 sp=0xc000121e98 pc=0x115bd84
testing.(*T).Run.func1()
        /home/vishwa/Workspaces/golang/go/src/testing/testing.go:1632 +0x8e fp=0xc000121fd8 sp=0xc000121fb0 pc=0x115db8e
runtime.goexit()
        /home/vishwa/Workspaces/golang/go/src/runtime/asm_s390x.s:749 +0x2 fp=0xc000121fd8 sp=0xc000121fd8 pc=0x1096202
created by testing.(*T).Run in goroutine 1
        /home/vishwa/Workspaces/golang/go/src/testing/testing.go:1632 +0x894

goroutine 1 [chan receive]:
runtime.gopark(0x12a09e0, 0xc00023c1a8, 0xe, 0x17, 0x2)
        /home/vishwa/Workspaces/golang/go/src/runtime/proc.go:381 +0x11e fp=0xc00011f620 sp=0xc00011f608 pc=0x105779e
runtime.chanrecv(0xc00023c150, 0xc00011f726, 0x1)
        /home/vishwa/Workspaces/golang/go/src/runtime/chan.go:583 +0x53c fp=0xc00011f6b0 sp=0xc00011f620 pc=0x101813c
runtime.chanrecv1(0xc00023c150, 0xc00011f726)
        /home/vishwa/Workspaces/golang/go/src/runtime/chan.go:442 +0x2e fp=0xc00011f6d8 sp=0xc00011f6b0 pc=0x1017b9e
testing.(*T).Run(0xc00010e9c0, {0x1293b20, 0xf}, 0x12a0ee0)
        /home/vishwa/Workspaces/golang/go/src/testing/testing.go:1633 +0x8ca fp=0xc00011f818 sp=0xc00011f6d8 pc=0x115d88a
testing.runTests.func1(0xc00010e9c0)
        /home/vishwa/Workspaces/golang/go/src/testing/testing.go:2039 +0xc4 fp=0xc00011f868 sp=0xc00011f818 pc=0x1161b74
testing.tRunner(0xc00010e9c0, 0xc00011fab8)
        /home/vishwa/Workspaces/golang/go/src/testing/testing.go:1579 +0x244 fp=0xc00011f980 sp=0xc00011f868 pc=0x115bd84
testing.runTests(0xc000134078, {0x13c1360, 0x2b, 0x2b}, {0xc0f78717521e3b86, 0x7dba8c4082, 0x13dfac0})
        /home/vishwa/Workspaces/golang/go/src/testing/testing.go:2037 +0x99a fp=0xc00011fad8 sp=0xc00011f980 pc=0x11619aa
testing.(*M).Run(0xc000138280)
        /home/vishwa/Workspaces/golang/go/src/testing/testing.go:1909 +0xca6 fp=0xc00011fe40 sp=0xc00011fad8 pc=0x115ed66
syscall_test.TestMain(0xc000138280)
        /home/vishwa/Workspaces/golang/go/src/syscall/syscall_linux_test.go:151 +0x10a fp=0xc00011fe68 sp=0xc00011fe40 pc=0x12048fa
main.main()
        _testmain.go:135 +0x376 fp=0xc00011ff80 sp=0xc00011fe68 pc=0x1211026
runtime.main()
        /home/vishwa/Workspaces/golang/go/src/runtime/proc.go:250 +0x268 fp=0xc00011ffd8 sp=0xc00011ff80 pc=0x1057268
runtime.goexit()
        /home/vishwa/Workspaces/golang/go/src/runtime/asm_s390x.s:749 +0x2 fp=0xc00011ffd8 sp=0xc00011ffd8 pc=0x1096202

goroutine 2 [force gc (idle)]:
runtime.gopark(0x12a0d20, 0x13df5f0, 0x11, 0x14, 0x1)
        /home/vishwa/Workspaces/golang/go/src/runtime/proc.go:381 +0x11e fp=0xc0000487b0 sp=0xc000048798 pc=0x105779e
runtime.goparkunlock(...)
        /home/vishwa/Workspaces/golang/go/src/runtime/proc.go:387
runtime.forcegchelper()
        /home/vishwa/Workspaces/golang/go/src/runtime/proc.go:305 +0xc6 fp=0xc0000487d8 sp=0xc0000487b0 pc=0x1057576
runtime.goexit()
        /home/vishwa/Workspaces/golang/go/src/runtime/asm_s390x.s:749 +0x2 fp=0xc0000487d8 sp=0xc0000487d8 pc=0x1096202
created by runtime.init.5 in goroutine 1
        /home/vishwa/Workspaces/golang/go/src/runtime/proc.go:293 +0x30

goroutine 3 [GC sweep wait]:
runtime.gopark(0x12a0d20, 0x13df9a0, 0xc, 0x14, 0x1)
        /home/vishwa/Workspaces/golang/go/src/runtime/proc.go:381 +0x11e fp=0xc000056f90 sp=0xc000056f78 pc=0x105779e
runtime.goparkunlock(...)
        /home/vishwa/Workspaces/golang/go/src/runtime/proc.go:387
runtime.bgsweep(0xc000070000)
        /home/vishwa/Workspaces/golang/go/src/runtime/mgcsweep.go:278 +0x9c fp=0xc000056fc8 sp=0xc000056f90 pc=0x103daac
runtime.gcenable.func1()
        /home/vishwa/Workspaces/golang/go/src/runtime/mgc.go:178 +0x5e fp=0xc000056fd8 sp=0xc000056fc8 pc=0x1030a0e
runtime.goexit()
        /home/vishwa/Workspaces/golang/go/src/runtime/asm_s390x.s:749 +0x2 fp=0xc000056fd8 sp=0xc000056fd8 pc=0x1096202
created by runtime.gcenable in goroutine 1
        /home/vishwa/Workspaces/golang/go/src/runtime/mgc.go:178 +0xa8

goroutine 4 [GC scavenge wait]:
runtime.gopark(0x12a0d20, 0x13dfb40, 0xd, 0x14, 0x2)
        /home/vishwa/Workspaces/golang/go/src/runtime/proc.go:381 +0x11e fp=0xc000104f80 sp=0xc000104f68 pc=0x105779e
runtime.goparkunlock(...)
        /home/vishwa/Workspaces/golang/go/src/runtime/proc.go:387
runtime.(*scavengerState).park(0x13dfb40)
        /home/vishwa/Workspaces/golang/go/src/runtime/mgcscavenge.go:399 +0x72 fp=0xc000104fa8 sp=0xc000104f80 pc=0x103b442
runtime.bgscavenge(0xc000070000)
        /home/vishwa/Workspaces/golang/go/src/runtime/mgcscavenge.go:627 +0x5a fp=0xc000104fc8 sp=0xc000104fa8 pc=0x103baba
runtime.gcenable.func2()
        /home/vishwa/Workspaces/golang/go/src/runtime/mgc.go:179 +0x5e fp=0xc000104fd8 sp=0xc000104fc8 pc=0x103098e
runtime.goexit()
        /home/vishwa/Workspaces/golang/go/src/runtime/asm_s390x.s:749 +0x2 fp=0xc000104fd8 sp=0xc000104fd8 pc=0x1096202
created by runtime.gcenable in goroutine 1
        /home/vishwa/Workspaces/golang/go/src/runtime/mgc.go:179 +0x10e

goroutine 18 [finalizer wait]:
runtime.gopark(0x12a0a30, 0x2352030, 0x10, 0x14, 0x1)
        /home/vishwa/Workspaces/golang/go/src/runtime/proc.go:381 +0x11e fp=0xc00005bf18 sp=0xc00005bf00 pc=0x105779e
runtime.runfinq()
        /home/vishwa/Workspaces/golang/go/src/runtime/mfinal.go:193 +0x15a fp=0xc00005bfd8 sp=0xc00005bf18 pc=0x102f63a
runtime.goexit()
        /home/vishwa/Workspaces/golang/go/src/runtime/asm_s390x.s:749 +0x2 fp=0xc00005bfd8 sp=0xc00005bfd8 pc=0x1096202
created by runtime.createfing in goroutine 1
        /home/vishwa/Workspaces/golang/go/src/runtime/mfinal.go:163 +0x64

@Vishwanatha-HD Vishwanatha-HD added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. arch-s390x Issues solely affecting the s390x architecture. compiler/runtime Issues related to the Go compiler and/or runtime. labels Feb 28, 2023
@bcmills
Contributor

bcmills commented Feb 28, 2023

(attn @golang/s390x)

@Vishwanatha-HD Vishwanatha-HD self-assigned this Mar 1, 2023
@mknyszek mknyszek changed the title runtime stack: fatal error: stack growth after fork on s390x runtime: fatal error: stack growth after fork on s390x Mar 1, 2023
@mknyszek mknyszek added this to the Backlog milestone Mar 1, 2023
@mknyszek
Contributor

mknyszek commented Mar 1, 2023

In triage, we think that this might be a race mode bug. There's a stack growth triggered on runtime.checkptrAlignment which is inserted by race mode instrumentation by the compiler. forkAndExecInChild1 has a //go:norace pragma, so maybe that should disable the checkptrAlignment instrumentation as well?

@Vishwanatha-HD
Contributor Author

I made code changes to add the //go:norace and //go:nocheckptr pragmas to forkAndExecInChild1(). With that, the checkptrAlignment instrumentation was disabled, but I then hit another failure. Any thoughts or suggestions on this?

BenchmarkAESCFBEncrypt1K-4 fatal error: checkptr: converted pointer straddles multiple allocations

goroutine 67 [running]:
runtime.throw({0x2471ac, 0x3a})
/home/vishwa/Workspaces/golang/go/src/runtime/panic.go:1075 +0x58 fp=0xc000057b30 sp=0xc000057b08 pc=0x88df8
runtime.checkptrAlignment(0xc00013612b, 0x20f640, 0x1)
/home/vishwa/Workspaces/golang/go/src/runtime/checkptr.go:26 +0xb0 fp=0xc000057b50 sp=0xc000057b30 pc=0x4db30
crypto/subtle.words(...)
/home/vishwa/Workspaces/golang/go/src/crypto/subtle/xor_generic.go:49
crypto/subtle.xorBytes(0xc00014c400, 0xc00014c400, 0xc00013612b, 0x5)
/home/vishwa/Workspaces/golang/go/src/crypto/subtle/xor_generic.go:29 +0x144 fp=0xc000057be0 sp=0xc000057b50 pc=0x169eb4
crypto/subtle.XORBytes({0xc00014c400, 0x3fb, 0x3fb}, {0xc00014c400, 0x3fb, 0x3fb}, {0xc00013612b, 0x5, 0x5})
/home/vishwa/Workspaces/golang/go/src/crypto/subtle/xor.go:22 +0x9e fp=0xc000057c10 sp=0xc000057be0 pc=0x169cde
crypto/cipher.(*cfb).XORKeyStream(0xc00009e050, {0xc00014c400, 0x3fb, 0x3fb}, {0xc00014c400, 0x3fb, 0x3fb})
/home/vishwa/Workspaces/golang/go/src/crypto/cipher/cfb.go:43 +0x48a fp=0xc000057ce8 sp=0xc000057c10 pc=0x16d02a
crypto/cipher_test.benchmarkAESStream(0xc00012e780, 0x24ad08, {0xc00014c400, 0x3fb, 0x3fb})
/home/vishwa/Workspaces/golang/go/src/crypto/cipher/benchmark_test.go:78 +0x16e fp=0xc000057d58 sp=0xc000057ce8 pc=0x1e620e
crypto/cipher_test.BenchmarkAESCFBEncrypt1K(0xc00012e780)
/home/vishwa/Workspaces/golang/go/src/crypto/cipher/benchmark_test.go:90 +0x78 fp=0xc000057d88 sp=0xc000057d58 pc=0x1e62e8
testing.(*B).runN(0xc00012e780, 0x64)
/home/vishwa/Workspaces/golang/go/src/testing/benchmark.go:193 +0x1f4 fp=0xc000057ec8 sp=0xc000057d88 pc=0x149104
testing.(*B).launch(0xc00012e780)
/home/vishwa/Workspaces/golang/go/src/testing/benchmark.go:334 +0x2ee fp=0xc000057fc0 sp=0xc000057ec8 pc=0x14a72e
testing.(*B).doBench.func1()
/home/vishwa/Workspaces/golang/go/src/testing/benchmark.go:284 +0x7c fp=0xc000057fd8 sp=0xc000057fc0 pc=0x14a41c
runtime.goexit()
/home/vishwa/Workspaces/golang/go/src/runtime/asm_s390x.s:749 +0x2 fp=0xc000057fd8 sp=0xc000057fd8 pc=0xc7422
created by testing.(*B).doBench in goroutine 1
/home/vishwa/Workspaces/golang/go/src/testing/benchmark.go:284 +0xce

@cherrymui
Member

The new error does not seem to be s390x-specific. I can reproduce it on AMD64 with go test -tags=purego -race -bench=BenchmarkAESCFBEncrypt1K crypto/cipher. Maybe there is an issue in xor_generic.go, in the benchmark, or in the compiler's checkptr implementation. Could you file a separate issue for that?

Also, do you mind sending a CL for the forkAndExecInChild1 fix? Thanks.

@Vishwanatha-HD
Contributor Author

Vishwanatha-HD commented Mar 30, 2023

@cherrymui, thanks for your response, for reproducing this issue on AMD64, and for confirming that it is not s390x-specific.
I debugged the issue further and found where the bug is. It is in one of the benchmark functions, "benchmarkAESStream()", in the "src/crypto/cipher/benchmark_test.go" file.

The issue is that "benchmarkAESStream()" gets called from various other functions, such as:
func BenchmarkAESCFBEncrypt1K()
func BenchmarkAESCFBDecrypt1K()
func BenchmarkAESCFBDecrypt8K()
func BenchmarkAESOFB1K()
func BenchmarkAESCTR1K()
func BenchmarkAESCTR8K()

So, the "b.N" value inside the "benchmarkAESStream()" function is set using the "buf" that gets passed to this function.

When I printed the b.N value inside "benchmarkAESStream()" function it printed these values below:
benchmarkAESStream()---- b.N = 100
benchmarkAESStream()---- b.N = 10000
benchmarkAESStream()---- b.N = 1000000
benchmarkAESStream()---- b.N = 100000000
benchmarkAESStream()---- b.N = 1000000000

The "for" loop below runs "b.N" times each time control enters this function, and it calls "XORKeyStream()" on every iteration:
for i := 0; i < b.N; i++ {
	stream.XORKeyStream(buf, buf) // <-- place where the error occurs
}

In the last case the loop runs about 1 billion times, which causes the condition of a pointer straddling multiple heap objects, and this happens when we run with the "-race" option.

The fix I have come up with is to create a copy of the buffer with the same length as "buf" and pass that copy to XORKeyStream(), instead of calling it inside the for loop.

Code snippet:
bufCopy := make([]byte, len(buf))
copy(bufCopy, buf)
stream.XORKeyStream(buf, bufCopy)

When I made the above change and ran the "./race.bash" script from "src", all the tests passed and race detection worked correctly.

Please let me know your thoughts or comments on this.

@randall77
Contributor

I know what the problem in xor_generic.go is.
We do, in words:

return unsafe.Slice((*uintptr)(unsafe.Pointer(&x[0])), uintptr(len(x))/wordSize)

If len(x)<wordSize, then the cast itself is bad. We end up making a slice of 0 length in that case, so there's nothing inherently wrong there, but the cast itself doesn't know that. Checkptr on (*uintptr)(unsafe.Pointer(&x[0])) requires x to be of length at least wordSize.

The fix is pretty simple: just bail early if len(x)<wordSize.

@Vishwanatha-HD
Contributor Author

The fix for the issue "BenchmarkAESCFBEncrypt1K-4 fatal error: checkptr: converted pointer straddles multiple allocations", seen on both s390x and AMD64, has been provided by Keith Randall. The CL is linked below:
https://go-review.googlesource.com/c/go/+/480575

@randall77, I verified your fix on s390x and it is working fine. Thanks for taking up the issue and fixing it.

@Vishwanatha-HD Vishwanatha-HD added Testing An issue that has been verified to require only test changes, not just a test failure. NeedsFix The path to resolution is known, but the work has not been done. labels Apr 1, 2023
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 1, 2023
@gopherbot
Contributor

Change https://go.dev/cl/481415 mentions this issue: syscall: add a Go directive "go:nocheckptr" to forkAndExecInChild1

@github-project-automation github-project-automation bot moved this from Todo to Done in Go Compiler / Runtime Apr 11, 2023
@golang golang locked and limited conversation to collaborators Apr 10, 2024
6 participants