Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: signal arriving during needm/dropm invokes Go signal handler in bad state #12277

Closed
anacrolix opened this issue Aug 23, 2015 · 13 comments

Comments

@anacrolix
Copy link
Contributor

While running go-fuzz, I sent it SIGINT, and this happened:

runtime: newstack called from g=859541700608
    m=0xc8201e2000 m->curg=0xc8202f2f00 m->g0=0xc820ad8180 m->gsignal=0xc820ad8000
runtime.sigtrampgo(0x63e70, 0x20000001e, 0xc82018ef88, 0xc82018eff0)
    /Users/anacrolix/src/go.master/src/runtime/signal_darwin.go:68 +0xc1
runtime.sigtramp(0xf, 0x84, 0x40000000e, 0xc82026a360, 0x0, 0xc82026a360, 0x6, 0x32, 0xc82026a360, 0x0, ...)
    /Users/anacrolix/src/go.master/src/runtime/sys_darwin_amd64.s:237 +0x23
fatal error: runtime: wrong goroutine in newstack

runtime stack:
runtime.throw(0x511be0, 0x24)
    /Users/anacrolix/src/go.master/src/runtime/panic.go:527 +0x90
runtime.newstack()
    /Users/anacrolix/src/go.master/src/runtime/stack1.go:675 +0x266
runtime.morestack()
    /Users/anacrolix/src/go.master/src/runtime/asm_amd64.s:330 +0x7f

$ go version
go version go1.5 darwin/amd64

@ianlancetaylor ianlancetaylor added this to the Go1.5.1 milestone Aug 25, 2015
@ianlancetaylor
Copy link
Member

CC @randall77

@minux
Copy link
Member

minux commented Aug 25, 2015 via email

@randall77
Copy link
Contributor

Maybe, but then would all code called from sigtramp need to be marked nosplit? Just hunting for an example, tracebacktrap is not marked nosplit as well.

@rsc
Copy link
Contributor

rsc commented Aug 25, 2015

sighandler runs on the signal stack with g = the signal goroutine. If the handler is running off the signal stack, there is a bug. We should not mark things nosplit; that will just hide the bug.

@rsc
Copy link
Contributor

rsc commented Aug 25, 2015

Can you reproduce this?
Was this in go-fuzz itself or in a binary go-fuzz built?
Can you share steps to reproduce? Or the binary that broke?

If this is in a go-fuzz-built binary, not go-fuzz itself, I wonder if go-fuzz is trying to instrument package runtime and broke the signal handlers somehow.

The decimal g= below really is 0xc820ad8000 aka m->gsignal, so that part is right. But obviously the stack bounds are not. That means that somehow the code is either on the wrong stack or overflowed the signal stack. The latter could happen if the signal handler somehow caused a new signal to arrive and be handled. I thought we had signals masked off such that that isn't allowed and would just be a plain OS-level process crash, but maybe some signals can leak through. If it happened recursively repeatedly, you'd blow out the signal stack.

@dvyukov

@dvyukov
Copy link
Member

dvyukov commented Aug 25, 2015

go-fuzz does not instument the following list of packages:

    ignore := map[string]bool{
        "runtime":       true, // lots of non-determinism and irrelevant code paths (e.g. different paths in mallocgc, chans and maps)
        "unsafe":        true, // nothing to see here (also creates import cycle with go-fuzz-dep)
        "errors":        true, // nothing to see here (also creates import cycle with go-fuzz-dep)
        "syscall":       true, // creates import cycle with go-fuzz-dep (and probably nothing to see here)
        "sync":          true, // non-deterministic and not interesting (also creates import cycle with go-fuzz-dep)
        "sync/atomic":   true, // not interesting (also creates import cycle with go-fuzz-dep)
        "time":          true, // creates import cycle with go-fuzz-dep
        "runtime/cgo":   true, // why would we instrument it?
        "runtime/pprof": true, // why would we instrument it?
        "runtime/race":  true, // why would we instrument it?
    }
    if runtime.GOOS == "windows" {
        // Cross-compilation is not implemented.
        ignore["unicode/utf16"] = true                     // syscall depends on unicode/utf16
        ignore["internal/syscall/windows/registry"] = true // time depends on this
        ignore["io"] = true                                // internal/syscall/windows/registry depends on this
    }

@anacrolix
Copy link
Contributor Author

I think go-fuzz was slow to terminate, and I sent several SIGINTs (ctrl+C) in quick succession. I think I was go-fuzzing github.com/anacrolix/torrent/bencode on OSX, though there's nothing unusual about that package. I don't currently have time to try and reproduce, I can revisit this in a week if the issue hasn't progressed.

@jnjackins
Copy link
Contributor

I am getting this bug, without using go-fuzz.

go version devel +805e56e Sat Aug 29 02:43:57 2015 +0000 darwin/amd64

Darwin Kernel Version 14.4.0 (rsc's pprof patch applied)

To reproduce:

  1. pull latest golang.org/x/exp (commit 574adff1580fca7a18de551f4102549e9e2ba3f4)
  2. make the following changes:
$ git diff
diff --git a/shiny/example/goban/main.go b/shiny/example/goban/main.go
index d4a1df8..bac3ec1 100644
--- a/shiny/example/goban/main.go
+++ b/shiny/example/goban/main.go
@@ -20,6 +20,8 @@ import (
        stdDraw "image/draw"
        "log"
        "math/rand"
+       "os"
+       "runtime/pprof"
        "time"

        "golang.org/x/exp/shiny/driver"
@@ -41,6 +43,16 @@ func main() {
        rand.Seed(int64(time.Now().Nanosecond()))
        board := NewBoard(9, *scale)

+       f, _ := os.Create("prof.out")
+       if err := pprof.StartCPUProfile(f); err == nil {
+               go func() {
+                       time.Sleep(1 * time.Minute)
+                       pprof.StopCPUProfile()
+                       os.Exit(0)
+               }()
+       } else {
+               log.Fatal(err)
+       }
        driver.Main(func(s screen.Screen) {
                w, err := s.NewWindow(nil)
                if err != nil {
  1. run:
$ go run board.go main.go xy.go 
2015/08/29 13:03:41 sendLifecycle: StageAlive
2015/08/29 13:03:41 sendLifecycle: StageVisible
2015/08/29 13:03:41 sendLifecycle: StageFocused
2015/08/29 13:03:51 sendLifecycle: StageVisible
runtime: newstack called from g=0xc820000300
    m=0xc82002e000 m->curg=0xc820000600 m->g0=0xc820000480 m->gsignal=0xc820000300
runtime.sigtrampgo(0x4044ac0, 0x1b0000001e, 0xb06b7970, 0xb06b79d8)
    /Users/jnj/src/go/src/runtime/signal_darwin.go:68 +0xc1
runtime.sigtramp(0x0, 0x0, 0x0, 0x200003500000085, 0x114c32000, 0x0, 0xb06b7aa8, 0xb06b7a90, 0x0, 0xb06b7aa8, ...)
    /Users/jnj/src/go/src/runtime/sys_darwin_amd64.s:237 +0x23
fatal error: runtime: wrong goroutine in newstack

runtime stack:
runtime.throw(0x420cfe0, 0x24)
    /Users/jnj/src/go/src/runtime/panic.go:527 +0x90
runtime.newstack()
    /Users/jnj/src/go/src/runtime/stack1.go:676 +0x260
runtime.morestack()
    /Users/jnj/src/go/src/runtime/asm_amd64.s:330 +0x7f

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /Users/jnj/src/go/src/runtime/asm_amd64.s:1739 +0x1 fp=0xc82003cfb8 sp=0xc82003cfb0

goroutine 1 [syscall, locked to thread]:
golang.org/x/exp/shiny/driver/gldriver._Cfunc_startDriver()
    golang.org/x/exp/shiny/driver/gldriver/_obj/_cgo_gotypes.go:140 +0x31
golang.org/x/exp/shiny/driver/gldriver.main(0xc821edaf50, 0x0, 0x0)
    /Users/jnj/src/golang.org/x/exp/shiny/driver/gldriver/cocoa.go:72 +0x193
golang.org/x/exp/shiny/driver/gldriver.Main(0xc821edaf50)
    /Users/jnj/src/golang.org/x/exp/shiny/driver/gldriver/gldriver.go:26 +0x30
golang.org/x/exp/shiny/driver.main(0xc821edaf50)
    /Users/jnj/src/golang.org/x/exp/shiny/driver/driver_darwin.go:15 +0x21
golang.org/x/exp/shiny/driver.Main(0xc821edaf50)
    /Users/jnj/src/golang.org/x/exp/shiny/driver/driver.go:24 +0x21
main.main()
    /Users/jnj/src/golang.org/x/exp/shiny/example/goban/main.go:122 +0x14e

goroutine 34 [syscall]:
runtime.CPUProfile(0x0, 0x0, 0x0)
    /Users/jnj/src/go/src/runtime/cpuprof.go:418 +0x34
runtime/pprof.profileWriter(0x47a4210, 0xc820090070)
    /Users/jnj/src/go/src/runtime/pprof/pprof.go:599 +0x1c
created by runtime/pprof.StartCPUProfile
    /Users/jnj/src/go/src/runtime/pprof/pprof.go:593 +0x145

goroutine 35 [sleep]:
time.Sleep(0xdf8475800)
    /Users/jnj/src/go/src/runtime/time.go:59 +0xf9
main.main.func1()
    /Users/jnj/src/golang.org/x/exp/shiny/example/goban/main.go:49 +0x26
created by main.main
    /Users/jnj/src/golang.org/x/exp/shiny/example/goban/main.go:52 +0x10f

goroutine 36 [chan receive]:
main.main.func2(0x59108d0, 0x42c9360)
    /Users/jnj/src/golang.org/x/exp/shiny/example/goban/main.go:74 +0x35e
golang.org/x/exp/shiny/driver/gldriver.driverStarted.func1()
    /Users/jnj/src/golang.org/x/exp/shiny/driver/gldriver/cocoa.go:79 +0x47
created by golang.org/x/exp/shiny/driver/gldriver.driverStarted
    /Users/jnj/src/golang.org/x/exp/shiny/driver/gldriver/cocoa.go:81 +0x2b

goroutine 4 [select]:
golang.org/x/exp/shiny/driver/internal/pump.(*Pump).run(0xc82000e1a0)
    /Users/jnj/src/golang.org/x/exp/shiny/driver/internal/pump/pump.go:62 +0x4df
created by golang.org/x/exp/shiny/driver/internal/pump.Make
    /Users/jnj/src/golang.org/x/exp/shiny/driver/internal/pump/pump.go:15 +0x116

goroutine 5 [select, locked to thread]:
golang.org/x/exp/shiny/driver/gldriver.drawLoop(0xc82006c0e0, 0x4402a60)
    /Users/jnj/src/golang.org/x/exp/shiny/driver/gldriver/cocoa.go:124 +0x2a6
created by golang.org/x/exp/shiny/driver/gldriver.(*screenImpl).NewWindow
    /Users/jnj/src/golang.org/x/exp/shiny/driver/gldriver/screen.go:103 +0x28e

goroutine 50 [syscall, locked to thread]:
runtime.goexit()
    /Users/jnj/src/go/src/runtime/asm_amd64.s:1739 +0x1

goroutine 51 [syscall, locked to thread]:
runtime.goexit()
    /Users/jnj/src/go/src/runtime/asm_amd64.s:1739 +0x1
exit status 2

It seems to take a random amount of time for the bug to manifest (up to 10 seconds).

@rsc
Copy link
Contributor

rsc commented Nov 13, 2015

@jnjackins, thanks for the goban example!

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/16915 mentions this issue.

@rsc rsc changed the title runtime: fatal error: wrong goroutine in newstack runtime: signal arriving during needm/dropm invokes Go signal handler in bad state Nov 17, 2015
@rsc rsc closed this as completed in f8e6418 Nov 18, 2015
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/17072 mentions this issue.

rsc added a commit that referenced this issue Nov 20, 2015
Solaris needs to make system calls without a g,
and Solaris uses asmcgocall to make system calls.
I know, I know.

I hope this makes CL 16915, fixing #12277, work on Solaris.

Change-Id: If988dfd37f418b302da9c7096f598e5113ecea87
Reviewed-on: https://go-review.googlesource.com/17072
Reviewed-by: Ian Lance Taylor <[email protected]>
Reviewed-by: Aram Hăvărneanu <[email protected]>
Run-TryBot: Russ Cox <[email protected]>
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/17141 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/17129 mentions this issue.

rsc added a commit that referenced this issue Nov 23, 2015
Solaris needs to make system calls without a g,
and Solaris uses asmcgocall to make system calls.
I know, I know.

I hope this makes CL 16915, fixing #12277, work on Solaris.

Change-Id: If988dfd37f418b302da9c7096f598e5113ecea87
Reviewed-on: https://go-review.googlesource.com/17072
Reviewed-by: Ian Lance Taylor <[email protected]>
Reviewed-by: Aram Hăvărneanu <[email protected]>
Run-TryBot: Russ Cox <[email protected]>
Reviewed-on: https://go-review.googlesource.com/17129
rsc added a commit that referenced this issue Nov 23, 2015
…reated threads and async signals

Cgo-created threads transition between having associated Go g's and m's and not.
A signal arriving during the transition could think it was safe and appropriate to
run Go signal handlers when it was in fact not.
Avoid the race by masking all signals during the transition.

Fixes #12277.

Change-Id: Ie9711bc1d098391d58362492197a7e0f5b497d14
Reviewed-on: https://go-review.googlesource.com/16915
Reviewed-by: Ian Lance Taylor <[email protected]>
Run-TryBot: Russ Cox <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-on: https://go-review.googlesource.com/17141
@golang golang locked and limited conversation to collaborators Nov 27, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants