runtime: print all threads in GOTRACEBACK >= all #13161

aclements · 2015-11-05T15:48:20Z

Currently, GOTRACEBACK=all is a misnomer. It prints stacks for all goroutines that happen to be non-running or running on the current OS thread, but it does not print stacks for goroutines that are running on other OS threads. This is frustrating. For purely internal reasons, it's currently necessary to set GOTRACEBACK=crash in order to get stacks for goroutines on other threads, but that also gets you runtime frames and an abort at the end, which is often undesirable.

We should make GOTRACEBACK=all (or higher) print stacks for all goroutines, regardless of what thread they're running on. This will make "all" do what it says in the name and will make the only difference between "system" and "crash" be whether or not it aborts at the end of the traceback.

In other words, this is the current behavior of the GOTRACEBACK settings:

	none	single	all	system	crash
show user frames	N	Y	Y	Y	Y
show runtime frames	N	N	N	Y	Y
show other goroutines	N	N	Y	Y	Y
show other threads	N	N	N	N	Y
abort	N	N	N	N	Y

This is what it should be:

	none	single	all	system	crash
show user frames	N	Y	Y	Y	Y
show runtime frames	N	N	N	Y	Y
show other goroutines	N	N	Y	Y	Y
show other threads	N	N	Y	Y	Y
abort	N	N	N	N	Y

With this, we would eliminate the distinction between "show other goroutines" and "show other threads", and each GOTRACEBACK level would enable exactly one additional feature.

We could do this using the same signal hand-off mechanism GOTRACEBACK=crash currently uses to interrupt the other threads. Historically we couldn't do this because this mechanism wasn't entirely robust, but it's been improved to the point where it should be reliable.

/cc @rsc @ianlancetaylor @randall77

bradfitz · 2016-05-05T17:11:43Z

@aclements, please decide if this is happening now or kick it down the road and update its milestone.

aclements · 2016-05-05T23:26:07Z

Too invasive for the freeze. Commencing kicking.

rsc · 2016-10-20T00:01:00Z

I looked into this for an hour or so. The mechanism does not seem quite robust enough still. After arranging for GOTRACEBACK=all to pass a SIGQUIT around like in GOTRACEBACK=crash, I only managed to find the missing goroutine running on another thread about half the time. I assume the other half of the time it had stopped by the time the SIGQUIT came in. We will probably need to be more aggressive about searching for the missing goroutines in order to be sure to print them all. And doing so will require cleaning up the SIGQUIT token passing a bit, so I'm not posting my code here. It was awful.

Test program:

package main

import "time"

func main() {
    for i := 0; i < 3; i++ {
        go func() {
            select {}
        }()
    }
    go func() {
        for {
        }
    }()
    time.Sleep(2 * time.Millisecond)
    panic(1)
}

Goroutine 7 is the one that only shows up half the time.

mknyszek · 2020-03-11T14:58:05Z

Now that we have #24543 (asynchronous preemption) we could make it so that GOTRACEBACK=crash is the default, and add a check in the signal handler if we're trying to freeze-the-world, and let asynchronous preemption preempt even at an unsafe-point in that case. This would mean that we don't need the existing SIGQUIT-bouncing mechanism and the tracebacks would all appear together (instead of separated by SIGQUITs and register dumps).

Also, just to reference what restarted this conversation: census-instrumentation/opencensus-go#1200 was missing a goroutine in the default traceback, and folks had to add GOTRACEBACK=crash.

prattmic · 2020-06-03T17:36:42Z

It should be noted that "show other threads" is not strictly "Y" for crash. It is "Y" for signal-generated panics (SIGSEGV, SIGABRT, etc), but "N" for standard panic/throw calls. This is because the SIGQUIT-bouncing mechanism is performed by the signal handler, which is bypassed by standard panic/throw.

i.e., for panic/throw, we raise SIGABRT with handlingSig[SIGABRT] = 0, and sigfwdgo immediately clears the sigaction and re-raises SIGABRT to immediately abort before sighandler can do the SIGQUIT dance.

Unless we set traceback level to crash, goroutines from all threads aren't printed. See: golang/go#13161. This is important because SIGQUIT traces will miss on goroutines. https://mattermost.atlassian.net/browse/MM-27887

aktau · 2024-10-14T08:51:29Z

It should be noted that "show other threads" is not strictly "Y" for crash. It is "Y" for signal-generated panics (SIGSEGV, SIGABRT, etc), but "N" for standard panic/throw calls. This is because the SIGQUIT-bouncing mechanism is performed by the signal handler, which is bypassed by standard panic/throw.

i.e., for panic/throw, we raise SIGABRT with handlingSig[SIGABRT] = 0, and sigfwdgo immediately clears the sigaction and re-raises SIGABRT to immediately abort before sighandler can do the SIGQUIT dance.

Even taking this into account, it would already be an improvement if GOTRACEBACK=all/system showed other threads for signal-generated crashes. How hard would it be to do this?

rsc modified the milestones: Go1.7, Go1.7Early Dec 28, 2015

bradfitz assigned aclements May 5, 2016

aclements modified the milestones: Go1.8Early, Go1.7Early May 5, 2016

quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Sep 30, 2016

rsc modified the milestones: Go1.9Early, Go1.8Early Oct 20, 2016

bradfitz modified the milestones: Go1.9Maybe, Go1.9Early May 3, 2017

aclements modified the milestones: Go1.10, Go1.9Maybe Jul 18, 2017

rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

rsc mentioned this issue Nov 29, 2017

testing: stack unavailable when a parallel test times out #18042

Closed

aclements modified the milestones: Go1.11, Unplanned Jul 3, 2018

navytux mentioned this issue Feb 5, 2019

Tests get stuck. hanwen/go-fuse#261

Closed

agnivade mentioned this issue Aug 17, 2020

MM-27887: set traceback to crash from code mattermost/mattermost#15274

Merged

rsc unassigned aclements Jun 23, 2022

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022

mknyszek added this to Go Compiler / Runtime Jul 7, 2022

mknyszek removed this from Go Compiler / Runtime Jul 13, 2022

gabyhelp mentioned this issue Jun 11, 2024

proposal: runtime: (optionally) print M backtrace on crash #67929

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: print all threads in GOTRACEBACK >= all #13161

runtime: print all threads in GOTRACEBACK >= all #13161

aclements commented Nov 5, 2015

bradfitz commented May 5, 2016

aclements commented May 5, 2016

rsc commented Oct 20, 2016

mknyszek commented Mar 11, 2020

prattmic commented Jun 3, 2020

aktau commented Oct 14, 2024 •

edited

Loading

runtime: print all threads in GOTRACEBACK >= all #13161

runtime: print all threads in GOTRACEBACK >= all #13161

Comments

aclements commented Nov 5, 2015

bradfitz commented May 5, 2016

aclements commented May 5, 2016

rsc commented Oct 20, 2016

mknyszek commented Mar 11, 2020

prattmic commented Jun 3, 2020

aktau commented Oct 14, 2024 • edited Loading

aktau commented Oct 14, 2024 •

edited

Loading