runtime: print all threads in GOTRACEBACK >= all #13161

Open
aclements opened this issue Nov 5, 2015 · 6 comments
Labels
compiler/runtime: Issues related to the Go compiler and/or runtime.
NeedsFix: The path to resolution is known, but the work has not been done.
Milestone

Comments

@aclements
Member

Currently, GOTRACEBACK=all is a misnomer. It prints stacks for all goroutines that happen to be non-running or running on the current OS thread, but it does not print stacks for goroutines that are running on other OS threads. This is frustrating. For purely internal reasons, it's currently necessary to set GOTRACEBACK=crash in order to get stacks for goroutines on other threads, but that also gets you runtime frames and an abort at the end, which is often undesirable.

We should make GOTRACEBACK=all (or higher) print stacks for all goroutines, regardless of what thread they're running on. This will make "all" do what it says in the name and will make the only difference between "system" and "crash" be whether or not it aborts at the end of the traceback.

In other words, this is the current behavior of the GOTRACEBACK settings:

Behavior                none  single  all  system  crash
show user frames        N     Y       Y    Y       Y
show runtime frames     N     N       N    Y       Y
show other goroutines   N     N       Y    Y       Y
show other threads      N     N       N    N       Y
abort                   N     N       N    N       Y

This is what it should be:

Behavior                none  single  all  system  crash
show user frames        N     Y       Y    Y       Y
show runtime frames     N     N       N    Y       Y
show other goroutines   N     N       Y    Y       Y
show other threads      N     N       Y    Y       Y
abort                   N     N       N    N       Y

With this, we would eliminate the distinction between "show other goroutines" and "show other threads", and each GOTRACEBACK level would enable exactly one additional feature.
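To make the comparison concrete, here is a minimal sketch that transcribes both tables into bit sets and lists which rows each level turns on relative to the level below it. The type and function names (`row`, `gained`, and so on) are invented for illustration and are not part of the runtime.

```go
package main

import "fmt"

// row is one behavior (one table row) encoded as a bit.
// These names are illustrative only; the runtime does not use them.
type row uint

const (
	userFrames row = 1 << iota
	runtimeFrames
	otherGoroutines
	otherThreads
	abort
)

// rowNames is indexed by bit position, matching the constants above.
var rowNames = []string{"user frames", "runtime frames", "other goroutines", "other threads", "abort"}

// levels lists the GOTRACEBACK settings in increasing order.
var levels = []string{"none", "single", "all", "system", "crash"}

// current transcribes the "current behavior" table, one entry per level.
var current = []row{
	0,
	userFrames,
	userFrames | otherGoroutines,
	userFrames | otherGoroutines | runtimeFrames,
	userFrames | otherGoroutines | runtimeFrames | otherThreads | abort,
}

// proposed transcribes the "what it should be" table.
var proposed = []row{
	0,
	userFrames,
	userFrames | otherGoroutines | otherThreads,
	userFrames | otherGoroutines | otherThreads | runtimeFrames,
	userFrames | otherGoroutines | otherThreads | runtimeFrames | abort,
}

// gained lists the rows that level i enables but level i-1 does not.
func gained(table []row, i int) []string {
	var names []string
	diff := table[i] &^ table[i-1]
	for bit, name := range rowNames {
		if diff&(row(1)<<uint(bit)) != 0 {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	for i := 1; i < len(levels); i++ {
		fmt.Printf("%-6s current: %v  proposed: %v\n",
			levels[i], gained(current, i), gained(proposed, i))
	}
}
```

In the current table, "crash" gains two unrelated rows at once (other threads and abort). In the proposed table the only level that gains two rows is "all", and those are the two rows the proposal treats as a single feature.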

We could do this using the same signal hand-off mechanism GOTRACEBACK=crash currently uses to interrupt the other threads. Historically we couldn't do this because this mechanism wasn't entirely robust, but it's been improved to the point where it should be reliable.

/cc @rsc @ianlancetaylor @randall77

@rsc modified the milestones: Go1.7, Go1.7Early Dec 28, 2015
@bradfitz
Contributor

bradfitz commented May 5, 2016

@aclements, please decide if this is happening now or kick it down the road and update its milestone.

@aclements
Member Author

Too invasive for the freeze. Commencing kicking.

@aclements modified the milestones: Go1.8Early, Go1.7Early May 5, 2016
@quentinmit added the NeedsFix label Sep 30, 2016
@rsc
Contributor

rsc commented Oct 20, 2016

I looked into this for an hour or so. The mechanism still does not seem quite robust enough. After arranging for GOTRACEBACK=all to pass a SIGQUIT around like in GOTRACEBACK=crash, I only managed to find the missing goroutine running on another thread about half the time. I assume the other half of the time it had stopped by the time the SIGQUIT came in. We will probably need to be more aggressive about searching for the missing goroutines in order to be sure to print them all. Doing so will require cleaning up the SIGQUIT token passing a bit, so I'm not posting my code here. It was awful.

Test program:

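package main

import "time"

func main() {
    // Three goroutines that block forever; they are parked, not running.
    for i := 0; i < 3; i++ {
        go func() {
            select {}
        }()
    }
    // One goroutine that spins, so it is still running (typically on
    // another OS thread) when main panics.
    go func() {
        for {
        }
    }()
    // Give the goroutines time to start before triggering the traceback.
    time.Sleep(2 * time.Millisecond)
    panic(1)
}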

Goroutine 7 is the one that only shows up half the time.
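To check repeated runs for the missing goroutine without reading each traceback by eye, one could extract the goroutine IDs from the captured output. The following is a rough sketch, assuming each goroutine's section begins with a header line of the form "goroutine N ..."; the helper name `goroutineIDs` and the sample text are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// header matches the first line of a goroutine's section in a traceback,
// for example "goroutine 7 [running]:".
var header = regexp.MustCompile(`^goroutine (\d+) `)

// goroutineIDs returns the IDs of the goroutines that appear in traceback,
// in the order their header lines occur. It is a hypothetical helper.
func goroutineIDs(traceback string) []int {
	var ids []int
	sc := bufio.NewScanner(strings.NewReader(traceback))
	for sc.Scan() {
		m := header.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		if id, err := strconv.Atoi(m[1]); err == nil {
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	// Invented sample reduced to a few lines; not output from a real run.
	sample := "panic: 1\n\n" +
		"goroutine 1 [running]:\nmain.main()\n\n" +
		"goroutine 5 [select (no cases)]:\nmain.main.func1()\n"
	fmt.Println(goroutineIDs(sample))
}
```

Feeding the stderr of many runs of the test program through such a helper would show how often goroutine 7 is absent.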

@mknyszek
Contributor

Now that we have #24543 (asynchronous preemption) we could make it so that GOTRACEBACK=crash is the default, and add a check in the signal handler if we're trying to freeze-the-world, and let asynchronous preemption preempt even at an unsafe-point in that case. This would mean that we don't need the existing SIGQUIT-bouncing mechanism and the tracebacks would all appear together (instead of separated by SIGQUITs and register dumps).

Also, just to reference what restarted this conversation: census-instrumentation/opencensus-go#1200 was missing a goroutine in the default traceback, and folks had to add GOTRACEBACK=crash.
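For programs that cannot rely on the environment variable being set, the same workaround can be applied from code with `debug.SetTraceback` from the standard `runtime/debug` package. Below is a minimal sketch; the wrapper `setTracebackLevel` and its validation against the five levels named in this issue are assumptions added for illustration, not part of the standard library.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// knownLevels holds the five GOTRACEBACK settings discussed in this issue.
var knownLevels = map[string]bool{
	"none": true, "single": true, "all": true, "system": true, "crash": true,
}

// setTracebackLevel is a hypothetical wrapper that rejects unknown level
// names before handing the level to debug.SetTraceback.
func setTracebackLevel(level string) error {
	if !knownLevels[level] {
		return fmt.Errorf("unknown traceback level %q", level)
	}
	debug.SetTraceback(level)
	return nil
}

func main() {
	// Ask for crash-level tracebacks early, before any work starts, so that
	// a later signal-generated crash also reports goroutines on other threads.
	if err := setTracebackLevel("crash"); err != nil {
		fmt.Println("could not set traceback level:", err)
		return
	}
	fmt.Println("traceback level set to crash")
}
```

As the `runtime/debug` documentation notes, `SetTraceback` cannot lower the detail below what the GOTRACEBACK environment variable already requests, so a call like this only ever raises the level.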

@prattmic
Member

prattmic commented Jun 3, 2020

It should be noted that "show other threads" is not strictly "Y" for crash. It is "Y" for signal-generated panics (SIGSEGV, SIGABRT, etc), but "N" for standard panic/throw calls. This is because the SIGQUIT-bouncing mechanism is performed by the signal handler, which is bypassed by standard panic/throw.

i.e., for panic/throw, we raise SIGABRT with handlingSig[SIGABRT] = 0, and sigfwdgo immediately clears the sigaction and re-raises SIGABRT to immediately abort before sighandler can do the SIGQUIT dance.

agnivade added a commit to mattermost/mattermost that referenced this issue Aug 17, 2020
Unless we set traceback level to crash, goroutines from all threads aren't printed.
See: golang/go#13161.

This is important because SIGQUIT traces will miss on goroutines.

https://mattermost.atlassian.net/browse/MM-27887
@aktau
Contributor

aktau commented Oct 14, 2024

> It should be noted that "show other threads" is not strictly "Y" for crash. It is "Y" for signal-generated panics (SIGSEGV, SIGABRT, etc), but "N" for standard panic/throw calls. This is because the SIGQUIT-bouncing mechanism is performed by the signal handler, which is bypassed by standard panic/throw.
>
> i.e., for panic/throw, we raise SIGABRT with handlingSig[SIGABRT] = 0, and sigfwdgo immediately clears the sigaction and re-raises SIGABRT to immediately abort before sighandler can do the SIGQUIT dance.

Even taking this into account, it would already be an improvement if GOTRACEBACK=all/system showed other threads for signal-generated crashes. How hard would it be to do this?
