runtime: print all threads in GOTRACEBACK >= all #13161
@aclements, please decide if this is happening now or kick it down the road and update its milestone.
Too invasive for the freeze. Commencing kicking.
I looked into this for an hour or so. The mechanism still does not seem quite robust enough. After arranging for GOTRACEBACK=all to pass a SIGQUIT around like GOTRACEBACK=crash does, I only managed to find the missing goroutine running on another thread about half the time. I assume the other half of the time it had stopped by the time the SIGQUIT came in. We will probably need to be more aggressive about searching for the missing goroutines in order to be sure to print them all. And doing so will require cleaning up the SIGQUIT token passing a bit, so I'm not posting my code here. It was awful. Test program:
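(The test program itself wasn't preserved in this copy of the thread. A minimal sketch of the kind of program that reproduces the problem might look like the following; the structure and goroutine count are guesses, not the original code.)

```go
// Run with GOTRACEBACK=all and send SIGQUIT (Ctrl-\) while it runs.
// The goroutines blocked in select always appear in the traceback; the
// CPU-bound goroutine is running on another OS thread and is the one
// that tends to be missing.
package main

import "time"

func main() {
	// Blocked goroutines: these are always printed.
	for i := 0; i < 5; i++ {
		go func() {
			select {} // block forever
		}()
	}

	// A goroutine with no preemption points, spinning on another OS thread.
	go func() {
		for {
		}
	}()

	time.Sleep(time.Hour)
}
```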
Goroutine 7 is the one that only shows up half the time.
Now that we have #24543 (asynchronous preemption), we could make it so that […]. Also, just to reference what restarted this conversation: census-instrumentation/opencensus-go#1200 was missing a goroutine in the default traceback, and folks had to add […].
It should be noted that "show other threads" is not strictly "Y" for crash. It is "Y" for signal-generated panics (SIGSEGV, SIGABRT, etc.), but "N" for standard panic/throw calls. This is because the SIGQUIT-bouncing mechanism is performed by the signal handler, which is bypassed by standard panic/throw. I.e., for panic/throw, we raise SIGABRT with […].
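(A small sketch of that distinction, my own illustration rather than code from the thread: with GOTRACEBACK=crash, the signal-generated panic below enters the signal handler and so bounces SIGQUIT to the other threads, while replacing the nil dereference with a plain panic("boom") takes the ordinary panic path and never interrupts them.)

```go
package main

import "time"

func main() {
	go func() {
		for { // keep another OS thread busy
		}
	}()
	time.Sleep(100 * time.Millisecond) // let the spinner get onto its own thread

	var p *int
	_ = *p // SIGSEGV: a signal-generated panic, delivered through the signal handler
	// panic("boom") // a standard panic would bypass the handler instead
}
```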
Unless we set the traceback level to crash, goroutines from all threads aren't printed. See: golang/go#13161. This is important because SIGQUIT traces will miss goroutines. https://mattermost.atlassian.net/browse/MM-27887
Even taking this into account, it would already be an improvement if […].
Currently, GOTRACEBACK=all is a misnomer. It prints stacks for all goroutines that happen to be non-running or running on the current OS thread, but it does not print stacks for goroutines that are running on other OS threads. This is frustrating. For purely internal reasons, it's currently necessary to set GOTRACEBACK=crash in order to get stacks for goroutines on other threads, but that also gets you runtime frames and an abort at the end, which is often undesirable.
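(As an illustration of the current workaround — a sketch of mine, not part of the issue: a program can opt into the "crash" level itself via runtime/debug.SetTraceback, at the cost of runtime frames in the output and an abort at the end.)

```go
package main

import (
	"runtime/debug"
	"time"
)

func main() {
	// Equivalent to running with GOTRACEBACK=crash: a SIGQUIT (or fatal
	// signal) now dumps goroutines on every OS thread, but runtime frames
	// are included and the process aborts afterwards.
	debug.SetTraceback("crash")

	go func() {
		for { // CPU-bound goroutine on another OS thread
		}
	}()

	time.Sleep(time.Hour) // send SIGQUIT (Ctrl-\) to see the full dump
}
```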
We should make GOTRACEBACK=all (or higher) print stacks for all goroutines, regardless of what thread they're running on. This will make "all" do what it says in the name and will make the only difference between "system" and "crash" be whether or not it aborts at the end of the traceback.
In other words, this is the current behavior of the GOTRACEBACK settings:

            other goroutines   other threads   runtime frames   abort
    all            Y                 N                N            N
    system         Y                 N                Y            N
    crash          Y                 Y                Y            Y

This is what it should be:

            other goroutines   other threads   runtime frames   abort
    all            Y                 Y                N            N
    system         Y                 Y                Y            N
    crash          Y                 Y                Y            Y
With this, we would eliminate the distinction between "show other goroutines" and "show other threads", and each GOTRACEBACK level would enable exactly one additional feature.
We could do this using the same signal hand-off mechanism GOTRACEBACK=crash currently uses to interrupt the other threads. Historically we couldn't do this because this mechanism wasn't entirely robust, but it's been improved to the point where it should be reliable.
/cc @rsc @ianlancetaylor @randall77