Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd: Log goroutines (stack traces) on SIGUSR1 #471

Merged
merged 1 commit into from
Jul 31, 2023

Conversation

mitjat
Copy link
Contributor

@mitjat mitjat commented Jul 1, 2023

Task

Produce core dump (goroutines stack trace) on demand

Perhaps a custom handler for SIGUSR1 that dumps the core into a file and/or logs it.

(For services (= "the API"), there's a fairly standard way in Go to add an endpoint, but I'm not sure what the best way would be to expose the endpoint only internally. Might not be worth the additional hassle; we've never needed a core dump there.)

Motivation:

Currently, the only way to obtain the stacks of goroutines is to send a SIGQUIT, which also terminates the program. And in a way that does not safely close the KVStore 💀, at that. When analyzers get stuck, the list of goroutines can be very helpful for diagnosing the live system.

This PR

This PR maxes the nexus binary respond to the SIGUSR1 os signal by logging the stack traces of all goroutines.

A good way to read the stack traces is to pass the (last minute of) logs through on of:

| jq -r  '.goroutines_all | select(.)'
| jq -r  '.goroutines_block | select(.)'
| jq -r  '.goroutines_mutex | select(.)'

Tested locally.

@mitjat mitjat force-pushed the mitjat/stack-trace-on-demand branch 2 times, most recently from d89d48f to ae7c8d4 Compare July 3, 2023 05:42
@pro-wh
Copy link
Collaborator

pro-wh commented Jul 3, 2023

do we need to tell the go runtime that we don't have to wait for this goroutine to exit before exiting?

@mitjat
Copy link
Contributor Author

mitjat commented Jul 25, 2023

do we need to tell the go runtime that we don't have to wait for this goroutine to exit before exiting?

I don't think so. It hasn't come up in my testing. I believe Go doesn't have an implicit all_goroutines.Wait() appended to the end of main(). See also this mini demo; it prints "Program exited" immediately.

Copy link
Collaborator

@pro-wh pro-wh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

cmd/root.go Outdated Show resolved Hide resolved
@mitjat mitjat force-pushed the mitjat/stack-trace-on-demand branch from ae7c8d4 to b88e311 Compare July 31, 2023 11:01
@mitjat mitjat enabled auto-merge July 31, 2023 11:01
@mitjat mitjat merged commit 7530747 into main Jul 31, 2023
@mitjat mitjat deleted the mitjat/stack-trace-on-demand branch July 31, 2023 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants