Dump stacktrace upon SIGABRT #3366

Merged 2 commits on Sep 9, 2024


yacovm (Contributor) commented Sep 5, 2024

This commit makes the node process listen for the SIGABRT signal and print the stack traces of its goroutines without terminating.

Why this should be merged

This is useful for troubleshooting and investigating issues, in production or in development, without bringing down the node.

How this works

Using the signal.Notify function, the node registers for SIGABRT and, upon receiving the signal, prints the stack traces of all goroutines.

How this was tested

I ran the node locally and sent it a SIGABRT signal using kill -SIGABRT.
The node wrote the stack trace of its goroutines without exiting:

[09-06|01:29:48.315] INFO <P Chain> bootstrap/bootstrapper.go:603 fetching blocks {"numFetchedBlocks": 190022, "numTotalBlocks": 17120034, "eta": "15m43s"}
[09-06|01:29:48.715] INFO <P Chain> bootstrap/bootstrapper.go:603 fetching blocks {"numFetchedBlocks": 195011, "numTotalBlocks": 17120034, "eta": "15m53s"}
[09-06|01:29:49.119] INFO <P Chain> bootstrap/bootstrapper.go:603 fetching blocks {"numFetchedBlocks": 200004, "numTotalBlocks": 17120034, "eta": "16m3s"}
[09-06|01:29:49.531] INFO <P Chain> bootstrap/bootstrapper.go:603 fetching blocks {"numFetchedBlocks": 205136, "numTotalBlocks": 17120034, "eta": "16m13s"}
Writing goroutine stack to stderr:
goroutine 484 [running]:
github.com/ava-labs/avalanchego/utils.GetStacktrace(...)
        /Users/yacov.manevich/avalanchego/utils/stacktrace.go:10
github.com/ava-labs/avalanchego/app.Run.func2()
        /Users/yacov.manevich/avalanchego/app/app.go:114 +0xd5
created by github.com/ava-labs/avalanchego/app.Run in goroutine 1
        /Users/yacov.manevich/avalanchego/app/app.go:112 +0x19b

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc001d40120?)
        /usr/local/go/src/runtime/sema.go:62 +0x25
sync.(*WaitGroup).Wait(0xc0000061c0?)
        /usr/local/go/src/sync/waitgroup.go:116 +0x48
github.com/ava-labs/avalanchego/app.(*app).ExitCode(0xc0011c5180)
        /Users/yacov.manevich/avalanchego/app/app.go:186 +0x25
github.com/ava-labs/avalanchego/app.Run({0x103fe0000, 0xc0011c5180})
        /Users/yacov.manevich/avalanchego/app/app.go:119 +0x1ab
main.main()
        /Users/yacov.manevich/avalanchego/main/main.go:70 +0x528

goroutine 66 [chan receive]:
…
…
…
net.(*netFD).Read(...)
        /usr/local/go/src/net/fd_posix.go:55
net.(*conn).Read(0xc003b0e048, {0xc005c5c000?, 0x0?, 0xd?})
        /usr/local/go/src/net/net.go:185 +0x48
crypto/tls.(*atLeastReader).Read(0xc00414d448, {0xc005c5c000?, 0x0?, 0xc00414d448?})
        /usr/local/go/src/c
[09-06|01:29:50.006] INFO <P Chain> bootstrap/bootstrapper.go:603 fetching blocks {"numFetchedBlocks": 210748, "numTotalBlocks": 17120034, "eta": "16m25s"}
[09-06|01:29:50.307] INFO <P Chain> bootstrap/bootstrapper.go:603 fetching blocks {"numFetchedBlocks": 215063, "numTotalBlocks": 17120034, "eta": "16m28s"}
[09-06|01:29:50.698] INFO <P Chain> bootstrap/bootstrapper.go:603 fetching blocks {"numFetchedBlocks": 220037, "numTotalBlocks": 17120034, "eta": "16m36s"}
[09-06|01:29:51.105] INFO <P Chain> bootstrap/bootstrapper.go:603 fetching blocks {"numFetchedBlocks": 225018, "numTotalBlocks": 17120034, "eta": "16m44s"}
[09-06|01:29:51.501] INFO <P Chain> bootstrap/bootstrapper.go:603 fetching blocks {"numFetchedBlocks": 230031, "numTotalBlocks": 17120034, "eta": "16m51s"}
[09-06|01:29:51.883] INFO <P Chain> bootstrap/bootstrapper.go:603 fetching blocks {"numFetchedBlocks": 235013, "numTotalBlocks": 17120034, "eta": "16m57s"}
[09-06|01:29:52.340] INFO <P Chain> bootstrap/bootstrapper.go:603 fetching blocks {"numFetchedBlocks": 240104, "numTotalBlocks": 17120034, "eta": "17m7s"}
[09-06|01:29:52.704] INFO <P Chain> bootstrap/bootstrapper.go:603 fetching blocks {"numFetchedBlocks": 245208, "numTotalBlocks": 17120034, "eta": "17m10s"}
[09-06|01:29:52.990] INFO <P Chain> bootstrap/bootstrapper.go:603 fetching blocks {"numFetchedBlocks": 250057, "numTotalBlocks": 17120034, "eta": "17m9s"}
[09-06|01:29:53.357] INFO <P Chain> bootstrap/bootstrapper.go:603 fetching blocks {"numFetchedBlocks": 255006, "numTotalBlocks": 17120034, "eta": "17m13s"}

@yacovm yacovm self-assigned this Sep 6, 2024
This commit makes the node process listen to SIGABRT signal and print its stack trace without terminating.
It can be useful to troubleshoot and investigate issues either in production or in development without bringing down the node.

Signed-off-by: Yacov Manevich <[email protected]>
@StephenButtolph StephenButtolph changed the title from "print stack trace to stderr upon SIGABRT" to "Dump stacktrace upon SIGABRT" Sep 9, 2024
@StephenButtolph StephenButtolph added this pull request to the merge queue Sep 9, 2024
Merged via the queue into ava-labs:master with commit 89f83c2 Sep 9, 2024
20 of 21 checks passed
michaelkaplan13 pushed a commit that referenced this pull request Sep 11, 2024