obs: export metrics about Go GC Assist work #88178
Comments
golang/go#55159 is a start here. If we can land that upstream, we can define a timeseries metric over this value. Ideally, we'd also be able to set up some alerting on it to detect when goroutines are over-assisting (e.g. …).
It turns out that https://go-review.googlesource.com/c/go/+/404307 landed 3 days ago and added a new metric for this. That may be a good time to replace the calls we currently use to read GC stats; doing so will avoid the stop-the-world pause during those calls.
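As a rough illustration of why the `runtime/metrics` package is attractive here: it can be read without a stop-the-world pause, and the GC-related metric names can be discovered at runtime. This is only a sketch; the exact metric added by the CL above is not named in this thread, and the `/cpu/classes/gc/` prefix is an assumption about where GC CPU metrics live on newer Go versions.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"strings"
)

func main() {
	// Enumerate GC-related metrics exposed by runtime/metrics.
	// Reading these does not require stopping the world.
	for _, d := range metrics.All() {
		if strings.HasPrefix(d.Name, "/gc/") || strings.HasPrefix(d.Name, "/cpu/classes/gc/") {
			fmt.Printf("%-50s cumulative=%v\n", d.Name, d.Cumulative)
		}
	}
}
```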
This commit introduces functions to extract metrics exposed by the Go runtime/metrics API. The runtime metrics are sampled as part of the SampleEnvironment call every 10 seconds. The new metric GcAssitSecond captures an estimate of the amount of effort goroutines spend assisting GC activities. Fixes: cockroachdb#88178 Release note: None
This commit introduces functions to extract metrics exposed by the Go runtime/metrics API. The runtime metrics are sampled as part of the SampleEnvironment call every 10 seconds. The new metric GcAssistNS captures an estimate of the amount of effort, in nanoseconds, that user goroutines spend assisting GC activities. Fixes: cockroachdb#88178 Release note: None
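A minimal sketch of how such a nanoseconds value might be derived from the runtime/metrics API on each sampling tick. The metric name `/cpu/classes/gc/mark/assist:cpu-seconds`, the 10-second cadence, and the conversion are illustrative assumptions, not the actual CockroachDB implementation.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"time"
)

// sampleGCAssistNS returns the cumulative CPU time (in nanoseconds) that
// goroutines have spent assisting the GC, or false if the metric is not
// available on this Go version.
func sampleGCAssistNS() (int64, bool) {
	s := []metrics.Sample{{Name: "/cpu/classes/gc/mark/assist:cpu-seconds"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64 {
		return 0, false
	}
	return int64(s[0].Value.Float64() * float64(time.Second)), true
}

func main() {
	// Emulate a 10-second sampling loop, similar in spirit to SampleEnvironment.
	for range time.Tick(10 * time.Second) {
		if ns, ok := sampleGCAssistNS(); ok {
			fmt.Printf("gc assist total: %d ns\n", ns)
		}
	}
}
```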
@lyang24 determined that the mapping for this will look like:
118875: obs: export metrics about Go GC Assist work r=lyang24 a=lyang24 This commit introduces functions to extract metrics exposed by the Go runtime/metrics API. The runtime metrics are sampled as part of the SampleEnvironment call every 10 seconds. The new metric GcAssistNS is added as an estimate of the amount of effort, in nanoseconds, that user goroutines spend assisting GC activities. Fixes: #88178 Release note: None Co-authored-by: lyang24 <[email protected]>
I made a mistake on GoAllocBytes; after testing, the mapping needs to be updated: /gc/heap/allocs:bytes is the total allocated bytes, and this number will not decrease on release.
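To illustrate the correction above: a sketch that reads both the cumulative allocation and free counters and subtracts them to get a value that can decrease. Whether that difference is the intended mapping for GoAllocBytes is an assumption here, not something stated in the thread.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	s := []metrics.Sample{
		{Name: "/gc/heap/allocs:bytes"}, // cumulative bytes allocated; never decreases
		{Name: "/gc/heap/frees:bytes"},  // cumulative bytes freed
	}
	metrics.Read(s)

	allocs := s[0].Value.Uint64()
	frees := s[1].Value.Uint64()
	fmt.Printf("cumulative allocs: %d B, live-heap estimate (allocs-frees): %d B\n", allocs, allocs-frees)
}
```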
A few recent support escalations would have been much easier to debug with this information exposed.
This commit introduces functions to extract metrics exposed by the Go runtime/metrics API. The runtime metrics are sampled as part of the SampleEnvironment call every 10 seconds. The new metric GcAssistNS is added as an estimate of the amount of effort, in nanoseconds, that user goroutines spend assisting GC activities. Fixes: cockroachdb#88178 Release note: None
CockroachDB currently exports some timeseries metrics about the Go runtime memory GC:

- `sys.gc.pause.percent`: "Current GC pause percentage"
- `sys.gc.pause.ns`: "Total GC pause"
- `sys.gc.count`: "Total number of GC runs"

These metrics are computed using `runtime.ReadGCStats`. They are useful to understand how long the Go GC is stopping the world ("pause" refers to STW work, not background work).

However, the Go GC has other costs beyond its STW sweep termination and mark termination phases. In general, the concurrent mark and scan phase can run without pushing back on foreground goroutines. However, when goroutines are allocating memory faster than the GC can clean up (because of significant memory allocation, slow GC, or both), GC work can be pushed back onto foreground goroutines in line with their heap allocations. This is known as "GC Assist".
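For context, a minimal sketch of how the existing pause metrics can be derived from `ReadGCStats` (which lives in `runtime/debug`). The percentage calculation over a fixed sampling interval is an assumption about how `sys.gc.pause.percent` is defined, not CockroachDB's actual code.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"time"
)

func main() {
	var before, after debug.GCStats
	debug.ReadGCStats(&before)

	// Sample over a fixed interval, mirroring a periodic sampler.
	interval := 10 * time.Second
	time.Sleep(interval)
	debug.ReadGCStats(&after)

	pauseDelta := after.PauseTotal - before.PauseTotal
	fmt.Printf("sys.gc.count:         %d\n", after.NumGC)
	fmt.Printf("sys.gc.pause.ns:      %d\n", after.PauseTotal.Nanoseconds())
	fmt.Printf("sys.gc.pause.percent: %.4f\n", 100*float64(pauseDelta)/float64(interval))
}
```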
We've seen in cases like this one that GC assist can lead to large spikes in latency that are difficult to understand using other observability tools.
We should find a way to expose this information. Unfortunately, this is not exported by the Go runtime, except through the `GODEBUG=gctrace` tooling. We may need to patch the runtime or upstream a fix to get at the information programmatically.

Jira issue: CRDB-19718
Epic CRDB-34227
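For reference, a rough sketch of the `GODEBUG=gctrace=1` output mentioned above (the numbers are illustrative, not from a real run): per the runtime documentation, the CPU portion of each line breaks the concurrent mark/scan phase into assist, background, and idle time, so assist work is visible there even though it is not exported programmatically.

```
gc 17 @2.104s 3%: 0.030+12+0.055 ms clock, 0.24+2.5/6.4/0+0.44 ms cpu, 4->4->2 MB, 5 MB goal, 8 P
```

In the `0.24+2.5/6.4/0+0.44 ms cpu` portion, the middle `2.5/6.4/0` term is the mark/scan phase split into assist/background/idle CPU time, so `2.5 ms` in this example is assist work performed in line with allocation.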