APM Server hung in high CPU utilization #6642
Comments
I think we should wait for elastic/kibana#121534 to test this in ESS again.
I think I have reproduced this locally, and I suspect this and #6639 are closely related. In one test run I noticed an increased search rate which hasn't dropped back to baseline, and a high level of CPU consumption. I'll have to instrument the server to find out what's going on.
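One way to do that is to capture a CPU profile via Go's standard `net/http/pprof` package. A minimal sketch, assuming the server binary can be rebuilt (or run) with the profiling handler enabled on a side port; this is not necessarily how APM Server itself exposes profiling, and the port here is hypothetical:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Serve the profiling endpoints on a hypothetical side port. A 30s CPU
	// profile can then be captured with:
	//   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the rest of the server would run here ...
	select {}
}
```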
Also, I reproduced the issue like this:
```go
package main

import (
	"os"

	"go.elastic.co/apm"
	"go.elastic.co/apm/transport"
)

func main() {
	// First tracer/transport pointed at one APM Server instance.
	os.Setenv("ELASTIC_APM_SERVER_URL", "http://localhost:49160")
	transport1, _ := transport.NewHTTPTransport()
	tracer1, _ := apm.NewTracerOptions(apm.TracerOptions{
		ServiceName: "svc1",
		Transport:   transport1,
	})
	defer tracer1.Flush(nil)

	// Second tracer/transport pointed at a different APM Server instance.
	os.Setenv("ELASTIC_APM_SERVER_URL", "http://localhost:49162")
	transport2, _ := transport.NewHTTPTransport()
	tracer2, _ := apm.NewTracerOptions(apm.TracerOptions{
		ServiceName: "svc2",
		Transport:   transport2,
	})
	defer tracer2.Flush(nil)

	// Send 500 traces, each split across both services: tx2 is created with
	// the trace context of tx1's span, so the trace is distributed.
	for i := 0; i < 500; i++ {
		tx1 := tracer1.StartTransaction("tx1", "type")
		span := tx1.StartSpan("span", "type", nil)
		tx2 := tracer2.StartTransactionOptions("tx2", "type", apm.TransactionOptions{
			TraceContext: span.TraceContext(),
		})
		tx2.End()
		span.End()
		tx1.End()
	}
}
```
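In other words, the snippet runs two Go agent tracers pointed at two different local APM Server ports (49160 and 49162 are just whatever the local containers were bound to), and each of the 500 traces spans both services via the propagated trace context, presumably exercising the cross-server trace coordination that tail-based sampling requires.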
@axw assigned you since you started looking into it.
APM Server version (`apm-server version`): 7.16.0-SNAPSHOT

Description of the problem including expected versus actual behavior:
When testing tail-based sampling, APM Server got into a state of high CPU utilization that could not be mitigated. This occurred while editing and updating the APM Server configs. I thought it might have been associated with the `policy` configs, but it occurred a second time on a new deployment while editing different config values.

To bring the CPU usage back down, I tried restarting the containers, updating the config, and disabling TBS, several times, but the CPU usage did not change.
The two deployments are still available for debugging:
7f3939f
6de56fc