APM Server hung in high CPU utilization #6642

Closed
Tracked by #6894
bryce-b opened this issue Nov 16, 2021 · 4 comments · Fixed by #7211

bryce-b (Contributor) commented Nov 16, 2021

APM Server version (apm-server version): 7.16.0-SNAPSHOT

Description of the problem including expected versus actual behavior:
When testing tail-based sampling, the APM Server got into a state of high CPU utilization that couldn't be mitigated. This happened while editing and updating the APM Server configs. I thought it might have been associated with the policy configs, but it occurred a second time, on a new deployment, while editing different config values.

To bring the CPU usage back down I tried restarting the containers, updating the config, and disabling TBS, several times each; the CPU usage did not change.

The two deployments are still available for debugging:
7f3939f
6de56fc

bryce-b added the bug label Nov 16, 2021
simitt mentioned this issue Dec 17, 2021
simitt added this to the 8.1 milestone Dec 17, 2021

axw (Member) commented Jan 24, 2022

I think we should wait for elastic/kibana#121534 to test this in ESS again.

axw (Member) commented Feb 4, 2022

I think I have reproduced this locally, and I suspect this and #6639 are closely related. In one test run I noticed an increased search rate which hasn't dropped back to baseline, and a high level of CPU consumption.

I'll have to instrument the server to find out what's going on.
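
(For context: one generic way to see where a Go process such as apm-server is spending CPU is the standard pprof tooling. The sketch below is a minimal, self-contained example of exposing net/http/pprof in a Go program, not a description of how apm-server itself exposes profiling; a 30-second CPU profile could then be captured with go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30.)

package main

import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
        // Serve the pprof endpoints on a local port; CPU, heap, goroutine and
        // other profiles become available under /debug/pprof/.
        log.Fatal(http.ListenAndServe("localhost:6060", nil))
}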

axw (Member) commented Feb 4, 2022

Also, I reproduced the issue like this:

  1. Started the stack (docker-compose up -d) and enabled internal stack monitoring
  2. Using runapm (#7197, "runapm: make policy name and reinstall flags"), ran two APM Servers:
    1. go run ./systemtest/cmd/runapm/main.go -f -var tail_sampling_enabled=true -var tail_sampling_policies='[{"sample_rate":0.5}]'
    2. go run ./cmd/runapm/main.go -policy=runapm2 -f -reinstall=false -var tail_sampling_enabled=true -var tail_sampling_policies='[{"sample_rate":0.5}]'
  3. Ran the program below 5 times, waited for the docs to be indexed (watching Elasticsearch stack monitoring), then repeated once more
  4. Observed the search rate increase and never drop back to baseline, and observed increased CPU (see the sketch after the program for one way to track the search rate)

package main

import (
        "os"

        "go.elastic.co/apm"
        "go.elastic.co/apm/transport"
)

func main() {
        // First tracer reports to the first APM Server (port 49160). The
        // transport reads ELASTIC_APM_SERVER_URL when it is created, so the
        // env var must be set before NewHTTPTransport is called.
        os.Setenv("ELASTIC_APM_SERVER_URL", "http://localhost:49160")
        transport1, _ := transport.NewHTTPTransport()
        tracer1, _ := apm.NewTracerOptions(apm.TracerOptions{
                ServiceName: "svc1",
                Transport:   transport1,
        })
        defer tracer1.Flush(nil)

        // Second tracer reports to the second APM Server (port 49162).
        os.Setenv("ELASTIC_APM_SERVER_URL", "http://localhost:49162")
        transport2, _ := transport.NewHTTPTransport()
        tracer2, _ := apm.NewTracerOptions(apm.TracerOptions{
                ServiceName: "svc2",
                Transport:   transport2,
        })
        defer tracer2.Flush(nil)

        // Generate 500 distributed traces, each split across the two
        // services/servers: tx2 is a child of tx1's span, so both
        // tail-sampling servers participate in the same trace.
        for i := 0; i < 500; i++ {
                tx1 := tracer1.StartTransaction("tx1", "type")
                span := tx1.StartSpan("span", "type", nil)
                tx2 := tracer2.StartTransactionOptions("tx2", "type", apm.TransactionOptions{
                        TraceContext: span.TraceContext(),
                })
                tx2.End()
                span.End()
                tx1.End()
        }
}
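
(To make the search-rate observation in step 4 less dependent on the stack monitoring UI, a small helper can poll Elasticsearch's index-stats API and print the query rate directly. This is a minimal sketch under assumptions: it expects Elasticsearch at http://localhost:9200 without authentication, so adjust the URL and credentials for the local docker-compose stack. The _stats/search endpoint and the _all.total.search.query_total field are standard Elasticsearch index-stats output.)

package main

import (
        "encoding/json"
        "fmt"
        "net/http"
        "time"
)

// queryTotal returns the cluster-wide cumulative count of search queries,
// taken from GET /_stats/search (_all.total.search.query_total).
func queryTotal() (int64, error) {
        resp, err := http.Get("http://localhost:9200/_stats/search") // assumed local, unauthenticated Elasticsearch
        if err != nil {
                return 0, err
        }
        defer resp.Body.Close()
        var body struct {
                All struct {
                        Total struct {
                                Search struct {
                                        QueryTotal int64 `json:"query_total"`
                                } `json:"search"`
                        } `json:"total"`
                } `json:"_all"`
        }
        if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
                return 0, err
        }
        return body.All.Total.Search.QueryTotal, nil
}

func main() {
        prev, err := queryTotal()
        if err != nil {
                fmt.Println("stats request failed:", err)
                return
        }
        // Print the number of search queries executed in each 10s window.
        // Per the observation above, after the reproduction the rate stayed
        // elevated rather than dropping back to the pre-test baseline.
        for range time.Tick(10 * time.Second) {
                cur, err := queryTotal()
                if err != nil {
                        fmt.Println("stats request failed:", err)
                        continue
                }
                fmt.Printf("search queries in the last 10s: %d\n", cur-prev)
                prev = cur
        }
}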

simitt (Contributor) commented Feb 4, 2022

@axw assigned you since you started looking into it.
