Ensure APM Server makes full usage of available CPU resources #9182

Closed
2 tasks done
simitt opened this issue Sep 23, 2022 · 1 comment · Fixed by #9358

Comments

simitt (Contributor) commented Sep 23, 2022

Previous benchmarks indicate that the APM Server performs better when scaling horizontally than vertically. This is a meta issue collecting the steps necessary to ensure APM Server's throughput scales well when the server's available CPU resources are scaled up.

Previous investigations identified the modelindexer as a bottleneck, and two action items were derived from that finding:

The improvements need to be benchmarked; the goal is full resource usage and (almost) linear scaling for APM Server instances with up to 32GB of RAM (and proportionally allocated CPU) on ESS.
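As an illustration of how such a scaling check might look, here is a minimal Go benchmark sketch; `processEvent` and `BenchmarkIndexerScaling` are hypothetical stand-ins for this example, not the actual modelindexer code:

```go
package modelindexer_test

import "testing"

// processEvent is a hypothetical stand-in for the per-event work performed by
// the model indexer; it exists only to make the benchmark self-contained.
func processEvent(buf []byte) (sum int) {
	for _, b := range buf {
		sum += int(b)
	}
	return sum
}

// BenchmarkIndexerScaling runs the stand-in workload on the goroutines started
// by RunParallel; repeating the benchmark under several GOMAXPROCS values shows
// how throughput scales with the available CPU resources.
func BenchmarkIndexerScaling(b *testing.B) {
	event := make([]byte, 1024)
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			processEvent(event)
		}
	})
}
```

Running it with `go test -bench BenchmarkIndexerScaling -cpu 1,2,4,8` repeats the benchmark at each GOMAXPROCS value, which makes deviations from (almost) linear scaling easy to spot.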

kruskall (Member) commented Dec 7, 2022

I've reviewed the code in the linked PRs and looked at the trace files for specific benchmarks such as BenchmarkModelIndexer.
I can confirm that the available CPU cores are used efficiently and that the work appears to be parallelized correctly.

I could not find any issues, but I noticed that the flush goroutine for the active indexers (introduced in #9318) is always busy.
A permanently active goroutine can slow down GC and increase memory usage. I don't expect that to be a big problem here because of the flush timer channel, but I saw room for improvement and opened a PR to reduce the CPU time spent in the flush goroutine: #9760
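For illustration, a minimal sketch of a timer-driven flush loop that stays parked between flushes instead of spinning; the names (`runFlusher`, `flushInterval`, `flushFn`) are assumptions for this example and not the actual modelindexer implementation:

```go
package main

import (
	"fmt"
	"time"
)

// runFlusher blocks on the timer and control channels, so the goroutine is
// parked (consuming no CPU) until the flush interval elapses, an explicit
// flush is requested, or stop is closed.
func runFlusher(flushInterval time.Duration, flushReq, stop <-chan struct{}, flushFn func()) {
	timer := time.NewTimer(flushInterval)
	defer timer.Stop()
	for {
		select {
		case <-timer.C:
			flushFn()
			timer.Reset(flushInterval)
		case <-flushReq:
			// Stop and drain the timer before resetting so a pending expiry
			// does not trigger a redundant flush right after this one.
			if !timer.Stop() {
				select {
				case <-timer.C:
				default:
				}
			}
			flushFn()
			timer.Reset(flushInterval)
		case <-stop:
			flushFn() // final flush before exiting
			return
		}
	}
}

func main() {
	flushReq := make(chan struct{})
	stop := make(chan struct{})
	done := make(chan struct{})
	go func() {
		runFlusher(100*time.Millisecond, flushReq, stop, func() { fmt.Println("flush") })
		close(done)
	}()
	flushReq <- struct{}{}             // request an explicit flush
	time.Sleep(250 * time.Millisecond) // let a couple of interval flushes fire
	close(stop)
	<-done
}
```

The point of the sketch is that every loop iteration blocks in the `select`, so the goroutine only wakes when there is actual work, keeping its CPU time (and its impact on GC) negligible.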
