modelindexer: Scale active indexers based on load #9393

marclop · 2022-10-18T07:55:35Z

Motivation/summary

Modifies the modelindexer to scale up and down the number of "active" indexers (goroutines pulling out of the internal modelindexer queue), based on consecutive number of flushes up to 25% of the GOMAXPROCS.

When N number of full flushes (Reached or exceeded FlushBytes) occur, a new active indexer will be created by the scale action, as long as the scaling cooldown has elapsed.

Equally, when N number of timed flushes (due to FlushInterval) occur, an active indexer will be scaled down, since not enough load is going through the server to warrant the current number of active indexers.

Active indexer downscaling can also be triggered due to a change in the GOMAXPROCS. This is particularly important for containerized or cgroup environments where CPU quotas may be updated "live". In this case, the downscale cooldown is ignored and active indexers are scaled down until until active <= math.RoundToEven(GOMAXPROCS / 4).

When traffic isn't going through an active indexer, a new timer has been introduced to allow completely idle indexers to be scaled back. The idle check interval can be configured via IdleInterval.

Scaling is enabled by default, but can be disabled via the configuration option output.elasticsearch.scaling.enabled: false.

Last, modifies the default settings for max_requests and flush_bytes to be 50 and 1MB respectively. This allows smaller payloads to be sent to Elasticsearch, more available indexers can be used by instances with more processing power, and the indexers are cycled faster, which results in better usage and performance.

Checklist

Update CHANGELOG.asciidoc
~~- [ ] Update package changelog.yml (only if changes to apmpackage have been made)~~
Documentation has been updated

How to test these changes

Run benchmarks on instances 8g and bigger in ESS and observe CPU utilization reaching ~100%.

Related issues

Closes #9181

Modifies the `modelindexer` to scale up and down the number of "active" indexers (goroutines pulling out of the internal modelindexer queue), based on consecutive number of flushes up to 20% of the `GOMAXPROCS`. When N number of full flushes (Reached or exceeded `FlushBytes`) occur, a new active indexer will be created by the scale action, as long as the scaling cooldown has elapsed. Equally, when N number of timed flushes (due to `FlushInterval`) occur, an active indexer will be scaled down, since not enough load is going through the server to warrant the current number of active indexers. Active indexer downscaling can also be triggered due to a change in the `GOMAXPROCS`. This is particularly important for containerized or cgroup environments where CPU quotas may be updated "live". In this case, the downscale cooldown is ignored and active indexers are scaled down until until `active <= math.RoundToEven(GOMAXPROCS / 4)`. When traffic isn't going through an active indexer, a new timer has been introduced to allow completely idle indexers to be scaled back. The idle check interval can be configured via `IdleInterval`. Scaling is enabled by default, but can be disabled via the configuration option `output.elasticsearch.scaling.disabled: true`. Last, modifies the default settings for `max_requests` and `flush_bytes` to be `50` and `1MB` respectively. This allows smaller payloads to be sent to Elasticsearch, more available indexers can be used by instances with more processing power, and the indexers are cycled faster, which results in better usage and performance. Signed-off-by: Marc Lopez Rubio <[email protected]>

apmmachine · 2022-10-18T08:23:34Z

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS

Expand to view the summary

Build stats

Start Time: 2022-10-26T10:53:41.601+0000
Duration: 27 min 55 sec

Test stats 🧪

Test	Results
Failed	0
Passed	154
Skipped	0
Total	154

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate and publish the docker images.
/test windows : Build & tests on Windows.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

apmmachine · 2022-10-18T08:23:38Z

📚 Go benchmark report

Diff with the main branch

name                                                                                              old time/op    new time/op    delta
pkg:github.com/elastic/apm-server/internal/agentcfg goos:linux goarch:amd64
FetchAndAdd/FetchFromCache-12                                                                       41.2ns ± 1%    46.1ns ± 0%   +11.97%  (p=0.016 n=5+4)
FetchAndAdd/FetchAndAddToCache-12                                                                   92.0ns ± 1%   102.2ns ± 0%   +11.07%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/internal/beater/request goos:linux goarch:amd64
ContextResetContentEncoding/empty-12                                                                 117ns ± 1%     132ns ± 1%   +12.27%  (p=0.008 n=5+5)
ContextResetContentEncoding/uncompressed-12                                                          151ns ± 0%     170ns ± 1%   +12.08%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/internal/model/modelindexer goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
BackendProcessor/invalid-metadata-2.ndjson-12                                                       7.86µs ± 7%    6.98µs ± 8%   -11.15%  (p=0.032 n=5+5)
BackendProcessor/unknown-span-type.ndjson-12                                                        76.2µs ± 8%    60.9µs ±23%   -20.02%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/errors_rum.ndjson-12                    7.39µs ± 3%    7.86µs ± 5%    +6.28%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/errors_transaction_id.ndjson-12         18.5µs ± 9%    21.5µs ± 9%   +15.99%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/otel-bridge.ndjson-12                   10.5µs ± 5%    11.5µs ± 9%    +9.99%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/ratelimit.ndjson-12                     18.0µs ± 4%    19.4µs ± 3%    +8.18%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/invalid-event-type.ndjson-12            1.86µs ± 7%    2.32µs ± 3%   +25.09%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions.ndjson-12                  10.9µs ± 2%    11.1µs ± 1%    +1.66%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans.ndjson-12            10.7µs ± 1%    11.1µs ± 2%    +4.14%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans_rum.ndjson-12        1.97µs ± 1%    2.04µs ± 1%    +3.82%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans_rum_2.ndjson-12      1.91µs ± 1%    1.98µs ± 1%    +3.60%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/unknown-span-type.ndjson-12             7.13µs ± 1%    7.27µs ± 1%    +1.98%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/errors.ndjson-12                      6.87µs ± 2%    7.35µs ± 2%    +6.94%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-event-type.ndjson-12           784ns ± 1%     802ns ± 2%    +2.29%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-event.ndjson-12               3.28µs ± 1%    3.33µs ± 1%    +1.59%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-json-event.ndjson-12          1.08µs ± 1%    1.10µs ± 1%    +1.80%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata-2.ndjson-12           480ns ± 2%     488ns ± 1%    +1.82%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata.ndjson-12             484ns ± 1%     495ns ± 1%    +2.23%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/otel-bridge.ndjson-12                 3.13µs ± 2%    3.21µs ± 1%    +2.50%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/transactions_spans_rum.ndjson-12      1.57µs ± 1%    1.60µs ± 2%    +1.88%  (p=0.032 n=5+5)
ReadBatch/invalid-json-metadata.ndjson-12                                                           43.2µs ±11%    32.7µs ±29%   -24.34%  (p=0.032 n=5+5)
ReadBatch/optional-timestamps.ndjson-12                                                             18.3µs ±18%    13.1µs ±13%   -28.18%  (p=0.008 n=5+5)
ReadBatch/spans.ndjson-12                                                                            153µs ± 5%     114µs ±32%   -25.30%  (p=0.008 n=5+5)
ReadBatch/transactions.ndjson-12                                                                     126µs ±15%      98µs ±16%   -21.75%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/internal/publish goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics goos:linux goarch:amd64
AggregateTransaction-12                                                                             84.0ns ± 0%    84.7ns ± 0%    +0.78%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage goos:linux goarch:amd64
WriteTransaction/json_codec-12                                                                      4.25µs ± 6%   12.41µs ± 3%  +191.92%  (p=0.016 n=5+4)
WriteTransaction/json_codec_big_tx-12                                                               4.93µs ± 2%   15.14µs ±43%  +207.00%  (p=0.008 n=5+5)
ReadEvents/json_codec/0_events-12                                                                    319ns ± 8%     352ns ± 7%   +10.28%  (p=0.032 n=5+5)
ReadEvents/json_codec_big_tx/0_events-12                                                             313ns ± 8%     348ns ± 7%   +11.16%  (p=0.032 n=5+5)
ReadEvents/nop_codec/0_events-12                                                                     310ns ± 6%     337ns ± 4%    +8.68%  (p=0.032 n=5+5)
ReadEvents/nop_codec_big_tx/0_events-12                                                              306ns ± 4%     339ns ± 7%   +10.94%  (p=0.008 n=5+5)
IsTraceSampled/sampled-12                                                                           67.8ns ± 1%    74.0ns ± 3%    +9.15%  (p=0.008 n=5+5)
IsTraceSampled/unsampled-12                                                                         68.6ns ± 5%    76.4ns ± 1%   +11.30%  (p=0.008 n=5+5)
IsTraceSampled/unknown-12                                                                            377ns ± 2%     414ns ± 1%    +9.79%  (p=0.008 n=5+5)

name                                                                                              old alloc/op   new alloc/op   delta
pkg:github.com/elastic/apm-server/internal/agentcfg goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/beater/request goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/model/modelindexer goos:linux goarch:amd64
ModelIndexer/NoCompression-12                                                                       3.00kB ± 1%    2.95kB ± 0%    -1.58%  (p=0.008 n=5+5)
ModelIndexer/BestSpeed-12                                                                           2.62kB ± 1%    2.59kB ± 1%    -1.07%  (p=0.016 n=5+5)
ModelIndexer/DefaultCompression-12                                                                  2.61kB ± 0%    2.56kB ± 1%    -1.90%  (p=0.008 n=5+5)
ModelIndexer/BestCompression-12                                                                     2.67kB ± 1%    2.58kB ± 1%    -3.03%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
BackendProcessor/invalid-event.ndjson-12                                                            7.70kB ± 1%    7.49kB ± 1%    -2.72%  (p=0.008 n=5+5)
BackendProcessor/invalid-metadata-2.ndjson-12                                                       2.75kB ± 1%    2.70kB ± 1%    -1.82%  (p=0.024 n=5+5)
BackendProcessor/metadata.ndjson-12                                                                 4.98kB ± 2%    4.92kB ± 1%    -1.33%  (p=0.048 n=5+5)
BackendProcessor/optional-timestamps.ndjson-12                                                      4.55kB ± 0%    4.51kB ± 1%    -0.85%  (p=0.000 n=4+5)
BackendProcessor/transactions_spans.ndjson-12                                                       24.3kB ± 2%    23.8kB ± 1%    -1.90%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/errors_rum.ndjson-12                    8.44kB ± 0%    8.66kB ± 2%    +2.57%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/invalid-event-type.ndjson-12            4.06kB ± 1%    4.15kB ± 1%    +2.22%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/optional-timestamps.ndjson-12           5.07kB ± 2%    5.13kB ± 1%    +1.10%  (p=0.024 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/errors_rum.ndjson-12                    8.37kB ± 1%    8.26kB ± 0%    -1.26%  (p=0.016 n=5+4)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans_rum.ndjson-12        6.19kB ± 1%    6.27kB ± 1%    +1.31%  (p=0.024 n=5+5)
ReadBatch/errors.ndjson-12                                                                          20.8kB ± 0%    20.8kB ± 0%    -0.09%  (p=0.016 n=5+5)
ReadBatch/errors_2.ndjson-12                                                                        22.7kB ± 0%    22.7kB ± 0%    -0.08%  (p=0.016 n=5+5)
ReadBatch/optional-timestamps.ndjson-12                                                             3.62kB ± 0%    3.62kB ± 0%    -0.11%  (p=0.008 n=5+5)
ReadBatch/transactions.ndjson-12                                                                    25.2kB ± 0%    25.2kB ± 0%    -0.05%  (p=0.016 n=5+5)
pkg:github.com/elastic/apm-server/internal/publish goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage goos:linux goarch:amd64
WriteTransaction/json_codec-12                                                                      3.00kB ± 0%    3.00kB ± 0%    +0.05%  (p=0.008 n=5+5)
ReadEvents/json_codec_big_tx/100_events-12                                                          1.03MB ± 0%    1.03MB ± 0%    -0.07%  (p=0.016 n=5+4)

name                                                                                              old allocs/op  new allocs/op  delta
pkg:github.com/elastic/apm-server/internal/agentcfg goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/beater/request goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/model/modelindexer goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/publish goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage goos:linux goarch:amd64

name                                                                                              old speed      new speed      delta
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
BackendProcessor/invalid-metadata-2.ndjson-12                                                     55.6MB/s ± 8%  62.6MB/s ± 7%   +12.54%  (p=0.032 n=5+5)
BackendProcessor/unknown-span-type.ndjson-12                                                      43.5MB/s ± 8%  55.5MB/s ±27%   +27.49%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/errors_rum.ndjson-12                   257MB/s ± 3%   242MB/s ± 5%    -5.88%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/errors_transaction_id.ndjson-12        207MB/s ± 9%   179MB/s ± 9%   -13.53%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/otel-bridge.ndjson-12                  180MB/s ± 5%   164MB/s ± 8%    -8.79%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/ratelimit.ndjson-12                    235MB/s ± 4%   217MB/s ± 2%    -7.59%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/invalid-event-type.ndjson-12           211MB/s ± 8%   168MB/s ± 3%   -20.15%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions.ndjson-12                 518MB/s ± 2%   509MB/s ± 1%    -1.64%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans.ndjson-12           546MB/s ± 1%   524MB/s ± 2%    -3.97%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans_rum.ndjson-12       587MB/s ± 1%   565MB/s ± 1%    -3.70%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans_rum_2.ndjson-12     584MB/s ± 1%   564MB/s ± 1%    -3.49%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/unknown-span-type.ndjson-12            464MB/s ± 1%   455MB/s ± 1%    -1.95%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/errors.ndjson-12                     924MB/s ± 2%   864MB/s ± 2%    -6.49%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-event-type.ndjson-12         498MB/s ± 1%   487MB/s ± 2%    -2.23%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-event.ndjson-12              233MB/s ± 1%   230MB/s ± 1%    -1.56%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-json-event.ndjson-12         545MB/s ± 1%   536MB/s ± 1%    -1.76%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata-2.ndjson-12         909MB/s ± 2%   893MB/s ± 1%    -1.80%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata.ndjson-12           921MB/s ± 1%   901MB/s ± 1%    -2.18%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/otel-bridge.ndjson-12                601MB/s ± 2%   586MB/s ± 1%    -2.45%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/transactions_spans_rum.ndjson-12     735MB/s ± 1%   721MB/s ± 2%    -1.86%  (p=0.032 n=5+5)
ReadBatch/invalid-json-metadata.ndjson-12                                                         10.4MB/s ±10%  14.0MB/s ±25%   +35.16%  (p=0.032 n=5+5)
ReadBatch/optional-timestamps.ndjson-12                                                           56.8MB/s ±16%  78.9MB/s ±14%   +39.06%  (p=0.008 n=5+5)
ReadBatch/spans.ndjson-12                                                                         52.5MB/s ± 5%  72.7MB/s ±41%   +38.39%  (p=0.008 n=5+5)
ReadBatch/transactions.ndjson-12                                                                  45.2MB/s ±14%  58.0MB/s ±16%   +28.20%  (p=0.008 n=5+5)

report generated with https://pkg.go.dev/golang.org/x/perf/cmd/benchstat

marclop · 2022-10-18T08:30:52Z

Since the benchmark diff is not very readable, here's the output from my laptop:

goos: darwin
goarch: arm64
pkg: github.com/elastic/apm-server/internal/model/modelindexer
BenchmarkModelIndexer/NoCompression-8         	  814345	      1516 ns/op	1651.20 MB/s	    2187 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompression-8         	  682668	      1529 ns/op	1637.96 MB/s	    2245 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompression-8         	  768716	      1477 ns/op	1695.04 MB/s	    2205 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompression-8         	  686592	      1476 ns/op	1696.54 MB/s	    2242 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompression-8         	  798435	      1455 ns/op	1720.88 MB/s	    2193 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompressionScaling-8  	  813568	      1455 ns/op	1720.78 MB/s	    2187 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompressionScaling-8  	  829038	      1441 ns/op	1737.72 MB/s	    2182 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompressionScaling-8  	  811366	      1465 ns/op	1708.66 MB/s	    2188 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompressionScaling-8  	  817510	      1439 ns/op	1739.75 MB/s	    2186 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompressionScaling-8  	  832280	      1471 ns/op	1701.95 MB/s	    2181 B/op	      21 allocs/op
BenchmarkModelIndexer/BestSpeed-8             	  519282	      2318 ns/op	1080.32 MB/s	    2564 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeed-8             	  519126	      2283 ns/op	1096.69 MB/s	    2564 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeed-8             	  515020	      2348 ns/op	1065.84 MB/s	    2566 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeed-8             	  500868	      2341 ns/op	1069.58 MB/s	    2564 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeed-8             	  521577	      2277 ns/op	1099.58 MB/s	    2563 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeedScaling-8      	  595076	      1942 ns/op	1289.09 MB/s	    2570 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeedScaling-8      	  614433	      1923 ns/op	1302.01 MB/s	    2568 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeedScaling-8      	  609142	      1936 ns/op	1293.09 MB/s	    2567 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeedScaling-8      	  599374	      1940 ns/op	1290.15 MB/s	    2567 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeedScaling-8      	  600386	      1955 ns/op	1280.93 MB/s	    2567 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompression-8    	  426439	      2803 ns/op	 893.28 MB/s	    2533 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompression-8    	  423412	      2854 ns/op	 877.47 MB/s	    2533 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompression-8    	  426790	      2825 ns/op	 886.45 MB/s	    2533 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompression-8    	  424195	      2826 ns/op	 885.72 MB/s	    2532 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompression-8    	  424188	      2830 ns/op	 884.95 MB/s	    2532 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompressionScaling-8         	  520196	      2126 ns/op	1178.06 MB/s	    2536 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompressionScaling-8         	  534577	      2268 ns/op	1103.97 MB/s	    2537 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompressionScaling-8         	  530379	      2240 ns/op	1117.84 MB/s	    2535 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompressionScaling-8         	  518343	      2191 ns/op	1143.05 MB/s	    2536 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompressionScaling-8         	  521464	      2196 ns/op	1139.97 MB/s	    2536 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompression-8                   	  233355	      4386 ns/op	 570.62 MB/s	    2572 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompression-8                   	  235694	      4363 ns/op	 573.90 MB/s	    2568 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompression-8                   	  233110	      4362 ns/op	 574.00 MB/s	    2571 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompression-8                   	  235569	      4391 ns/op	 570.29 MB/s	    2569 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompression-8                   	  234534	      4430 ns/op	 565.28 MB/s	    2569 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompressionScaling-8            	  382075	      3034 ns/op	 825.18 MB/s	    2572 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompressionScaling-8            	  384585	      2996 ns/op	 835.71 MB/s	    2570 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompressionScaling-8            	  382268	      3032 ns/op	 825.74 MB/s	    2572 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompressionScaling-8            	  381604	      3069 ns/op	 815.81 MB/s	    2572 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompressionScaling-8            	  386227	      3004 ns/op	 833.68 MB/s	    2570 B/op	      24 allocs/op
PASS
ok  	github.com/elastic/apm-server/internal/model/modelindexer	75.821s

And benchstat:

benchstat scaling.txt
name                                      time/op
ModelIndexer/NoCompression-8                1.49µs ± 3%
ModelIndexer/NoCompressionScaling-8         1.45µs ± 1%
ModelIndexer/BestSpeed-8                    2.31µs ± 2%
ModelIndexer/BestSpeedScaling-8             1.94µs ± 1%
ModelIndexer/DefaultCompression-8           2.83µs ± 1%
ModelIndexer/DefaultCompressionScaling-8    2.20µs ± 4%
ModelIndexer/BestCompression-8              4.39µs ± 1%
ModelIndexer/BestCompressionScaling-8       3.03µs ± 1%

name                                      speed
ModelIndexer/NoCompression-8              1.68GB/s ± 3%
ModelIndexer/NoCompressionScaling-8       1.72GB/s ± 1%
ModelIndexer/BestSpeed-8                  1.08GB/s ± 2%
ModelIndexer/BestSpeedScaling-8           1.29GB/s ± 1%
ModelIndexer/DefaultCompression-8          886MB/s ± 1%
ModelIndexer/DefaultCompressionScaling-8  1.14GB/s ± 4%
ModelIndexer/BestCompression-8             571MB/s ± 1%
ModelIndexer/BestCompressionScaling-8      827MB/s ± 1%

marclop · 2022-10-18T08:41:58Z

Benchstat diff

1 to 4g APM Server

All had similar performance to main, since the active indexers were only scaled up briefly when CPU credits were available which to significant bursts in throughput when that occurred.

8g APM Server

This one has less of a noticeable increase, but the CPU was utilized more fully. Bursts had way higher throughput.

$ benchstat -alpha 0.11 sizes/main-active/8g-12s-12n.txt sizes/scaling/8g-24s-24n1mb-50av-gomaxprocs-by-4.txt
name          old time/op                  new time/op                  delta
AgentAll-512                   606ms ± 1%                   593ms ±25%      ~     (p=0.700 n=3+3)

name          old error_responses/sec      new error_responses/sec      delta
AgentAll-512                    0.03 ±58%                    0.00       -100.00%  (p=0.100 n=3+3)

name          old events/sec               new events/sec               delta
AgentAll-512                   29.5k ± 1%                   31.5k ±28%      ~     (p=0.700 n=3+3)

name          old gc_cycles                new gc_cycles                delta
AgentAll-512                     318 ± 1%                     312 ±26%      ~     (p=0.700 n=3+3)

name          old max_goroutines           new max_goroutines           delta
AgentAll-512                     382 ± 2%                     398 ±29%      ~     (p=0.700 n=3+3)

name          old max_heap_alloc           new max_heap_alloc           delta
AgentAll-512                   1.11G ± 2%                   1.15G ± 5%      ~     (p=0.200 n=3+3)

name          old max_heap_objects         new max_heap_objects         delta
AgentAll-512                   10.0M ± 2%                   10.1M ± 6%      ~     (p=1.000 n=3+3)

name          old max_rss                  new max_rss                  delta
AgentAll-512                   1.22G ± 1%                   1.26G ± 5%      ~     (p=0.400 n=3+3)

name          old mean_available_indexers  new mean_available_indexers  delta
AgentAll-512                    16.1 ± 1%                    40.7 ± 8%  +153.35%  (p=0.100 n=3+3)

name          old alloc/op                 new alloc/op                 delta
AgentAll-512                   570MB ± 1%                   581MB ± 1%    +2.03%  (p=0.100 n=3+3)

name          old allocs/op                new allocs/op                delta
AgentAll-512                   7.99M ± 1%                   8.15M ± 1%    +2.09%  (p=0.100 n=3+3)

15g APM Server

From this size onwards is where the autoscaling really shines. CPU utilization was increased and so was throughput.

$ benchstat -alpha 0.11 sizes/main-active/15g-12s-12n.txt sizes/scaling/15g-24s-24n1mb-50av-gomaxprocs-by-4.txt
name          old time/op                  new time/op                  delta
AgentAll-960                   561ms ±10%                   336ms ± 0%   -40.08%  (p=0.100 n=3+3)

name          old error_responses/sec      new error_responses/sec      delta
AgentAll-960                    0.06 ±85%                   0.00 ±200%   -95.79%  (p=0.100 n=3+3)

name          old events/sec               new events/sec               delta
AgentAll-960                   31.8k ±10%                   53.6k ± 0%   +68.34%  (p=0.100 n=3+3)

name          old gc_cycles                new gc_cycles                delta
AgentAll-960                     317 ± 3%                     448 ± 3%   +41.28%  (p=0.100 n=3+3)

name          old max_goroutines           new max_goroutines           delta
AgentAll-960                     386 ± 3%                     562 ± 2%   +45.64%  (p=0.100 n=3+3)

name          old max_heap_alloc           new max_heap_alloc           delta
AgentAll-960                   1.11G ± 0%                   1.19G ± 1%    +7.72%  (p=0.100 n=3+3)

name          old max_heap_objects         new max_heap_objects         delta
AgentAll-960                   9.89M ± 1%                  10.67M ± 0%    +7.91%  (p=0.100 n=3+3)

name          old max_rss                  new max_rss                  delta
AgentAll-960                   1.23G ± 2%                   1.33G ± 1%    +7.84%  (p=0.100 n=3+3)

name          old mean_available_indexers  new mean_available_indexers  delta
AgentAll-960                    15.5 ± 7%                    33.4 ± 0%  +115.79%  (p=0.100 n=3+3)

name          old alloc/op                 new alloc/op                 delta
AgentAll-960                   564MB ± 1%                   571MB ± 0%    +1.20%  (p=0.100 n=3+3)

name          old allocs/op                new allocs/op                delta
AgentAll-960                   7.92M ± 1%                   8.07M ± 0%    +1.82%  (p=0.100 n=3+3)

30g APM Server

$ benchstat -alpha 0.11 sizes/main-active/30g-12s-12n.txt sizes/scaling/30g-24s-24n1mb-50av-gomaxprocs-by-4-pr.txt
name           old time/op                  new time/op                  delta
AgentAll-1920                   585ms ± 5%                   183ms ± 0%   -68.71%  (p=0.100 n=3+3)

name           old error_responses/sec      new error_responses/sec      delta
AgentAll-1920                    0.04 ±18%                    0.01 ±50%   -76.61%  (p=0.100 n=3+3)

name           old events/sec               new events/sec               delta
AgentAll-1920                   30.7k ± 5%                   98.1k ± 0%  +219.84%  (p=0.100 n=3+3)

name           old gc_cycles                new gc_cycles                delta
AgentAll-1920                     335 ± 5%                     772 ± 1%  +130.55%  (p=0.100 n=3+3)

name           old max_goroutines           new max_goroutines           delta
AgentAll-1920                     390 ± 2%                    1026 ± 1%  +163.22%  (p=0.100 n=3+3)

name           old max_heap_alloc           new max_heap_alloc           delta
AgentAll-1920                   1.09G ± 2%                   1.41G ± 1%   +29.59%  (p=0.100 n=3+3)

name           old max_heap_objects         new max_heap_objects         delta
AgentAll-1920                   9.18M ± 3%                  13.15M ± 1%   +43.31%  (p=0.100 n=3+3)

name           old max_rss                  new max_rss                  delta
AgentAll-1920                   1.23G ± 2%                   1.58G ± 0%   +28.20%  (p=0.100 n=3+3)

name           old mean_available_indexers  new mean_available_indexers  delta
AgentAll-1920                    15.8 ± 3%                    12.2 ± 7%   -22.64%  (p=0.100 n=3+3)

name           old alloc/op                 new alloc/op                 delta
AgentAll-1920                   573MB ± 0%                   560MB ± 0%    -2.19%  (p=0.100 n=3+3)

name           old allocs/op                new allocs/op                delta
AgentAll-1920                   8.00M ± 0%                   7.96M ± 0%    -0.59%  (p=0.100 n=3+3)

marclop · 2022-10-18T08:44:17Z

Metric Screenshots

The last 3 distributions are all for the same 30g APM Server instance. I ran the benchmarks 3 times with that size to gather some more data.

APM Server

Elasticsearch

464GB of RAM! Cluster over 3 zones. With a total of 24 hot nodes.

Signed-off-by: Marc Lopez Rubio <[email protected]>

axw

I have a bunch of comments and suggestions, but this is very nice :)

dev_docs/ARCHITECTURE.md

internal/beater/beater.go

internal/model/modelindexer/indexer.go

axw · 2022-10-19T06:25:45Z

internal/model/modelindexer/indexer.go

+			// When the queue utilization is below 5%, reset the idleTimer. When
+			// traffic to the APM Server is interrupted or stopped, it allows excess
+			// active indexers that have been idle for a the IdleInterval to be
+			// scaled down.
+			activeIndexers := atomic.LoadInt64(&i.activeBulkRequests)
+			lowChanCapacity := float64(len(i.bulkItems))/float64(cap(i.bulkItems)) <= 0.05
+			if lowChanCapacity && activeIndexers > 1 {
+				idleTimer.Reset(i.config.Scaling.IdleInterval)
+			}


Doing this on every iteration of the loop concerns me a little bit. Would it make sense to require FlushInterval >= IdleInterval, and only reset the idle timer before the first iteration of the loop, and then after flushing? i.e. whenever the flush timer is inactive.

Also, I think the idle timer should only be started when autoscaling is enabled?

internal/model/modelindexer/indexer_test.go

internal/model/modelindexer/indexer.go

simitt

Excited about these changes!
Left a couple of mostly nitpicks around naming and comments.

Have you tested this against an undersized ES? Given that the number of active indexers is based on events processed in the APM Server and not also on ES response and pushback, I wonder if a situation might get worse for setups where APM Server is processing more events than ES can handle, by increasing the pressure towards ES.

dev_docs/ARCHITECTURE.md

internal/beater/beater.go

dev_docs/ARCHITECTURE.md

simitt · 2022-10-19T11:55:42Z

internal/model/modelindexer/indexer.go

+	// Disabled toggles active indexer scaling on.
+	//
+	// It is enabled by default.
+	Disabled bool


Again, I'd switch to Enabled for consistency.

FWIW I think this is okay for the moment, maybe we can come back to it if it's a pain. The zero value for ScalingConfig currently means to use the default config; it should be enabled by default. So we would need to make it a *bool to preserve that while also making it "enabled".

internal/model/modelindexer/indexer.go

Signed-off-by: Marc Lopez Rubio <[email protected]>

internal/model/modelindexer/indexer.go

Signed-off-by: Marc Lopez Rubio <[email protected]>

axw

Thanks for the updates, it's looking pretty good now. Only one more comment on the metric names from me - I think the main remaining question is @simitt's one about behaviour when ES is underpowered.

internal/beater/beater.go

internal/model/modelindexer/indexer.go

Signed-off-by: Marc Lopez Rubio <[email protected]>

…scale-active-indexers

Signed-off-by: Marc Lopez Rubio <[email protected]>

marclop · 2022-10-25T13:59:48Z

@simitt @axw Thanks for the reviews. I have tested the performance of an 8GB APM Server (4vCPUs with 8 as the burstable max) and it does seem like the pressure is a bit worse since the autoscaling does take place after 60 consecutive flushes. The backing Elasticsearch cluster is a 3 zone 8GB cluster.

Comparing the current PR with main (main has 25 available indexers of 2.5MB of size):

$ benchstat -alpha 0.11 sizes/scaling/undersized/8g-2s-2n-main.txt sizes/scaling/undersized/8g-2s-2n-pr.txt
name          old time/op                  new time/op                  delta
AgentAll-512                   1.91s ± 4%                   2.01s ± 3%      ~     (p=0.200 n=3+3)

name          old error_responses/sec      new error_responses/sec      delta
AgentAll-512                    0.00                         0.00           ~     (all equal)

name          old events/sec               new events/sec               delta
AgentAll-512                   9.46k ± 4%                   8.98k ± 3%      ~     (p=0.200 n=3+3)

name          old gc_cycles                new gc_cycles                delta
AgentAll-512                     130 ± 8%                     120 ± 2%    -7.93%  (p=0.100 n=3+3)

name          old max_goroutines           new max_goroutines           delta
AgentAll-512                     232 ±10%                     294 ± 4%   +26.69%  (p=0.100 n=3+3)

name          old max_heap_alloc           new max_heap_alloc           delta
AgentAll-512                    989M ± 3%                   1015M ± 2%      ~     (p=0.400 n=3+3)

name          old max_heap_objects         new max_heap_objects         delta
AgentAll-512                   8.82M ± 5%                   9.00M ± 5%      ~     (p=0.700 n=3+3)

name          old max_rss                  new max_rss                  delta
AgentAll-512                   1.10G ± 2%                   1.13G ± 1%    +2.65%  (p=0.100 n=3+3)

name          old mean_available_indexers  new mean_available_indexers  delta
AgentAll-512                    2.65 ±11%                    0.00       -100.00%  (p=0.100 n=3+3)

name          old alloc/op                 new alloc/op                 delta
AgentAll-512                   638MB ± 0%                   644MB ± 1%      ~     (p=0.200 n=3+3)

name          old allocs/op                new allocs/op                delta
AgentAll-512                   8.59M ± 0%                   8.63M ± 0%      ~     (p=0.200 n=3+3)

The number of 429 was significant (up to 23% of all requests).

I made some small changes (not pushed to this PR) which disallowed scaling when 1% or more of the total indexed documents results in a 429, that seemed to give a very similar performance to main:

$ benchstat -alpha 0.11 sizes/scaling/undersized/8g-2s-2n-main.txt sizes/scaling/undersized/8g-2s-2npr-block-as.txt
name          old time/op                  new time/op                  delta
AgentAll-512                   1.91s ± 4%                   1.92s ± 1%      ~     (p=1.000 n=3+3)

name          old error_responses/sec      new error_responses/sec      delta
AgentAll-512                    0.00                         0.00           ~     (all equal)

name          old events/sec               new events/sec               delta
AgentAll-512                   9.46k ± 4%                   9.39k ± 1%      ~     (p=1.000 n=3+3)

name          old gc_cycles                new gc_cycles                delta
AgentAll-512                     130 ± 8%                     123 ± 5%      ~     (p=0.500 n=3+3)

name          old max_goroutines           new max_goroutines           delta
AgentAll-512                     232 ±10%                     304 ± 6%   +30.85%  (p=0.100 n=3+3)

name          old max_heap_alloc           new max_heap_alloc           delta
AgentAll-512                    989M ± 3%                    974M ± 0%      ~     (p=0.700 n=3+3)

name          old max_heap_objects         new max_heap_objects         delta
AgentAll-512                   8.82M ± 5%                   8.21M ± 3%    -6.96%  (p=0.100 n=3+3)

name          old max_rss                  new max_rss                  delta
AgentAll-512                   1.10G ± 2%                   1.11G ± 0%      ~     (p=0.700 n=3+3)

name          old mean_available_indexers  new mean_available_indexers  delta
AgentAll-512                    2.65 ±11%                    0.00       -100.00%  (p=0.100 n=3+3)

name          old alloc/op                 new alloc/op                 delta
AgentAll-512                   638MB ± 0%                   642MB ± 0%    +0.60%  (p=0.100 n=3+3)

name          old allocs/op                new allocs/op                delta
AgentAll-512                   8.59M ± 0%                   8.62M ± 0%      ~     (p=0.200 n=3+3)

patch:

diff --git a/internal/model/modelindexer/indexer.go b/internal/model/modelindexer/indexer.go
index de50be0a1..e50844c68 100644
--- a/internal/model/modelindexer/indexer.go
+++ b/internal/model/modelindexer/indexer.go
@@ -594,6 +594,14 @@ func (i *Indexer) maybeScaleDown(now time.Time, info scalingInfo, timedFlush *ui
 		}
 		info = i.scalingInformation() // refresh scaling info if CAS failed.
 	}
+	// If more than 1% of the requests result in 429, scale down.
+	if i.indexFailureRate() >= 0.01 {
+		if new := info.ScaleDown(now); i.scalingInfo.CompareAndSwap(info, new) {
+			i.logger.Infof("Elasticsearch 429 too many rate exceeded 1%, scaling down to: %d", new)
+			return true
+		}
+		return false
+	}
 	if *timedFlush < i.config.Scaling.ScaleDown.Threshold {
 		return false
 	}
@@ -624,6 +632,10 @@ func (i *Indexer) maybeScaleUp(now time.Time, info scalingInfo, fullFlush *uint)
 	// Reset fullFlush after it has exceeded the threshold
 	// it avoids unnecessary precociousness to scale up.
 	*fullFlush = 0
+	// If more than 1% of the requests result in 429, do not scale up.
+	if i.indexFailureRate() >= 0.01 {
+		return false
+	}
 	if info.withinCoolDown(i.config.Scaling.ScaleUp.CoolDown, now) {
 		return false
 	}
@@ -642,6 +654,11 @@ func (i *Indexer) scalingInformation() scalingInfo {
 	return i.scalingInfo.Load().(scalingInfo)
 }
 
+func (i *Indexer) indexFailureRate() float64 {
+	return float64(atomic.LoadInt64(&i.tooManyRequests)) /
+		float64(atomic.LoadInt64(&i.eventsAdded))
+}
+
 // activeLimit returns the value of GOMAXPROCS / 4. Which should limit the
 // maximum number of active indexers to 25% of GOMAXPROCS.
 // NOTE: There is also a sweet spot between Config.MaxRequests and the number

If we're ok merging this PR and doing some more investigation on backstop of pressure identifiers as a follow up then we can properly test all these changes and perhaps think about a different way to look for overwhelmed Elasticsearch.

simitt

Apart from a minor comment that hasn't been addressed yet, this LGTM.
Let's wait for @axw 's approval also, and then merge this in and address underscaled ES behavior in a follow up.

dev_docs/ARCHITECTURE.md

Co-authored-by: Silvia Mitter <[email protected]>

axw

If we're ok merging this PR and doing some more investigation on backstop of pressure identifiers as a follow up then we can properly test all these changes and perhaps think about a different way to look for overwhelmed Elasticsearch.

SGTM.

The changes all look good apart from a couple of issues with the metrics.

internal/beater/beater.go

Signed-off-by: Marc Lopez Rubio <[email protected]>

axw

One last thing, I think.

internal/model/modelindexer/indexer.go

Signed-off-by: Marc Lopez Rubio <[email protected]>

axw

Thank you!

axw · 2022-11-28T02:50:21Z

This should be tested as part of #9182

marclop added enhancement backport-skip Skip notification from the automated backport with mergify v8.6.0 labels Oct 18, 2022

marclop marked this pull request as ready for review October 18, 2022 08:22

marclop requested a review from a team October 18, 2022 08:22

marclop added 2 commits October 18, 2022 17:05

Skip counter increments when scaling is disabled

59e60bf

Signed-off-by: Marc Lopez Rubio <[email protected]>

Document modelindexer in dev_docs

987dfed

Signed-off-by: Marc Lopez Rubio <[email protected]>

axw reviewed Oct 19, 2022

View reviewed changes

simitt reviewed Oct 19, 2022

View reviewed changes

Address review comments

c9e4bcf

Signed-off-by: Marc Lopez Rubio <[email protected]>

axw reviewed Oct 20, 2022

View reviewed changes

internal/model/modelindexer/indexer.go Outdated Show resolved Hide resolved

internal/model/modelindexer/indexer.go Outdated Show resolved Hide resolved

marclop added 4 commits October 20, 2022 16:09

Address more review comments

159aa98

Signed-off-by: Marc Lopez Rubio <[email protected]>

Merge branch 'main' into f/modelindexer-autoscale-active-indexers

79a1e68

Add changelog entry

afcced4

Signed-off-by: Marc Lopez Rubio <[email protected]>

Update variable tests

e395e26

Signed-off-by: Marc Lopez Rubio <[email protected]>

axw reviewed Oct 24, 2022

View reviewed changes

internal/beater/beater.go Outdated Show resolved Hide resolved

internal/model/modelindexer/indexer.go Outdated Show resolved Hide resolved

This comment was marked as resolved.

Sign in to view

marclop added 4 commits October 25, 2022 09:27

Rename bulk_requests to indexers

fd93850

Signed-off-by: Marc Lopez Rubio <[email protected]>

Merge remote-tracking branch 'upstream/main' into f/modelindexer-auto…

e727a3d

…scale-active-indexers

Fix incorrect available bulk requests metric

7437c11

Signed-off-by: Marc Lopez Rubio <[email protected]>

Fix created/destroyed metrics

38bab37

Signed-off-by: Marc Lopez Rubio <[email protected]>

simitt approved these changes Oct 25, 2022

View reviewed changes

dev_docs/ARCHITECTURE.md Outdated Show resolved Hide resolved

Update dev_docs/ARCHITECTURE.md

2a7db43

Co-authored-by: Silvia Mitter <[email protected]>

marclop requested a review from axw October 25, 2022 16:31

axw reviewed Oct 26, 2022

View reviewed changes

internal/beater/beater.go Outdated Show resolved Hide resolved

marclop added 2 commits October 26, 2022 08:25

Correct metric names

db70a87

Signed-off-by: Marc Lopez Rubio <[email protected]>

Merge branch 'main' into f/modelindexer-autoscale-active-indexers

921c8e0

axw requested changes Oct 26, 2022

View reviewed changes

internal/model/modelindexer/indexer.go Outdated Show resolved Hide resolved

Rename bulk_requests.active to indexers.active

0ec8a84

Signed-off-by: Marc Lopez Rubio <[email protected]>

marclop requested a review from axw October 26, 2022 10:53

axw approved these changes Oct 26, 2022

View reviewed changes

marclop merged commit 6e75463 into elastic:main Oct 26, 2022

marclop deleted the f/modelindexer-autoscale-active-indexers branch October 26, 2022 13:18

marclop mentioned this pull request Nov 1, 2022

beater: Set semaphore, modelindexer chan from mem #9358

Merged

1 task

cmacknz mentioned this pull request Nov 10, 2022

Explore an autoscaling Elasticsearch output based on the APM Server autoscaling work elastic/elastic-agent-shipper#175

Open

marclop added the test-plan label Nov 14, 2022

axw removed the test-plan label Nov 28, 2022

kruskall assigned kruskall and unassigned kruskall Dec 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

modelindexer: Scale active indexers based on load #9393

modelindexer: Scale active indexers based on load #9393

marclop commented Oct 18, 2022 •

edited

Loading

apmmachine commented Oct 18, 2022 •

edited

Loading

Build stats

Test stats 🧪

apmmachine commented Oct 18, 2022 •

edited

Loading

marclop commented Oct 18, 2022

marclop commented Oct 18, 2022

marclop commented Oct 18, 2022

axw left a comment

axw Oct 19, 2022

simitt left a comment

simitt Oct 19, 2022

axw Oct 26, 2022

axw left a comment

This comment was marked as resolved.

marclop commented Oct 25, 2022

simitt left a comment

axw left a comment

axw left a comment

axw left a comment

axw commented Nov 28, 2022

modelindexer: Scale active indexers based on load #9393

modelindexer: Scale active indexers based on load #9393

Conversation

marclop commented Oct 18, 2022 • edited Loading

Motivation/summary

Checklist

How to test these changes

Related issues

apmmachine commented Oct 18, 2022 • edited Loading

💚 Build Succeeded

Build stats

Test stats 🧪

🤖 GitHub comments

apmmachine commented Oct 18, 2022 • edited Loading

📚 Go benchmark report

marclop commented Oct 18, 2022

marclop commented Oct 18, 2022

Benchstat diff

1 to 4g APM Server

8g APM Server

15g APM Server

30g APM Server

marclop commented Oct 18, 2022

Metric Screenshots

APM Server

Elasticsearch

axw left a comment

Choose a reason for hiding this comment

axw Oct 19, 2022

Choose a reason for hiding this comment

simitt left a comment

Choose a reason for hiding this comment

simitt Oct 19, 2022

Choose a reason for hiding this comment

axw Oct 26, 2022

Choose a reason for hiding this comment

axw left a comment

Choose a reason for hiding this comment

This comment was marked as resolved.

marclop commented Oct 25, 2022

simitt left a comment

Choose a reason for hiding this comment

axw left a comment

Choose a reason for hiding this comment

axw left a comment

Choose a reason for hiding this comment

axw left a comment

Choose a reason for hiding this comment

axw commented Nov 28, 2022

marclop commented Oct 18, 2022 •

edited

Loading

apmmachine commented Oct 18, 2022 •

edited

Loading

apmmachine commented Oct 18, 2022 •

edited

Loading