
modelindexer: Scale active indexers based on load #9393

Merged: 16 commits, Oct 26, 2022

Conversation

@marclop (Contributor) commented on Oct 18, 2022

Motivation/summary

Modifies the modelindexer to scale the number of "active" indexers (goroutines pulling out of the internal modelindexer queue) up and down based on the number of consecutive flushes, up to 25% of GOMAXPROCS.

When N consecutive full flushes occur (FlushBytes reached or exceeded), the scale-up action creates a new active indexer, as long as the scaling cooldown has elapsed.

Likewise, when N consecutive timed flushes occur (triggered by FlushInterval), an active indexer is scaled down, since not enough load is going through the server to warrant the current number of active indexers.
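
For illustration, the flush accounting works roughly like the sketch below. All names here are hypothetical; the real logic (with atomics, cooldown bookkeeping, and the GOMAXPROCS cap) lives in internal/model/modelindexer/indexer.go.

```go
package modelindexer

import "time"

// scaler is a simplified, hypothetical model of the behaviour described
// above: consecutive full flushes push toward scaling up, consecutive
// timed flushes push toward scaling down, and each kind of flush resets
// the other counter.
type scaler struct {
	fullFlushes, timedFlushes  uint
	upThreshold, downThreshold uint
	coolDown                   time.Duration
	lastAction                 time.Time
	active                     int
}

func (s *scaler) onFlush(flushedBytes, flushBytes int, now time.Time) {
	if flushedBytes >= flushBytes {
		// Full flush: FlushBytes was reached or exceeded.
		s.fullFlushes++
		s.timedFlushes = 0
		if s.fullFlushes >= s.upThreshold && now.Sub(s.lastAction) >= s.coolDown {
			s.active++ // scale up: add one active indexer
			s.fullFlushes = 0
			s.lastAction = now
		}
	} else {
		// Timed flush: triggered by FlushInterval before the buffer filled.
		s.timedFlushes++
		s.fullFlushes = 0
		if s.timedFlushes >= s.downThreshold && s.active > 1 &&
			now.Sub(s.lastAction) >= s.coolDown {
			s.active-- // scale down: remove one active indexer
			s.timedFlushes = 0
			s.lastAction = now
		}
	}
}
```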

Active indexer downscaling can also be triggered by a change in GOMAXPROCS. This is particularly important for containerized or cgroup environments where CPU quotas may be updated "live". In this case, the downscale cooldown is ignored and active indexers are scaled down until active <= math.RoundToEven(GOMAXPROCS / 4).
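
Taking that formula at face value, the cap can be computed like this (a sketch only; the actual activeLimit helper is referenced in the patch later in this thread, and its exact rounding and floor may differ):

```go
package modelindexer

import (
	"math"
	"runtime"
)

// activeLimit returns ~25% of GOMAXPROCS, re-read on every call so that
// "live" CPU quota changes in cgroup environments are picked up.
func activeLimit() int64 {
	limit := int64(math.RoundToEven(float64(runtime.GOMAXPROCS(0)) / 4))
	if limit < 1 {
		return 1 // always keep at least one active indexer
	}
	return limit
}
```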

When no traffic is going through an active indexer, a new timer allows completely idle indexers to be scaled down. The idle check interval can be configured via IdleInterval.

Scaling is enabled by default, but can be disabled via the configuration option output.elasticsearch.scaling.enabled: false.

Lastly, changes the default settings for max_requests and flush_bytes to 50 and 1MB respectively. This sends smaller payloads to Elasticsearch, lets instances with more processing power use more of the available indexers, and cycles the indexers faster, which results in better utilization and performance.
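
For reference, a hypothetical sketch of those defaults as Go constants (the real field names and config plumbing in apm-server may differ; a later comment in this thread notes that main used 25 available indexers of 2.5MB each):

```go
package modelindexer

// Hypothetical constants matching the new defaults described above.
const (
	defaultMaxRequests = 50              // concurrent bulk requests (25 on main)
	defaultFlushBytes  = 1 * 1024 * 1024 // 1MB per bulk request (~2.5MB on main)
)
```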

How to test these changes

Run benchmarks on instances 8g and bigger in ESS and observe CPU utilization reaching ~100%.

Related issues

Closes #9181

marclop added the enhancement, backport-skip, and v8.6.0 labels on Oct 18, 2022
marclop marked this pull request as ready for review on October 18, 2022
marclop requested a review from a team on October 18, 2022
@apmmachine (Contributor) commented on Oct 18, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-10-26T10:53:41.601+0000

  • Duration: 27 min 55 sec

Test stats 🧪

Test Results: 154 total, 154 passed, 0 failed, 0 skipped


@apmmachine (Contributor) commented on Oct 18, 2022

📚 Go benchmark report

Diff with the main branch

name                                                                                              old time/op    new time/op    delta
pkg:github.com/elastic/apm-server/internal/agentcfg goos:linux goarch:amd64
FetchAndAdd/FetchFromCache-12                                                                       41.2ns ± 1%    46.1ns ± 0%   +11.97%  (p=0.016 n=5+4)
FetchAndAdd/FetchAndAddToCache-12                                                                   92.0ns ± 1%   102.2ns ± 0%   +11.07%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/internal/beater/request goos:linux goarch:amd64
ContextResetContentEncoding/empty-12                                                                 117ns ± 1%     132ns ± 1%   +12.27%  (p=0.008 n=5+5)
ContextResetContentEncoding/uncompressed-12                                                          151ns ± 0%     170ns ± 1%   +12.08%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/internal/model/modelindexer goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
BackendProcessor/invalid-metadata-2.ndjson-12                                                       7.86µs ± 7%    6.98µs ± 8%   -11.15%  (p=0.032 n=5+5)
BackendProcessor/unknown-span-type.ndjson-12                                                        76.2µs ± 8%    60.9µs ±23%   -20.02%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/errors_rum.ndjson-12                    7.39µs ± 3%    7.86µs ± 5%    +6.28%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/errors_transaction_id.ndjson-12         18.5µs ± 9%    21.5µs ± 9%   +15.99%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/otel-bridge.ndjson-12                   10.5µs ± 5%    11.5µs ± 9%    +9.99%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/ratelimit.ndjson-12                     18.0µs ± 4%    19.4µs ± 3%    +8.18%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/invalid-event-type.ndjson-12            1.86µs ± 7%    2.32µs ± 3%   +25.09%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions.ndjson-12                  10.9µs ± 2%    11.1µs ± 1%    +1.66%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans.ndjson-12            10.7µs ± 1%    11.1µs ± 2%    +4.14%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans_rum.ndjson-12        1.97µs ± 1%    2.04µs ± 1%    +3.82%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans_rum_2.ndjson-12      1.91µs ± 1%    1.98µs ± 1%    +3.60%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/unknown-span-type.ndjson-12             7.13µs ± 1%    7.27µs ± 1%    +1.98%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/errors.ndjson-12                      6.87µs ± 2%    7.35µs ± 2%    +6.94%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-event-type.ndjson-12           784ns ± 1%     802ns ± 2%    +2.29%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-event.ndjson-12               3.28µs ± 1%    3.33µs ± 1%    +1.59%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-json-event.ndjson-12          1.08µs ± 1%    1.10µs ± 1%    +1.80%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata-2.ndjson-12           480ns ± 2%     488ns ± 1%    +1.82%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata.ndjson-12             484ns ± 1%     495ns ± 1%    +2.23%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/otel-bridge.ndjson-12                 3.13µs ± 2%    3.21µs ± 1%    +2.50%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/transactions_spans_rum.ndjson-12      1.57µs ± 1%    1.60µs ± 2%    +1.88%  (p=0.032 n=5+5)
ReadBatch/invalid-json-metadata.ndjson-12                                                           43.2µs ±11%    32.7µs ±29%   -24.34%  (p=0.032 n=5+5)
ReadBatch/optional-timestamps.ndjson-12                                                             18.3µs ±18%    13.1µs ±13%   -28.18%  (p=0.008 n=5+5)
ReadBatch/spans.ndjson-12                                                                            153µs ± 5%     114µs ±32%   -25.30%  (p=0.008 n=5+5)
ReadBatch/transactions.ndjson-12                                                                     126µs ±15%      98µs ±16%   -21.75%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/internal/publish goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics goos:linux goarch:amd64
AggregateTransaction-12                                                                             84.0ns ± 0%    84.7ns ± 0%    +0.78%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage goos:linux goarch:amd64
WriteTransaction/json_codec-12                                                                      4.25µs ± 6%   12.41µs ± 3%  +191.92%  (p=0.016 n=5+4)
WriteTransaction/json_codec_big_tx-12                                                               4.93µs ± 2%   15.14µs ±43%  +207.00%  (p=0.008 n=5+5)
ReadEvents/json_codec/0_events-12                                                                    319ns ± 8%     352ns ± 7%   +10.28%  (p=0.032 n=5+5)
ReadEvents/json_codec_big_tx/0_events-12                                                             313ns ± 8%     348ns ± 7%   +11.16%  (p=0.032 n=5+5)
ReadEvents/nop_codec/0_events-12                                                                     310ns ± 6%     337ns ± 4%    +8.68%  (p=0.032 n=5+5)
ReadEvents/nop_codec_big_tx/0_events-12                                                              306ns ± 4%     339ns ± 7%   +10.94%  (p=0.008 n=5+5)
IsTraceSampled/sampled-12                                                                           67.8ns ± 1%    74.0ns ± 3%    +9.15%  (p=0.008 n=5+5)
IsTraceSampled/unsampled-12                                                                         68.6ns ± 5%    76.4ns ± 1%   +11.30%  (p=0.008 n=5+5)
IsTraceSampled/unknown-12                                                                            377ns ± 2%     414ns ± 1%    +9.79%  (p=0.008 n=5+5)

name                                                                                              old alloc/op   new alloc/op   delta
pkg:github.com/elastic/apm-server/internal/agentcfg goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/beater/request goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/model/modelindexer goos:linux goarch:amd64
ModelIndexer/NoCompression-12                                                                       3.00kB ± 1%    2.95kB ± 0%    -1.58%  (p=0.008 n=5+5)
ModelIndexer/BestSpeed-12                                                                           2.62kB ± 1%    2.59kB ± 1%    -1.07%  (p=0.016 n=5+5)
ModelIndexer/DefaultCompression-12                                                                  2.61kB ± 0%    2.56kB ± 1%    -1.90%  (p=0.008 n=5+5)
ModelIndexer/BestCompression-12                                                                     2.67kB ± 1%    2.58kB ± 1%    -3.03%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
BackendProcessor/invalid-event.ndjson-12                                                            7.70kB ± 1%    7.49kB ± 1%    -2.72%  (p=0.008 n=5+5)
BackendProcessor/invalid-metadata-2.ndjson-12                                                       2.75kB ± 1%    2.70kB ± 1%    -1.82%  (p=0.024 n=5+5)
BackendProcessor/metadata.ndjson-12                                                                 4.98kB ± 2%    4.92kB ± 1%    -1.33%  (p=0.048 n=5+5)
BackendProcessor/optional-timestamps.ndjson-12                                                      4.55kB ± 0%    4.51kB ± 1%    -0.85%  (p=0.000 n=4+5)
BackendProcessor/transactions_spans.ndjson-12                                                       24.3kB ± 2%    23.8kB ± 1%    -1.90%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/errors_rum.ndjson-12                    8.44kB ± 0%    8.66kB ± 2%    +2.57%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/invalid-event-type.ndjson-12            4.06kB ± 1%    4.15kB ± 1%    +2.22%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/optional-timestamps.ndjson-12           5.07kB ± 2%    5.13kB ± 1%    +1.10%  (p=0.024 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/errors_rum.ndjson-12                    8.37kB ± 1%    8.26kB ± 0%    -1.26%  (p=0.016 n=5+4)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans_rum.ndjson-12        6.19kB ± 1%    6.27kB ± 1%    +1.31%  (p=0.024 n=5+5)
ReadBatch/errors.ndjson-12                                                                          20.8kB ± 0%    20.8kB ± 0%    -0.09%  (p=0.016 n=5+5)
ReadBatch/errors_2.ndjson-12                                                                        22.7kB ± 0%    22.7kB ± 0%    -0.08%  (p=0.016 n=5+5)
ReadBatch/optional-timestamps.ndjson-12                                                             3.62kB ± 0%    3.62kB ± 0%    -0.11%  (p=0.008 n=5+5)
ReadBatch/transactions.ndjson-12                                                                    25.2kB ± 0%    25.2kB ± 0%    -0.05%  (p=0.016 n=5+5)
pkg:github.com/elastic/apm-server/internal/publish goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage goos:linux goarch:amd64
WriteTransaction/json_codec-12                                                                      3.00kB ± 0%    3.00kB ± 0%    +0.05%  (p=0.008 n=5+5)
ReadEvents/json_codec_big_tx/100_events-12                                                          1.03MB ± 0%    1.03MB ± 0%    -0.07%  (p=0.016 n=5+4)

name                                                                                              old allocs/op  new allocs/op  delta
pkg:github.com/elastic/apm-server/internal/agentcfg goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/beater/request goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/model/modelindexer goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/publish goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage goos:linux goarch:amd64

name                                                                                              old speed      new speed      delta
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
BackendProcessor/invalid-metadata-2.ndjson-12                                                     55.6MB/s ± 8%  62.6MB/s ± 7%   +12.54%  (p=0.032 n=5+5)
BackendProcessor/unknown-span-type.ndjson-12                                                      43.5MB/s ± 8%  55.5MB/s ±27%   +27.49%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/errors_rum.ndjson-12                   257MB/s ± 3%   242MB/s ± 5%    -5.88%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/errors_transaction_id.ndjson-12        207MB/s ± 9%   179MB/s ± 9%   -13.53%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/otel-bridge.ndjson-12                  180MB/s ± 5%   164MB/s ± 8%    -8.79%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/ratelimit.ndjson-12                    235MB/s ± 4%   217MB/s ± 2%    -7.59%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/invalid-event-type.ndjson-12           211MB/s ± 8%   168MB/s ± 3%   -20.15%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions.ndjson-12                 518MB/s ± 2%   509MB/s ± 1%    -1.64%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans.ndjson-12           546MB/s ± 1%   524MB/s ± 2%    -3.97%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans_rum.ndjson-12       587MB/s ± 1%   565MB/s ± 1%    -3.70%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/transactions_spans_rum_2.ndjson-12     584MB/s ± 1%   564MB/s ± 1%    -3.49%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/unknown-span-type.ndjson-12            464MB/s ± 1%   455MB/s ± 1%    -1.95%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/errors.ndjson-12                     924MB/s ± 2%   864MB/s ± 2%    -6.49%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-event-type.ndjson-12         498MB/s ± 1%   487MB/s ± 2%    -2.23%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-event.ndjson-12              233MB/s ± 1%   230MB/s ± 1%    -1.56%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-json-event.ndjson-12         545MB/s ± 1%   536MB/s ± 1%    -1.76%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata-2.ndjson-12         909MB/s ± 2%   893MB/s ± 1%    -1.80%  (p=0.016 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata.ndjson-12           921MB/s ± 1%   901MB/s ± 1%    -2.18%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/otel-bridge.ndjson-12                601MB/s ± 2%   586MB/s ± 1%    -2.45%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/transactions_spans_rum.ndjson-12     735MB/s ± 1%   721MB/s ± 2%    -1.86%  (p=0.032 n=5+5)
ReadBatch/invalid-json-metadata.ndjson-12                                                         10.4MB/s ±10%  14.0MB/s ±25%   +35.16%  (p=0.032 n=5+5)
ReadBatch/optional-timestamps.ndjson-12                                                           56.8MB/s ±16%  78.9MB/s ±14%   +39.06%  (p=0.008 n=5+5)
ReadBatch/spans.ndjson-12                                                                         52.5MB/s ± 5%  72.7MB/s ±41%   +38.39%  (p=0.008 n=5+5)
ReadBatch/transactions.ndjson-12                                                                  45.2MB/s ±14%  58.0MB/s ±16%   +28.20%  (p=0.008 n=5+5)

report generated with https://pkg.go.dev/golang.org/x/perf/cmd/benchstat

@marclop (Contributor, Author) commented on Oct 18, 2022

Since the benchmark diff is not very readable, here's the output from my laptop:

goos: darwin
goarch: arm64
pkg: github.com/elastic/apm-server/internal/model/modelindexer
BenchmarkModelIndexer/NoCompression-8         	  814345	      1516 ns/op	1651.20 MB/s	    2187 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompression-8         	  682668	      1529 ns/op	1637.96 MB/s	    2245 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompression-8         	  768716	      1477 ns/op	1695.04 MB/s	    2205 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompression-8         	  686592	      1476 ns/op	1696.54 MB/s	    2242 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompression-8         	  798435	      1455 ns/op	1720.88 MB/s	    2193 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompressionScaling-8  	  813568	      1455 ns/op	1720.78 MB/s	    2187 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompressionScaling-8  	  829038	      1441 ns/op	1737.72 MB/s	    2182 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompressionScaling-8  	  811366	      1465 ns/op	1708.66 MB/s	    2188 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompressionScaling-8  	  817510	      1439 ns/op	1739.75 MB/s	    2186 B/op	      21 allocs/op
BenchmarkModelIndexer/NoCompressionScaling-8  	  832280	      1471 ns/op	1701.95 MB/s	    2181 B/op	      21 allocs/op
BenchmarkModelIndexer/BestSpeed-8             	  519282	      2318 ns/op	1080.32 MB/s	    2564 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeed-8             	  519126	      2283 ns/op	1096.69 MB/s	    2564 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeed-8             	  515020	      2348 ns/op	1065.84 MB/s	    2566 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeed-8             	  500868	      2341 ns/op	1069.58 MB/s	    2564 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeed-8             	  521577	      2277 ns/op	1099.58 MB/s	    2563 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeedScaling-8      	  595076	      1942 ns/op	1289.09 MB/s	    2570 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeedScaling-8      	  614433	      1923 ns/op	1302.01 MB/s	    2568 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeedScaling-8      	  609142	      1936 ns/op	1293.09 MB/s	    2567 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeedScaling-8      	  599374	      1940 ns/op	1290.15 MB/s	    2567 B/op	      24 allocs/op
BenchmarkModelIndexer/BestSpeedScaling-8      	  600386	      1955 ns/op	1280.93 MB/s	    2567 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompression-8    	  426439	      2803 ns/op	 893.28 MB/s	    2533 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompression-8    	  423412	      2854 ns/op	 877.47 MB/s	    2533 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompression-8    	  426790	      2825 ns/op	 886.45 MB/s	    2533 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompression-8    	  424195	      2826 ns/op	 885.72 MB/s	    2532 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompression-8    	  424188	      2830 ns/op	 884.95 MB/s	    2532 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompressionScaling-8         	  520196	      2126 ns/op	1178.06 MB/s	    2536 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompressionScaling-8         	  534577	      2268 ns/op	1103.97 MB/s	    2537 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompressionScaling-8         	  530379	      2240 ns/op	1117.84 MB/s	    2535 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompressionScaling-8         	  518343	      2191 ns/op	1143.05 MB/s	    2536 B/op	      24 allocs/op
BenchmarkModelIndexer/DefaultCompressionScaling-8         	  521464	      2196 ns/op	1139.97 MB/s	    2536 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompression-8                   	  233355	      4386 ns/op	 570.62 MB/s	    2572 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompression-8                   	  235694	      4363 ns/op	 573.90 MB/s	    2568 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompression-8                   	  233110	      4362 ns/op	 574.00 MB/s	    2571 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompression-8                   	  235569	      4391 ns/op	 570.29 MB/s	    2569 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompression-8                   	  234534	      4430 ns/op	 565.28 MB/s	    2569 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompressionScaling-8            	  382075	      3034 ns/op	 825.18 MB/s	    2572 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompressionScaling-8            	  384585	      2996 ns/op	 835.71 MB/s	    2570 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompressionScaling-8            	  382268	      3032 ns/op	 825.74 MB/s	    2572 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompressionScaling-8            	  381604	      3069 ns/op	 815.81 MB/s	    2572 B/op	      24 allocs/op
BenchmarkModelIndexer/BestCompressionScaling-8            	  386227	      3004 ns/op	 833.68 MB/s	    2570 B/op	      24 allocs/op
PASS
ok  	github.com/elastic/apm-server/internal/model/modelindexer	75.821s

And benchstat:

$ benchstat scaling.txt
name                                      time/op
ModelIndexer/NoCompression-8                1.49µs ± 3%
ModelIndexer/NoCompressionScaling-8         1.45µs ± 1%
ModelIndexer/BestSpeed-8                    2.31µs ± 2%
ModelIndexer/BestSpeedScaling-8             1.94µs ± 1%
ModelIndexer/DefaultCompression-8           2.83µs ± 1%
ModelIndexer/DefaultCompressionScaling-8    2.20µs ± 4%
ModelIndexer/BestCompression-8              4.39µs ± 1%
ModelIndexer/BestCompressionScaling-8       3.03µs ± 1%

name                                      speed
ModelIndexer/NoCompression-8              1.68GB/s ± 3%
ModelIndexer/NoCompressionScaling-8       1.72GB/s ± 1%
ModelIndexer/BestSpeed-8                  1.08GB/s ± 2%
ModelIndexer/BestSpeedScaling-8           1.29GB/s ± 1%
ModelIndexer/DefaultCompression-8          886MB/s ± 1%
ModelIndexer/DefaultCompressionScaling-8  1.14GB/s ± 4%
ModelIndexer/BestCompression-8             571MB/s ± 1%
ModelIndexer/BestCompressionScaling-8      827MB/s ± 1%

@marclop (Contributor, Author) commented on Oct 18, 2022

Benchstat diff

1 to 4g APM Server

All had similar performance to main, since the active indexers were only scaled up briefly when CPU credits were available, which led to significant bursts in throughput when that occurred.

8g APM Server

This size shows a less noticeable increase, but the CPU was utilized more fully and bursts had much higher throughput.

$ benchstat -alpha 0.11 sizes/main-active/8g-12s-12n.txt sizes/scaling/8g-24s-24n1mb-50av-gomaxprocs-by-4.txt
name          old time/op                  new time/op                  delta
AgentAll-512                   606ms ± 1%                   593ms ±25%      ~     (p=0.700 n=3+3)

name          old error_responses/sec      new error_responses/sec      delta
AgentAll-512                    0.03 ±58%                    0.00       -100.00%  (p=0.100 n=3+3)

name          old events/sec               new events/sec               delta
AgentAll-512                   29.5k ± 1%                   31.5k ±28%      ~     (p=0.700 n=3+3)

name          old gc_cycles                new gc_cycles                delta
AgentAll-512                     318 ± 1%                     312 ±26%      ~     (p=0.700 n=3+3)

name          old max_goroutines           new max_goroutines           delta
AgentAll-512                     382 ± 2%                     398 ±29%      ~     (p=0.700 n=3+3)

name          old max_heap_alloc           new max_heap_alloc           delta
AgentAll-512                   1.11G ± 2%                   1.15G ± 5%      ~     (p=0.200 n=3+3)

name          old max_heap_objects         new max_heap_objects         delta
AgentAll-512                   10.0M ± 2%                   10.1M ± 6%      ~     (p=1.000 n=3+3)

name          old max_rss                  new max_rss                  delta
AgentAll-512                   1.22G ± 1%                   1.26G ± 5%      ~     (p=0.400 n=3+3)

name          old mean_available_indexers  new mean_available_indexers  delta
AgentAll-512                    16.1 ± 1%                    40.7 ± 8%  +153.35%  (p=0.100 n=3+3)

name          old alloc/op                 new alloc/op                 delta
AgentAll-512                   570MB ± 1%                   581MB ± 1%    +2.03%  (p=0.100 n=3+3)

name          old allocs/op                new allocs/op                delta
AgentAll-512                   7.99M ± 1%                   8.15M ± 1%    +2.09%  (p=0.100 n=3+3)

15g APM Server

From this size onwards, the autoscaling really shines: CPU utilization increased and so did throughput.

$ benchstat -alpha 0.11 sizes/main-active/15g-12s-12n.txt sizes/scaling/15g-24s-24n1mb-50av-gomaxprocs-by-4.txt
name          old time/op                  new time/op                  delta
AgentAll-960                   561ms ±10%                   336ms ± 0%   -40.08%  (p=0.100 n=3+3)

name          old error_responses/sec      new error_responses/sec      delta
AgentAll-960                    0.06 ±85%                   0.00 ±200%   -95.79%  (p=0.100 n=3+3)

name          old events/sec               new events/sec               delta
AgentAll-960                   31.8k ±10%                   53.6k ± 0%   +68.34%  (p=0.100 n=3+3)

name          old gc_cycles                new gc_cycles                delta
AgentAll-960                     317 ± 3%                     448 ± 3%   +41.28%  (p=0.100 n=3+3)

name          old max_goroutines           new max_goroutines           delta
AgentAll-960                     386 ± 3%                     562 ± 2%   +45.64%  (p=0.100 n=3+3)

name          old max_heap_alloc           new max_heap_alloc           delta
AgentAll-960                   1.11G ± 0%                   1.19G ± 1%    +7.72%  (p=0.100 n=3+3)

name          old max_heap_objects         new max_heap_objects         delta
AgentAll-960                   9.89M ± 1%                  10.67M ± 0%    +7.91%  (p=0.100 n=3+3)

name          old max_rss                  new max_rss                  delta
AgentAll-960                   1.23G ± 2%                   1.33G ± 1%    +7.84%  (p=0.100 n=3+3)

name          old mean_available_indexers  new mean_available_indexers  delta
AgentAll-960                    15.5 ± 7%                    33.4 ± 0%  +115.79%  (p=0.100 n=3+3)

name          old alloc/op                 new alloc/op                 delta
AgentAll-960                   564MB ± 1%                   571MB ± 0%    +1.20%  (p=0.100 n=3+3)

name          old allocs/op                new allocs/op                delta
AgentAll-960                   7.92M ± 1%                   8.07M ± 0%    +1.82%  (p=0.100 n=3+3)

30g APM Server

$ benchstat -alpha 0.11 sizes/main-active/30g-12s-12n.txt sizes/scaling/30g-24s-24n1mb-50av-gomaxprocs-by-4-pr.txt
name           old time/op                  new time/op                  delta
AgentAll-1920                   585ms ± 5%                   183ms ± 0%   -68.71%  (p=0.100 n=3+3)

name           old error_responses/sec      new error_responses/sec      delta
AgentAll-1920                    0.04 ±18%                    0.01 ±50%   -76.61%  (p=0.100 n=3+3)

name           old events/sec               new events/sec               delta
AgentAll-1920                   30.7k ± 5%                   98.1k ± 0%  +219.84%  (p=0.100 n=3+3)

name           old gc_cycles                new gc_cycles                delta
AgentAll-1920                     335 ± 5%                     772 ± 1%  +130.55%  (p=0.100 n=3+3)

name           old max_goroutines           new max_goroutines           delta
AgentAll-1920                     390 ± 2%                    1026 ± 1%  +163.22%  (p=0.100 n=3+3)

name           old max_heap_alloc           new max_heap_alloc           delta
AgentAll-1920                   1.09G ± 2%                   1.41G ± 1%   +29.59%  (p=0.100 n=3+3)

name           old max_heap_objects         new max_heap_objects         delta
AgentAll-1920                   9.18M ± 3%                  13.15M ± 1%   +43.31%  (p=0.100 n=3+3)

name           old max_rss                  new max_rss                  delta
AgentAll-1920                   1.23G ± 2%                   1.58G ± 0%   +28.20%  (p=0.100 n=3+3)

name           old mean_available_indexers  new mean_available_indexers  delta
AgentAll-1920                    15.8 ± 3%                    12.2 ± 7%   -22.64%  (p=0.100 n=3+3)

name           old alloc/op                 new alloc/op                 delta
AgentAll-1920                   573MB ± 0%                   560MB ± 0%    -2.19%  (p=0.100 n=3+3)

name           old allocs/op                new allocs/op                delta
AgentAll-1920                   8.00M ± 0%                   7.96M ± 0%    -0.59%  (p=0.100 n=3+3)

@marclop (Contributor, Author) commented on Oct 18, 2022

Metric Screenshots

The last 3 distributions are all for the same 30g APM Server instance. I ran the benchmarks 3 times with that size to gather some more data.

APM Server

[screenshot: Screen Shot 2022-10-18 at 15 57 11]

Elasticsearch

464GB of RAM! A cluster spanning 3 zones, with a total of 24 hot nodes.

[screenshots: Screen Shot 2022-10-18 at 16 06 53, 16 07 23, and 16 08 24]

@axw (Member) left a comment:

I have a bunch of comments and suggestions, but this is very nice :)

Resolved review threads: dev_docs/ARCHITECTURE.md (×2), internal/beater/beater.go (×2), internal/model/modelindexer/indexer.go
Comment on internal/model/modelindexer/indexer.go, lines 488 to 496:
// When the queue utilization is below 5%, reset the idleTimer. When
// traffic to the APM Server is interrupted or stopped, it allows excess
// active indexers that have been idle for the IdleInterval to be
// scaled down.
activeIndexers := atomic.LoadInt64(&i.activeBulkRequests)
lowChanCapacity := float64(len(i.bulkItems))/float64(cap(i.bulkItems)) <= 0.05
if lowChanCapacity && activeIndexers > 1 {
idleTimer.Reset(i.config.Scaling.IdleInterval)
}
@axw (Member) commented:
Doing this on every iteration of the loop concerns me a little bit. Would it make sense to require FlushInterval >= IdleInterval, and only reset the idle timer before the first iteration of the loop, and then after flushing? i.e. whenever the flush timer is inactive.

Also, I think the idle timer should only be started when autoscaling is enabled?
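
A minimal sketch of what that suggestion could look like, with hypothetical buffer, flush, and scaleDownIdle helpers standing in for the real indexer internals:

```go
package modelindexer

import "time"

// If FlushInterval >= IdleInterval is required, the idle timer only needs
// resetting while the flush timer is inactive: once before the first
// iteration, and again right after each flush, rather than on every
// iteration of the event loop.
func runActiveIndexer(bulkItems <-chan []byte, flushInterval, idleInterval time.Duration) {
	flushTimer := time.NewTimer(flushInterval)
	idleTimer := time.NewTimer(idleInterval) // reset #1: before the loop
	for {
		select {
		case doc := <-bulkItems:
			buffer(doc) // accumulate the document into the bulk request
		case <-flushTimer.C:
			flush()
			flushTimer.Reset(flushInterval)
			idleTimer.Reset(idleInterval) // reset #2: only after flushing
		case <-idleTimer.C:
			scaleDownIdle() // no traffic for IdleInterval: scale down
			return
		}
	}
}

// Stubs standing in for the real indexer internals.
func buffer([]byte)  {}
func flush()         {}
func scaleDownIdle() {}
```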

Resolved review threads: internal/model/modelindexer/indexer_test.go (×3), internal/model/modelindexer/indexer.go
@simitt (Contributor) left a comment:

Excited about these changes! Left a couple of mostly nitpicks around naming and comments.

Have you tested this against an undersized ES? Given that the number of active indexers is based on events processed in the APM Server, and not also on ES responses and pushback, I wonder whether things might get worse for setups where APM Server processes more events than ES can handle, by increasing the pressure on ES.

Resolved review threads: dev_docs/ARCHITECTURE.md (×4), internal/beater/beater.go
// Disabled toggles active indexer scaling on.
//
// It is enabled by default.
Disabled bool
@simitt (Contributor) commented:
Again, I'd switch to Enabled for consistency.

@axw (Member) replied:
FWIW I think this is okay for the moment, maybe we can come back to it if it's a pain. The zero value for ScalingConfig currently means to use the default config; it should be enabled by default. So we would need to make it a *bool to preserve that while also making it "enabled".
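
A sketch of what that *bool approach would look like (assumed field and method names):

```go
package modelindexer

// ScalingConfig sketch: with a plain bool field, the zero value could not
// mean "enabled", so a nil pointer stands in for "unset, use the default".
type ScalingConfig struct {
	Enabled *bool // nil means "use the default", which is enabled
}

func (c ScalingConfig) enabled() bool {
	return c.Enabled == nil || *c.Enabled
}
```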

Resolved review threads: internal/model/modelindexer/indexer.go (×4)
@axw (Member) left a comment:

Thanks for the updates, it's looking pretty good now. Only one more comment from me, on the metric names. I think the main remaining question is @simitt's one about behaviour when ES is underpowered.

Resolved review threads: internal/beater/beater.go, internal/model/modelindexer/indexer.go
@marclop (Contributor, Author) commented on Oct 25, 2022

@simitt @axw Thanks for the reviews. I have tested the performance of an 8GB APM Server (4 vCPUs, with 8 as the burstable max), and the pressure does seem a bit worse, since autoscaling takes place after 60 consecutive flushes. The backing Elasticsearch cluster is a 3-zone 8GB cluster.

Comparing the current PR with main (main has 25 available indexers of 2.5MB each):

$ benchstat -alpha 0.11 sizes/scaling/undersized/8g-2s-2n-main.txt sizes/scaling/undersized/8g-2s-2n-pr.txt
name          old time/op                  new time/op                  delta
AgentAll-512                   1.91s ± 4%                   2.01s ± 3%      ~     (p=0.200 n=3+3)

name          old error_responses/sec      new error_responses/sec      delta
AgentAll-512                    0.00                         0.00           ~     (all equal)

name          old events/sec               new events/sec               delta
AgentAll-512                   9.46k ± 4%                   8.98k ± 3%      ~     (p=0.200 n=3+3)

name          old gc_cycles                new gc_cycles                delta
AgentAll-512                     130 ± 8%                     120 ± 2%    -7.93%  (p=0.100 n=3+3)

name          old max_goroutines           new max_goroutines           delta
AgentAll-512                     232 ±10%                     294 ± 4%   +26.69%  (p=0.100 n=3+3)

name          old max_heap_alloc           new max_heap_alloc           delta
AgentAll-512                    989M ± 3%                   1015M ± 2%      ~     (p=0.400 n=3+3)

name          old max_heap_objects         new max_heap_objects         delta
AgentAll-512                   8.82M ± 5%                   9.00M ± 5%      ~     (p=0.700 n=3+3)

name          old max_rss                  new max_rss                  delta
AgentAll-512                   1.10G ± 2%                   1.13G ± 1%    +2.65%  (p=0.100 n=3+3)

name          old mean_available_indexers  new mean_available_indexers  delta
AgentAll-512                    2.65 ±11%                    0.00       -100.00%  (p=0.100 n=3+3)

name          old alloc/op                 new alloc/op                 delta
AgentAll-512                   638MB ± 0%                   644MB ± 1%      ~     (p=0.200 n=3+3)

name          old allocs/op                new allocs/op                delta
AgentAll-512                   8.59M ± 0%                   8.63M ± 0%      ~     (p=0.200 n=3+3)

The number of 429 responses was significant (up to 23% of all requests).

I made some small changes (not pushed to this PR) which disallow scaling when 1% or more of the total indexed documents result in a 429; that seemed to give very similar performance to main:

$ benchstat -alpha 0.11 sizes/scaling/undersized/8g-2s-2n-main.txt sizes/scaling/undersized/8g-2s-2npr-block-as.txt
name          old time/op                  new time/op                  delta
AgentAll-512                   1.91s ± 4%                   1.92s ± 1%      ~     (p=1.000 n=3+3)

name          old error_responses/sec      new error_responses/sec      delta
AgentAll-512                    0.00                         0.00           ~     (all equal)

name          old events/sec               new events/sec               delta
AgentAll-512                   9.46k ± 4%                   9.39k ± 1%      ~     (p=1.000 n=3+3)

name          old gc_cycles                new gc_cycles                delta
AgentAll-512                     130 ± 8%                     123 ± 5%      ~     (p=0.500 n=3+3)

name          old max_goroutines           new max_goroutines           delta
AgentAll-512                     232 ±10%                     304 ± 6%   +30.85%  (p=0.100 n=3+3)

name          old max_heap_alloc           new max_heap_alloc           delta
AgentAll-512                    989M ± 3%                    974M ± 0%      ~     (p=0.700 n=3+3)

name          old max_heap_objects         new max_heap_objects         delta
AgentAll-512                   8.82M ± 5%                   8.21M ± 3%    -6.96%  (p=0.100 n=3+3)

name          old max_rss                  new max_rss                  delta
AgentAll-512                   1.10G ± 2%                   1.11G ± 0%      ~     (p=0.700 n=3+3)

name          old mean_available_indexers  new mean_available_indexers  delta
AgentAll-512                    2.65 ±11%                    0.00       -100.00%  (p=0.100 n=3+3)

name          old alloc/op                 new alloc/op                 delta
AgentAll-512                   638MB ± 0%                   642MB ± 0%    +0.60%  (p=0.100 n=3+3)

name          old allocs/op                new allocs/op                delta
AgentAll-512                   8.59M ± 0%                   8.62M ± 0%      ~     (p=0.200 n=3+3)

patch:

diff --git a/internal/model/modelindexer/indexer.go b/internal/model/modelindexer/indexer.go
index de50be0a1..e50844c68 100644
--- a/internal/model/modelindexer/indexer.go
+++ b/internal/model/modelindexer/indexer.go
@@ -594,6 +594,14 @@ func (i *Indexer) maybeScaleDown(now time.Time, info scalingInfo, timedFlush *ui
 		}
 		info = i.scalingInformation() // refresh scaling info if CAS failed.
 	}
+	// If 1% or more of the requests result in 429, scale down.
+	if i.indexFailureRate() >= 0.01 {
+		if new := info.ScaleDown(now); i.scalingInfo.CompareAndSwap(info, new) {
+			i.logger.Infof("Elasticsearch 429 response rate exceeded 1%%, scaling down to: %d", new)
+			return true
+		}
+		return false
+	}
 	if *timedFlush < i.config.Scaling.ScaleDown.Threshold {
 		return false
 	}
@@ -624,6 +632,10 @@ func (i *Indexer) maybeScaleUp(now time.Time, info scalingInfo, fullFlush *uint)
 	// Reset fullFlush after it has exceeded the threshold
 	// it avoids unnecessary precociousness to scale up.
 	*fullFlush = 0
+	// If 1% or more of the requests result in 429, do not scale up.
+	if i.indexFailureRate() >= 0.01 {
+		return false
+	}
 	if info.withinCoolDown(i.config.Scaling.ScaleUp.CoolDown, now) {
 		return false
 	}
@@ -642,6 +654,11 @@ func (i *Indexer) scalingInformation() scalingInfo {
 	return i.scalingInfo.Load().(scalingInfo)
 }
 
+func (i *Indexer) indexFailureRate() float64 {
+	return float64(atomic.LoadInt64(&i.tooManyRequests)) /
+		float64(atomic.LoadInt64(&i.eventsAdded))
+}
+
 // activeLimit returns the value of GOMAXPROCS / 4. Which should limit the
 // maximum number of active indexers to 25% of GOMAXPROCS.
 // NOTE: There is also a sweet spot between Config.MaxRequests and the number

If we're OK merging this PR and doing some more investigation into backpressure indicators as a follow-up, then we can properly test all these changes and perhaps think about a different way to detect an overwhelmed Elasticsearch.

@simitt (Contributor) left a comment:

Apart from a minor comment that hasn't been addressed yet, this LGTM.
Let's wait for @axw's approval also, and then merge this in and address undersized ES behavior in a follow-up.

Resolved review thread: dev_docs/ARCHITECTURE.md
marclop requested a review from axw on October 25, 2022
@axw (Member) left a comment:

> If we're OK merging this PR and doing some more investigation into backpressure indicators as a follow-up, then we can properly test all these changes and perhaps think about a different way to detect an overwhelmed Elasticsearch.

SGTM.

The changes all look good apart from a couple of issues with the metrics.

Resolved review thread: internal/beater/beater.go
@axw (Member) left a comment:

One last thing, I think.

Resolved review thread: internal/model/modelindexer/indexer.go
marclop requested a review from axw on October 26, 2022
@axw (Member) left a comment:

Thank you!

@axw (Member) commented on Nov 28, 2022

This should be tested as part of #9182

axw removed the test-plan label on Nov 28, 2022
kruskall self-assigned and then unassigned this PR on Dec 6, 2022
Successfully merging this pull request may close these issues.

Autoscale number of modelindexers to increase throughput and ensure full resource usage