modelindexer: disable scale up when 429 > 1% #9463

@marclop marclop commented Oct 28, 2022

Motivation/summary

Disables scale-up actions when 429 responses exceed 1% of the total responses. Additionally, when that threshold is breached, scales down, respecting the scale-down parameters.
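To illustrate the decision described above, here is a minimal Go sketch. It is not the modelindexer code from this PR: the `decide` function, the `action` type, and the `wantUp`/`canDown` flags are hypothetical names introduced for illustration, and the scale-down parameters are reduced to a single boolean. Only the 1% threshold on 429 responses comes from the description.

```go
package main

import "fmt"

// action is the scaling decision for one evaluation interval (hypothetical type).
type action int

const (
	noop action = iota
	scaleUp
	scaleDown
)

// String makes the decisions readable when printed.
func (a action) String() string {
	return [...]string{"noop", "scale up", "scale down"}[a]
}

// tooManyPercent is the 1% limit on 429 responses from the PR description.
const tooManyPercent = 1

// decide is a hypothetical helper, not the real modelindexer API.
//   - total:   all responses observed in the interval
//   - tooMany: how many of those were HTTP 429
//   - wantUp:  the usual scale-up conditions are met (assumption)
//   - canDown: the scale-down parameters allow scaling down now (assumption)
func decide(total, tooMany int64, wantUp, canDown bool) action {
	// Integer comparison avoids floating point: tooMany/total > 1%.
	if total > 0 && tooMany*100 > total*tooManyPercent {
		// Threshold breached: never scale up, scale down if permitted.
		if canDown {
			return scaleDown
		}
		return noop
	}
	if wantUp {
		return scaleUp
	}
	return noop
}

func main() {
	fmt.Println(decide(1000, 5, true, true))   // 0.5% of responses are 429
	fmt.Println(decide(1000, 20, true, true))  // 2% of responses are 429
	fmt.Println(decide(1000, 20, true, false)) // 2%, but scale down not allowed yet
}
```

The sketch treats exactly 1% as not breaching the threshold, since the description says "exceeds"; the real boundary behaviour may differ.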

How to test these changes

For example, run benchmarks with an APM Server of 8GB or more against a small Elasticsearch deployment (8GB).

Related issues

Part of #9181

@marclop marclop added enhancement backport-skip Skip notification from the automated backport with mergify v8.6.0 labels Oct 28, 2022
apmmachine commented Oct 28, 2022

💚 Build Succeeded

Build stats

  • Start Time: 2022-11-02T07:45:58.388+0000

  • Duration: 27 min 28 sec

Test stats 🧪

Test Results
Failed 0
Passed 153
Skipped 0
Total 153

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate and publish the docker images.

  • /test windows : Build & tests on Windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

apmmachine commented Oct 28, 2022

📚 Go benchmark report

Diff with the main branch

name                                                                                              old time/op    new time/op    delta
pkg:github.com/elastic/apm-server/internal/agentcfg goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/beater/request goos:linux goarch:amd64
ContextReset/Remote_Addr_ipv6-12                                                                     771ns ±17%     882ns ±10%  +14.49%  (p=0.032 n=5+5)
ContextReset/Forwarded_ipv4-12                                                                       695ns ±49%     945ns ±16%  +35.90%  (p=0.032 n=5+5)
pkg:github.com/elastic/apm-server/internal/model/modelindexer goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
RUMV3Processor/rum_errors.ndjson-12                                                                 8.00µs ±36%    9.60µs ±10%  +20.02%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/unknown-span-type.ndjson-12             20.8µs ±18%    25.3µs ±19%  +21.62%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/optional-timestamps.ndjson-12           3.14µs ± 4%    3.40µs ± 6%   +8.36%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/ratelimit.ndjson-12                     10.6µs ± 3%    11.6µs ± 6%   +9.55%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/spans.ndjson-12                         10.8µs ± 1%    10.9µs ± 1%   +0.85%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/errors_2.ndjson-12                    6.50µs ± 4%    6.78µs ± 5%   +4.22%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata-2.ndjson-12           463ns ± 1%     482ns ± 1%   +4.10%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata.ndjson-12             468ns ± 2%     490ns ± 2%   +4.63%  (p=0.008 n=5+5)
ReadBatch/errors_rum.ndjson-12                                                                      23.4µs ±36%    33.3µs ± 8%  +42.46%  (p=0.008 n=5+5)
ReadBatch/heavy.ndjson-12                                                                           3.66ms ±18%    4.12ms ± 4%  +12.35%  (p=0.032 n=5+4)
ReadBatch/invalid-event.ndjson-12                                                                   34.7µs ±10%    26.4µs ±30%  -24.05%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/internal/publish goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling goos:linux goarch:amd64
TraceGroups-12                                                                                       122ns ± 2%     144ns ± 0%  +18.15%  (p=0.029 n=4+4)
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage goos:linux goarch:amd64

name                                                                                              old alloc/op   new alloc/op   delta
pkg:github.com/elastic/apm-server/internal/agentcfg goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/beater/request goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/model/modelindexer goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/invalid-event-type.ndjson-12            4.14kB ± 1%    4.22kB ± 1%   +1.86%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/invalid-metadata-2.ndjson-12            3.19kB ± 2%    3.13kB ± 1%   -2.02%  (p=0.008 n=5+5)
ReadBatch/invalid-event.ndjson-12                                                                   6.71kB ± 1%    6.68kB ± 0%   -0.40%  (p=0.040 n=5+5)
ReadBatch/unknown-span-type.ndjson-12                                                               16.8kB ± 0%    16.8kB ± 0%   +0.08%  (p=0.008 n=5+5)
pkg:github.com/elastic/apm-server/internal/publish goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage goos:linux goarch:amd64

name                                                                                              old allocs/op  new allocs/op  delta
pkg:github.com/elastic/apm-server/internal/agentcfg goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/beater/request goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/model/modelindexer goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
BackendProcessor/heavy.ndjson-12                                                                     22.3k ± 0%     22.3k ± 0%   +0.00%  (p=0.029 n=4+4)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/heavy.ndjson-12                        22.3k ± 0%     22.3k ± 0%   +0.01%  (p=0.029 n=4+4)
pkg:github.com/elastic/apm-server/internal/publish goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage goos:linux goarch:amd64

name                                                                                              old speed      new speed      delta
pkg:github.com/elastic/apm-server/internal/model/modelindexer goos:linux goarch:amd64
pkg:github.com/elastic/apm-server/internal/processor/stream goos:linux goarch:amd64
RUMV3Processor/rum_errors.ndjson-12                                                                125MB/s ±49%   100MB/s ± 9%  -20.13%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel2/unknown-span-type.ndjson-12            160MB/s ±16%   132MB/s ±17%  -17.63%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/optional-timestamps.ndjson-12          327MB/s ± 4%   302MB/s ± 7%   -7.67%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel4/ratelimit.ndjson-12                    399MB/s ± 3%   365MB/s ± 6%   -8.58%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel8/spans.ndjson-12                        742MB/s ± 1%   736MB/s ± 1%   -0.84%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/errors_2.ndjson-12                   725MB/s ± 4%   696MB/s ± 5%   -4.04%  (p=0.032 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata-2.ndjson-12         941MB/s ± 1%   904MB/s ± 1%   -3.94%  (p=0.008 n=5+5)
BackendProcessorParallel/BenchmarkBackendProcessorParallel200/invalid-metadata.ndjson-12           953MB/s ± 2%   911MB/s ± 2%   -4.43%  (p=0.008 n=5+5)
ReadBatch/errors_rum.ndjson-12                                                                    87.3MB/s ±46%  57.1MB/s ± 9%  -34.53%  (p=0.008 n=5+5)
ReadBatch/heavy.ndjson-12                                                                          110MB/s ±21%    97MB/s ± 4%  -11.89%  (p=0.032 n=5+4)
ReadBatch/invalid-event.ndjson-12                                                                 22.2MB/s ± 9%  30.1MB/s ±38%  +35.64%  (p=0.008 n=5+5)
ReadBatch/transactions_spans_rum.ndjson-12                                                        67.0MB/s ±13%  50.7MB/s ± 4%  -24.33%  (p=0.016 n=5+4)

report generated with https://pkg.go.dev/golang.org/x/perf/cmd/benchstat

Disables the scale up actions when the 429 response rate exceeds 1% of
the total response rate. Additionally, scale down respecting the scale
down parameters when the rate is breached.

Signed-off-by: Marc Lopez Rubio <[email protected]>
@marclop marclop force-pushed the f/do-not-scale-up-if-tooMany-request-rate-over-percentage branch from b6e9004 to abf0a77 Compare October 31, 2022 12:59
@marclop marclop marked this pull request as ready for review November 2, 2022 07:45
@marclop marclop requested a review from a team November 2, 2022 07:46
@axw axw left a comment

LGTM!

Did you consider scaling down based on any failures, rather than just 429s?

marclop commented Nov 2, 2022

@axw I did. However, "other" failures could be due to malformed documents (bad mappings, values, etc.), so I decided not to for the time being.

Other status codes we may consider are 499 (client timeouts?), 502 and 503. That, however, we can do in a follow-up PR, since I haven't tested it and we're not collecting those yet.

@marclop marclop merged commit 89c17ff into elastic:main Nov 2, 2022
@marclop marclop deleted the f/do-not-scale-up-if-tooMany-request-rate-over-percentage branch November 2, 2022 09:27
@axw axw self-assigned this Dec 2, 2022
axw commented Dec 5, 2022

Verified with 8.6.0-BC5, running on a GCP VM. I pointed it at an ESS cluster's Elasticsearch, scaled up ES and waited for APM Server to scale up the indexers, then scaled down ES and observed APM Server scale down too.

Labels
backport-skip Skip notification from the automated backport with mergify enhancement test-plan test-plan-ok v8.6.0