modelindexer: disable scale up when 429 > 1% #9463
📚 Go benchmark report (generated with https://pkg.go.dev/golang.org/x/perf/cmd/benchstat)
Disables scale-up actions when the rate of 429 responses exceeds 1% of all responses. Additionally, scales down, respecting the scale-down parameters, when that threshold is breached.

Signed-off-by: Marc Lopez Rubio <[email protected]>
Branch updated from b6e9004 to abf0a77.
LGTM!
Did you consider scaling down based on any failures, rather than just 429s?
@axw I did. However, "other" failures could be caused by malformed documents (bad mappings, values, etc.), so I decided against it for the time being. Other status codes we could consider are 499 (client timeouts?), 502, and 503. We can do that in a follow-up PR, since I haven't tested it and we aren't collecting those yet.
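The distinction drawn above (backpressure from Elasticsearch versus problems with the documents themselves) can be sketched as a status-code classifier. This is a hypothetical illustration, not APM Server code: the function `isBackpressure` and its `includeCandidates` flag are assumptions. Only 429 drives scaling in this PR; 499, 502, and 503 are the follow-up candidates mentioned in the comment and are untested.

```go
package main

import (
	"fmt"
	"net/http"
)

// isBackpressure reports whether a bulk response status suggests that
// Elasticsearch is overloaded, as opposed to the document being at fault.
// Hypothetical helper: only 429 is acted on in this PR; includeCandidates
// opts in to the follow-up candidates (499, 502, 503), which are untested.
func isBackpressure(status int, includeCandidates bool) bool {
	switch status {
	case http.StatusTooManyRequests: // 429
		return true
	case 499, http.StatusBadGateway, http.StatusServiceUnavailable:
		// 499 is non-standard, so net/http has no constant for it.
		return includeCandidates
	default:
		// e.g. 400 from a mapping conflict or a bad value: the document is
		// the problem, so changing the number of indexers would not help.
		return false
	}
}

func main() {
	for _, code := range []int{400, 429, 499, 502, 503} {
		fmt.Printf("%d current=%v proposed=%v\n",
			code, isBackpressure(code, false), isBackpressure(code, true))
	}
}
```

Keeping the candidates behind a flag mirrors the cautious approach in the comment: the existing 429 behaviour stays unchanged until the other codes have been tested.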
Verified with 8.6.0-BC5, running on a GCP VM. I pointed it at an ESS cluster's Elasticsearch, scaled up ES and waited for APM Server to scale up the indexers, then scaled down ES and observed APM Server scale down too.
Motivation/summary
Disables scale-up actions when the rate of 429 responses exceeds 1% of all responses. Additionally, scales down, respecting the scale-down parameters, when that threshold is breached.
Checklist
- [ ] Update package changelog.yml (only if changes to apmpackage have been made)
- [ ] Documentation has been updated

How to test these changes
Run benchmarks with an APM Server of 8GB or larger against a small Elasticsearch deployment (for example, 8GB).
Related issues
Part of #9181