[Elasticsearch Client] Limit Kibana's internal client's maxSockets #151778
Summary
While testing #151110, we noticed that most of the issues came from sudden spikes in the number of requests sent to ES.
While Kibana should aim to scale as needed for any incoming HTTP requests, its background tasks should be prevented from opening too many ES connections at once. When this happens, it's usually a sign of non-scalable code running in the background.
This PR tests whether limiting the number of background connections in the Kibana internal Elasticsearch client improves the overall performance experienced by users. The motivation: limiting the resources available to background processes leaves extra capacity to process requests from actual users.
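As a rough illustration of the mechanism (not this PR's actual diff), the sketch below uses Node.js's built-in `http.Agent`, whose `maxSockets` option caps the number of concurrent sockets per host and queues any further requests until a socket frees up. The helper name `createAgent` and the cap value `BACKGROUND_MAX_SOCKETS` are hypothetical, chosen only for this example; the value the PR settles on may differ.

```typescript
import { Agent } from 'node:http';

// Hypothetical cap for background traffic; illustrative value only.
const BACKGROUND_MAX_SOCKETS = 64;

// Hypothetical helper: builds a keep-alive agent with an optional socket cap.
// When no cap is given, the agent allows an unbounded number of sockets.
function createAgent(maxSockets: number = Infinity): Agent {
  return new Agent({ keepAlive: true, maxSockets });
}

// Agent for background work: at most 64 concurrent sockets per host.
// Requests beyond the cap wait in the agent's queue instead of opening
// new connections.
const backgroundAgent = createAgent(BACKGROUND_MAX_SOCKETS);

// Agent without a cap, e.g. for traffic that must scale with user requests.
const unboundedAgent = createAgent();

console.log(backgroundAgent.maxSockets, unboundedAgent.maxSockets);
```

With a cap in place, a burst of background requests is serialized through a bounded pool rather than fanning out into one connection per request.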
A promising starting point: the scalability tests in #151110 show a big improvement.
FWIW, mission-critical background processes like Alerts can create their own client using core.elasticsearch.createClient() to circumvent this limitation if deemed necessary.
TODO:
https://buildkite.com/elastic/kibana-apis-capacity-testing/builds/301
TBD... waiting for @dmlemeshko to apply some changes to the runner so we can run from a branch
Infinity even for user-triggered requests?
Checklist
Delete any items that are not applicable to this PR.
Risk Matrix
maxSockets: Infinity by default, high event loop delays might block all services. Also, the mission-critical ones can create custom clients to have their dedicated connection pool.
For maintainers