[Elasticsearch Client] Limit Kibana's internal client's maxSockets #151778

Closed

@afharo afharo commented Feb 21, 2023

Summary

While testing #151110, we noticed that most of the issues come from a sudden spike in the generation of requests to ES.

While Kibana should aim to scale as needed for any incoming HTTP requests, its background tasks should be prevented from opening too many ES connections at once. When this happens, it's usually a sign of non-scalable code running in the background.

This PR tests whether limiting the number of background connections in Kibana's internal Elasticsearch client improves the overall performance experienced by users. The motivation: limiting the resources available to background processes leaves extra capacity to process requests from actual users.
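
As a rough illustration of what a socket cap does, the sketch below models a pool that runs at most `maxSockets` tasks at once and queues the rest. It is a conceptual stand-in, not Kibana's or the Elasticsearch client's implementation; `BoundedPool`, `schedule`, and the task shape are hypothetical names used only for this example.

```typescript
// Conceptual model of a connection cap: at most `maxSockets` tasks are
// in flight; further tasks wait in FIFO order until a slot is released.
// All names here are hypothetical and exist only for illustration.
type Release = () => void;
type Task = (release: Release) => void;

class BoundedPool {
  private active = 0;
  private readonly queue: Task[] = [];
  private readonly maxSockets: number;

  constructor(maxSockets: number) {
    this.maxSockets = maxSockets;
  }

  get activeCount(): number {
    return this.active;
  }

  get queuedCount(): number {
    return this.queue.length;
  }

  // Start the task now if a slot is free, otherwise queue it.
  schedule(task: Task): void {
    if (this.active < this.maxSockets) {
      this.start(task);
    } else {
      this.queue.push(task);
    }
  }

  private start(task: Task): void {
    this.active++;
    let released = false;
    task(() => {
      if (released) return; // ignore a double release
      released = true;
      this.active--;
      const next = this.queue.shift();
      if (next) this.start(next); // hand the freed slot to the next waiter
    });
  }
}
```

With `maxSockets` set to `Infinity`, nothing is ever queued, so a burst of background requests all start at once. With a finite cap, the burst is spread out over time instead.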

A promising starting point: #151110's scalability tests show a big improvement:

[Images: scalability test results, "This PR" vs. "Baseline"]

FWIW, mission-critical background processes like Alerts can create their own client using core.elasticsearch.createClient() to circumvent this limitation if deemed necessary.
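
A minimal sketch of that escape hatch follows. The interfaces are local stand-ins so the example is self-contained; the `createClient` arguments (a type string plus a config object with `maxSockets`), the client name `'alerting-dedicated'`, and the helper `createDedicatedClient` are assumptions made for illustration and should be checked against the actual Kibana core types.

```typescript
// Minimal stand-ins for the parts of Kibana's core contract used below.
// These interfaces are assumptions for this sketch, not the real types.
interface CustomClientLike {
  close(): Promise<void>;
}

interface ElasticsearchStartLike {
  createClient(type: string, config?: { maxSockets?: number }): CustomClientLike;
}

// Creates a dedicated client so a mission-critical plugin gets its own
// connection pool instead of sharing the internal client's capped one.
// The client name and the maxSockets option are hypothetical.
function createDedicatedClient(
  elasticsearch: ElasticsearchStartLike,
  maxSockets: number
): CustomClientLike {
  return elasticsearch.createClient('alerting-dedicated', { maxSockets });
}
```

A plugin would call this once during startup, for example `createDedicatedClient(core.elasticsearch, 50)`, and close the returned client when it stops.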

TODO:

  • Test other scenarios apart from telemetry endpoints: https://buildkite.com/elastic/kibana-apis-capacity-testing/builds/301 TBD... waiting for @dmlemeshko to apply some changes to the runner so we can run from a branch
  • Any improvements found in Single-User benchmarks? => Jobs: Roughly the same results.
  • If agreed on this... document this behavior and the escape hatch for mission-critical apps.
  • If we agree that creating too many sockets is harmful to Kibana... should we revisit our default of Infinity even for user-triggered requests?

Risk Matrix

| Risk | Probability | Severity | Mitigation/Notes |
|---|---|---|---|
| Some background processes may starve other mission-critical ones | High | High | This already happens today: although we set maxSockets: Infinity by default, high event loop delays might block all services. Also, mission-critical processes can create custom clients to get their own dedicated connection pool. |

@afharo afharo added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc performance intent-discuss labels Feb 21, 2023
@afharo afharo force-pushed the es-limit-internal-client-max-sockets branch from 8f92706 to 0b8a573 Compare February 21, 2023 23:34
@kibana-ci

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

  • 💔 Build #109407 failed 8f9270666e19c3194b83a74329118450b82a33fa

afharo commented Feb 22, 2023

After discussing this internally, we all agreed that we should find a proper default for our current maxSockets setting instead of adding a new one. Closing this PR in favour of #151911.

@afharo afharo closed this Feb 22, 2023
@afharo afharo deleted the es-limit-internal-client-max-sockets branch February 22, 2023 17:25