Add the Ability to Disable certain REST APIs via a Cluster Setting #84876
Comments
Pinging @elastic/es-distributed (Team:Distributed)
Pinging @elastic/es-core-infra (Team:Core/Infra)
Wouldn't blocking these APIs cripple the diagnostics tooling?
Yes, but to some degree that's the intended purpose here. For example, with the recent large deployment that motivated this (though this has been a recurring issue), some yet-to-be-identified diagnostics script was hitting [...]. I should point out that this feature is not meant as a replacement for improving the stability of these APIs. It's merely meant to give us a better way of helping large deployments in the short term and to provide some insurance against unforeseen edge cases (e.g. #84266 creating an absurd response size on the ILM explain API, which causes trouble once someone finds their way to the Kibana page that calls the API).
My concern is that such a crude hammer will allow users to break their own thumbs. I can already imagine issues where some API isn't responding as expected, and then we find they left one of these blocks set, or tried blocking more than they should. Does the API have to be general, or could it have a specific list of APIs that can be blocked?
That was my thought as well. Others made an, IMO, very good point though: a fixed list leaves open the possibility of missing an edge case, or of a new API unexpectedly not scaling well. So this seems more useful if it's very flexible.
Maybe, but I think this wouldn't be used by anyone unless they ran into trouble with an API. It's hard to imagine this creating hard-to-resolve problems (so long as we don't allow blocking the settings update, I guess :)), because any error message returned by a blocked API would be quite obvious, and inspecting the cluster settings also makes it trivial to understand the situation a cluster is in?
Unless we can't return the cluster settings. :) As long as we put protections in place for reading and writing settings, I guess this is ok.
Crude, but IMO better than the alternative we have today: we can't block anything even when desperate, despite blocking being the obvious bandage to apply. If a block is actually needed then that's a bug, but at least with this we have some chance of keeping things running until we land a fix. I agree that it's vital to avoid confusion about what's going on in a cluster with this kind of block in place, and I'd also like us to make it clear that it's a temporary measure rather than a state we expect clusters to be in during normal operation. The response needs to be super clear, and I suggest we also emit a [...]. Also, we need to be 110% certain that it's always possible to release the block. Could it be a [...]
Description
Elasticsearch contains a number of APIs that can produce very large responses when called in clusters containing a huge number of indices/shards. Examples include: [...] (particularly when called with the ?pretty option).
These large responses can consume resources on the coordinating node(s) serving these APIs to a degree that is unacceptable in a production cluster. So far, when dealing with these issues, we have had to stabilize the cluster either by tracking down the caller making the offending API calls or by adjusting the authorization setup to disable the API for that caller. The first option is very time-consuming and might involve adjusting a large number of processes that call the API. The second option is complicated and comes with a number of limitations, depending on the exact role setup of a deployment.
We discussed this in the many-shards sync and decided we'd like to add a cluster setting that allows turning off REST APIs by path, so that a cluster can be stabilized right away once the offending API has been identified.
I would suggest the cluster setting: [...]
It would take a list of paths in exactly the same form the REST request tracer already uses.
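To make the discussed behaviour concrete, here is a rough, non-authoritative Java sketch. It assumes that the setting holds a list of path patterns in which `*` matches any run of characters, and that `/_cluster/settings` must never be blockable, so the block can always be inspected and released (the protection asked for in the comments above). `RestPathBlocklist`, `PROTECTED_PATHS`, `matchingPattern` and `simpleMatch` are hypothetical names, not Elasticsearch APIs, and the sketch assumes the query string (e.g. `?pretty`) is stripped before matching.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Elasticsearch code: a list of '*'-wildcard REST path
// patterns used to refuse matching requests.
final class RestPathBlocklist {

    // Assumption: paths that must always stay reachable so operators can read
    // and update the setting, and therefore always release the block.
    static final Set<String> PROTECTED_PATHS = Set.of("/_cluster/settings");

    private final List<String> patterns;

    RestPathBlocklist(List<String> patterns) {
        // Validate up front: reject any pattern that would block a protected path,
        // so an update that would lock operators out never takes effect.
        for (String pattern : patterns) {
            for (String protectedPath : PROTECTED_PATHS) {
                if (simpleMatch(pattern, protectedPath)) {
                    throw new IllegalArgumentException("pattern [" + pattern
                        + "] would block [" + protectedPath + "], which must remain available");
                }
            }
        }
        this.patterns = List.copyOf(patterns);
    }

    // Returns the first configured pattern matching the request path, or null if
    // the request may proceed. Returning the pattern lets a caller build an error
    // response that says exactly which entry blocked the request.
    String matchingPattern(String path) {
        if (PROTECTED_PATHS.contains(path)) {
            return null; // defensive: protected paths are never blocked
        }
        for (String pattern : patterns) {
            if (simpleMatch(pattern, path)) {
                return pattern;
            }
        }
        return null;
    }

    boolean isBlocked(String path) {
        return matchingPattern(path) != null;
    }

    // Glob match where '*' matches any (possibly empty) run of characters and
    // every other character must match literally.
    static boolean simpleMatch(String pattern, String text) {
        int p = 0;
        int t = 0;
        int star = -1; // index of the last '*' seen in the pattern
        int mark = 0;  // text position where matching resumes after that '*'
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;
                mark = t;
            } else if (p < pattern.length() && pattern.charAt(p) == text.charAt(t)) {
                p++;
                t++;
            } else if (star != -1) {
                // Backtrack: let the last '*' absorb one more character.
                p = star + 1;
                t = ++mark;
            } else {
                return false;
            }
        }
        // Any remaining pattern characters must all be '*'.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

With `new RestPathBlocklist(List.of("/_cat/*", "*/_ilm/explain"))`, requests to `/_cat/shards` and `/my-index/_ilm/explain` are blocked while `/_search` and `/_cluster/settings` are not, and a list such as `List.of("/_cluster/*")` is rejected outright. Rejecting a dangerous pattern when the setting is updated, rather than silently ignoring it at request time, keeps the failure visible to the operator who made the change.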
relates #77466