-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix HTTP response code for cluster health API #70849
Comments
Pinging @elastic/es-distributed (Team:Distributed) |
cc @elastic/clients-team 👋 |
We (@elastic/es-distributed) discussed this in our team sync yesterday. HTTP lacks a usefully descriptive response code for this case (see also #35582). 408 is definitely wrong, but 503 is also inappropriate since the service was available and able to handle the request. We decided to move to 200 since the request was handled as desired, and the response body contains valid information even if the requested state was not reached within the time limit. This change will bring this endpoint in line with bulk indexing and other stats and monitoring endpoints that return 200 if the coordination of the request was successful even if it there were some subsidiary failures during processing. Future clients will check the |
…h` (elastic#78180) Backports elastic#78180 to 7.x. Add a deprecation warning and a system property es.cluster_health.request_timeout_200 to opt in for returning 200 which will be the default in 8.0.0 Fixes elastic#70849
…new cluster health response code Backport elastic#79351 to 7.x: The original change was implemented in elastic#78940, but we have decided to move from a system property to a request parameter, so Cloud users/clients have an easier way to opt-in for the new status code. Relates elastic#70849
…new cluster health response code (elastic#79397) Backport elastic#79351 to 7.x: The original change was implemented in elastic#78940, but we have decided to move from a system property to a request parameter, so Cloud users/clients have an easier way to opt-in for the new status code. Relates elastic#70849
Returning 408 for a cluster health timeout was deprecated in elastic#78180 and backported to 7.x in elastic#78940 Now we can do a breaking change in 8.0 respecting the user choice to run ES in 7.x compatible mode via the REST Compatibility layer. Fixes elastic#70849
I'm recently facing this error again
|
Looks like the fix was reverted, and as of 8.2.2, 408 is still returned: > curl -D - -k -u elastic:changeme "https://localhost:9200/_cluster/health?wait_for_nodes=2&timeout=1ms&pretty"
HTTP/1.1 408 Request Timeout
X-elastic-product: Elasticsearch
content-type: application/json
content-length: 465
{
"cluster_name" : "elasticsearch",
"status" : "green",
"timed_out" : true,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 2,
"active_shards" : 2,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
} Elasticsearch version: > curl -k -u elastic:changeme "https://localhost:9200"
{
"name" : "eli.local",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "iYa9R7SbRmqbW1bwWGs1aA",
"version" : {
"number" : "8.2.2",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "9876968ef3c745186b94fdabd4483e01499224ef",
"build_date" : "2022-05-25T15:47:06.259735307Z",
"build_snapshot" : false,
"lucene_version" : "9.1.0",
"minimum_wire_compatibility_version" : "7.17.0",
"minimum_index_compatibility_version" : "7.0.0"
},
"tagline" : "You Know, for Search"
} What happened? |
Ah we forgot to reopen this, sorry. |
Today
GET _cluster/health
returns408 Request timeout
if it times out before the desired condition is reached. This response code is to indicate that the server timed out waiting for a request from the client, so it isn't appropriate here at all. There's no great fit for a server-side timeout response code, I suggest503
.This API already returns
503 Service unavailable
if there is no master, after waiting a while, but immediately returns200 OK
if there is a master butSTATE_NOT_RECOVERED_BLOCK
is in place. I think we should wait and return503
in the presence of theSTATE_NOT_RECOVERED_BLOCK
too.The text was updated successfully, but these errors were encountered: