Fix HTTP response code for cluster health API #70849

DaveCTurner · 2021-03-25T08:15:06Z

Today GET _cluster/health returns 408 Request timeout if it times out before the desired condition is reached. This response code indicates that the server timed out waiting for a request from the client, so it isn't appropriate here at all. There's no great fit for a server-side timeout response code; I suggest 503.

This API already returns 503 Service unavailable if there is no master, after waiting a while, but immediately returns 200 OK if there is a master but STATE_NOT_RECOVERED_BLOCK is in place. I think we should wait and return 503 in the presence of the STATE_NOT_RECOVERED_BLOCK too.

elasticmachine · 2021-03-25T08:15:09Z

Pinging @elastic/es-distributed (Team:Distributed)

Mpdreamz · 2021-04-01T07:01:35Z

cc @elastic/clients-team 👋

DaveCTurner · 2021-04-01T07:10:53Z

We (@elastic/es-distributed) discussed this in our team sync yesterday. HTTP lacks a usefully descriptive response code for this case (see also #35582). 408 is definitely wrong, but 503 is also inappropriate since the service was available and able to handle the request. We decided to move to 200 since the request was handled as desired, and the response body contains valid information even if the requested state was not reached within the time limit. This change will bring this endpoint in line with bulk indexing and other stats and monitoring endpoints that return 200 if the coordination of the request was successful, even if there were some subsidiary failures during processing.

Future clients will check the timed_out field of the response to determine whether to retry or not. Recognising that today's clients may be relying on the HTTP response code we'll treat this bugfix as a breaking change with the usual deprecation warnings etc. We will also introduce a system property that lets users opt into the future behaviour before it becomes mandatory.
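
For client authors, the handling described above could look roughly like the following. This is a minimal sketch, not the behaviour of any official client: the helper name `interpret_health` and the shape of its return value are hypothetical, and it assumes the caller already has the HTTP status code and the raw response body. It accepts both 200 and the legacy 408, and bases the retry decision on the `timed_out` field of the body.

```python
import json


def interpret_health(status_code: int, body: str) -> dict:
    """Classify a _cluster/health response using the body, not the status code.

    Hypothetical helper for illustration only. Both 200 and the legacy 408
    carry a valid health document, so both are parsed the same way.
    """
    if status_code not in (200, 408):
        # Anything else (for example 503 when there is no master) is treated
        # as a real failure rather than a health document.
        raise RuntimeError(f"unexpected status code {status_code}")

    payload = json.loads(body)
    timed_out = bool(payload.get("timed_out", False))
    return {
        "status": payload.get("status"),
        "timed_out": timed_out,
        # Retry only when the wait condition was not reached in time.
        "should_retry": timed_out,
    }
```

With this approach the same client code works whether the server answers a timed-out wait with 408 or with 200.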

Fixes elastic#70849

…8180) Add a deprecation warning and a system property es.cluster_health.request_timeout_200 to opt in to returning 200, which will be the default in 8.0.0. Fixes #70849

…h` (elastic#78180) Backports elastic#78180 to 7.x. Add a deprecation warning and a system property es.cluster_health.request_timeout_200 to opt in to returning 200, which will be the default in 8.0.0. Fixes elastic#70849

…h` (#78180) (#78940) Backports #78180 to 7.x. Add a deprecation warning and a system property es.cluster_health.request_timeout_200 to opt in to returning 200, which will be the default in 8.0.0. Fixes #70849

…uster health response code (#79351) The original change was implemented in #78940, but we have decided to move from a system property to a request parameter, so Cloud users/clients have an easier way to opt in to the new status code. Relates #70849

…new cluster health response code Backport elastic#79351 to 7.x: The original change was implemented in elastic#78940, but we have decided to move from a system property to a request parameter, so Cloud users/clients have an easier way to opt-in for the new status code. Relates elastic#70849

…new cluster health response code (#79397) Backport #79351 to 7.x: The original change was implemented in #78940, but we have decided to move from a system property to a request parameter, so Cloud users/clients have an easier way to opt-in for the new status code. Relates #70849

…new cluster health response code (#79397) (#79435) Backport #79351 to 7.x: The original change was implemented in #78940, but we have decided to move from a system property to a request parameter, so Cloud users/clients have an easier way to opt-in for the new status code. Relates #70849
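
The commit messages above do not name the request parameter. As far as I know it shipped in 7.16 as `return_200_for_cluster_health_timeout`, but treat that name as an assumption and check the cluster health API documentation for your version. Under that assumption, a minimal sketch of building an opt-in request URL (the helper `health_url` is hypothetical) might be:

```python
from urllib.parse import urlencode


def health_url(base, timeout="30s", wait_for_nodes=None, opt_in_200=True):
    """Build a _cluster/health URL, optionally opting in to the 200 response.

    Hypothetical helper. The opt-in parameter name is an assumption that is
    not stated in this thread; verify it against the docs for your version.
    """
    params = {"timeout": timeout}
    if wait_for_nodes is not None:
        params["wait_for_nodes"] = str(wait_for_nodes)
    if opt_in_200:
        params["return_200_for_cluster_health_timeout"] = "true"
    return f"{base.rstrip('/')}/_cluster/health?{urlencode(params)}"
```

For example, `health_url("https://localhost:9200", timeout="1ms", wait_for_nodes=2)` produces a URL with the `timeout`, `wait_for_nodes`, and opt-in parameters in that order.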

Returning 408 for a cluster health timeout was deprecated in #78180 and backported to 7.x in #78940. Now we can make the breaking change in 8.0 while respecting the user's choice to run ES in 7.x-compatible mode via the REST Compatibility layer. Fixes #70849

…0464) Returning 408 for a cluster health timeout was deprecated in #78180 and backported to 7.x in #78940. Now we can make the breaking change in 8.0 while respecting the user's choice to run ES in 7.x-compatible mode via the REST Compatibility layer. Fixes #70849

loretoparisi · 2022-03-15T15:32:01Z

I've recently been hitting this error again:

error {
  "status": 408,
  "displayName": "RequestTimeout",
  "message": "Request Timeout after 30000ms"
}

j-bennet · 2022-06-02T00:09:35Z

Looks like the fix was reverted, and as of 8.2.2, 408 is still returned:

> curl -D - -k -u elastic:changeme "https://localhost:9200/_cluster/health?wait_for_nodes=2&timeout=1ms&pretty"
HTTP/1.1 408 Request Timeout
X-elastic-product: Elasticsearch
content-type: application/json
content-length: 465

{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : true,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 2,
  "active_shards" : 2,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Elasticsearch version:

> curl -k -u elastic:changeme "https://localhost:9200"
{
  "name" : "eli.local",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "iYa9R7SbRmqbW1bwWGs1aA",
  "version" : {
    "number" : "8.2.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "9876968ef3c745186b94fdabd4483e01499224ef",
    "build_date" : "2022-05-25T15:47:06.259735307Z",
    "build_snapshot" : false,
    "lucene_version" : "9.1.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

What happened?

DaveCTurner · 2022-06-02T09:07:54Z

Ah we forgot to reopen this, sorry.

DaveCTurner added the >bug, :Distributed Indexing/Distributed, and team-discuss labels Mar 25, 2021

elasticmachine added the Team:Distributed (Obsolete) label Mar 25, 2021

DaveCTurner mentioned this issue Apr 1, 2021

Server-side timeout returns 408 Request Timeout #35582

Closed

DaveCTurner removed the team-discuss label Apr 1, 2021

barkbay mentioned this issue May 25, 2021

GetClusterHealthWaitForAllEvents may return empty/default value elastic/cloud-on-k8s#4507

Closed

arteam mentioned this issue Sep 22, 2021

Deprecate returning 408 for a server timeout on _cluster/health #78180

Merged

arteam self-assigned this Sep 22, 2021

arteam added a commit to arteam/elasticsearch that referenced this issue Sep 23, 2021

Return correct HTTP code for a server timeout on _cluster/health

817c5d6

Fixes elastic#70849

arteam closed this as completed in #78180 Oct 11, 2021

arteam mentioned this issue Oct 11, 2021

[7.x] Deprecate returning 408 for a server timeout on _cluster/health (#78180) #78940

Merged

arteam reopened this Oct 12, 2021

arteam mentioned this issue Oct 13, 2021

Return 200 OK response code for a cluster health timeout #78968

Merged

arteam mentioned this issue Oct 18, 2021

Use query param instead of a system property for opting in for new cluster health response code #79351

Merged

arteam mentioned this issue Oct 18, 2021

[7.x] Use query param instead of a system property for opting in for new cluster health response code #79397

Merged

arteam mentioned this issue Oct 19, 2021

[7.x] Use query param instead of a system property for opting in for new cluster health response code (#79397) #79435

Merged

arteam closed this as completed in #78968 Nov 6, 2021

DaveCTurner reopened this Jun 2, 2022

arteam removed their assignment Feb 20, 2024
