Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix HTTP response code for cluster health API #70849

Open
DaveCTurner opened this issue Mar 25, 2021 · 6 comments · Fixed by #78180 or #78968
Open

Fix HTTP response code for cluster health API #70849

DaveCTurner opened this issue Mar 25, 2021 · 6 comments · Fixed by #78180 or #78968
Labels
>bug :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Mar 25, 2021

Today GET _cluster/health returns 408 Request timeout if it times out before the desired condition is reached. This response code is to indicate that the server timed out waiting for a request from the client, so it isn't appropriate here at all. There's no great fit for a server-side timeout response code, I suggest 503.

This API already returns 503 Service unavailable if there is no master, after waiting a while, but immediately returns 200 OK if there is a master but STATE_NOT_RECOVERED_BLOCK is in place. I think we should wait and return 503 in the presence of the STATE_NOT_RECOVERED_BLOCK too.

@DaveCTurner DaveCTurner added >bug :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. team-discuss labels Mar 25, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Mar 25, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@Mpdreamz
Copy link
Member

Mpdreamz commented Apr 1, 2021

cc @elastic/clients-team 👋

@DaveCTurner
Copy link
Contributor Author

We (@elastic/es-distributed) discussed this in our team sync yesterday. HTTP lacks a usefully descriptive response code for this case (see also #35582). 408 is definitely wrong, but 503 is also inappropriate since the service was available and able to handle the request. We decided to move to 200 since the request was handled as desired, and the response body contains valid information even if the requested state was not reached within the time limit. This change will bring this endpoint in line with bulk indexing and other stats and monitoring endpoints that return 200 if the coordination of the request was successful even if it there were some subsidiary failures during processing.

Future clients will check the timed_out field of the response to determine whether to retry or not. Recognising that today's clients may be relying on the HTTP response code we'll treat this bugfix as a breaking change with the usual deprecation warnings etc. We will also introduce a system property that lets users opt into the future behaviour before it becomes mandatory.

@arteam arteam self-assigned this Sep 22, 2021
arteam added a commit to arteam/elasticsearch that referenced this issue Sep 23, 2021
arteam added a commit that referenced this issue Oct 11, 2021
…8180)

Add a deprecation warning and a system property es.cluster_health.request_timeout_200 to opt in for returning 200 which will be the default in 8.0.0

Fixes #70849
arteam added a commit to arteam/elasticsearch that referenced this issue Oct 11, 2021
…h` (elastic#78180)

Backports elastic#78180 to 7.x.

Add a deprecation warning and a system property es.cluster_health.request_timeout_200 to opt in for returning 200 which will be the default in 8.0.0

Fixes elastic#70849
arteam added a commit that referenced this issue Oct 11, 2021
…h` (#78180) (#78940)

Backports #78180 to 7.x.

Add a deprecation warning and a system property es.cluster_health.request_timeout_200 to opt in for returning 200 which will be the default in 8.0.0

Fixes #70849
@arteam arteam reopened this Oct 12, 2021
arteam added a commit that referenced this issue Oct 18, 2021
…uster health response code (#79351)

The original change was implemented in #78940, bu we have decided to move from a system property to an a request parameter, so Cloud users/clients have an easier way to opt-in for the new status code.

Relates #70849
arteam added a commit to arteam/elasticsearch that referenced this issue Oct 18, 2021
…new cluster health response code

Backport elastic#79351 to 7.x:

The original change was implemented in elastic#78940, but we have decided to move from a system property
to a request parameter, so Cloud users/clients have an easier way to opt-in for the new status code.

Relates elastic#70849
arteam added a commit that referenced this issue Oct 18, 2021
…new cluster health response code (#79397)

Backport #79351 to 7.x:

The original change was implemented in #78940, but we have decided to move from a system property
to a request parameter, so Cloud users/clients have an easier way to opt-in for the new status code.

Relates #70849
arteam added a commit to arteam/elasticsearch that referenced this issue Oct 19, 2021
…new cluster health response code (elastic#79397)

Backport elastic#79351 to 7.x:

The original change was implemented in elastic#78940, but we have decided to move from a system property
to a request parameter, so Cloud users/clients have an easier way to opt-in for the new status code.

Relates elastic#70849
DaveCTurner pushed a commit that referenced this issue Oct 19, 2021
…new cluster health response code (#79397) (#79435)

Backport #79351 to 7.x:

The original change was implemented in #78940, but we have decided to move from a system property
to a request parameter, so Cloud users/clients have an easier way to opt-in for the new status code.

Relates #70849
arteam added a commit that referenced this issue Nov 6, 2021
Returning 408 for a cluster health timeout was deprecated in #78180 and backported to 7.x in #78940

Now we can do a breaking change in 8.0 respecting the user choice to run ES in 7.x compatible mode via the REST Compatibility layer.

Fixes #70849
arteam added a commit to arteam/elasticsearch that referenced this issue Nov 6, 2021
Returning 408 for a cluster health timeout was deprecated in elastic#78180 and backported to 7.x in elastic#78940

Now we can do a breaking change in 8.0 respecting the user choice to run ES in 7.x compatible mode via the REST Compatibility layer.

Fixes elastic#70849
arteam added a commit that referenced this issue Nov 8, 2021
…0464)

Returning 408 for a cluster health timeout was deprecated in #78180 and backported to 7.x in #78940

Now we can do a breaking change in 8.0 respecting the user choice to run ES in 7.x compatible mode via the REST Compatibility layer.

Fixes #70849
@loretoparisi
Copy link

I'm recently facing this error again

error {
  "status": 408,
  "displayName": "RequestTimeout",
  "message": "Request Timeout after 30000ms"
}

@j-bennet
Copy link

j-bennet commented Jun 2, 2022

Looks like the fix was reverted, and as of 8.2.2, 408 is still returned:

> curl -D - -k -u elastic:changeme "https://localhost:9200/_cluster/health?wait_for_nodes=2&timeout=1ms&pretty"
HTTP/1.1 408 Request Timeout
X-elastic-product: Elasticsearch
content-type: application/json
content-length: 465

{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : true,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 2,
  "active_shards" : 2,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Elasticsearch version:

> curl -k -u elastic:changeme "https://localhost:9200"
{
  "name" : "eli.local",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "iYa9R7SbRmqbW1bwWGs1aA",
  "version" : {
    "number" : "8.2.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "9876968ef3c745186b94fdabd4483e01499224ef",
    "build_date" : "2022-05-25T15:47:06.259735307Z",
    "build_snapshot" : false,
    "lucene_version" : "9.1.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

What happened?

@DaveCTurner
Copy link
Contributor Author

Ah we forgot to reopen this, sorry.

@DaveCTurner DaveCTurner reopened this Jun 2, 2022
@arteam arteam removed their assignment Feb 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Projects
None yet
6 participants