-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cluster health call to throw decommissioned exception for local decommissioned node #6008
Cluster health call to throw decommissioned exception for local decommissioned node #6008
Conversation
…missioned nodes Signed-off-by: Rishab Nahata <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Rishab Nahata <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Rishab Nahata <[email protected]>
Signed-off-by: Rishab Nahata <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Rishab Nahata <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Rishab Nahata <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## main #6008 +/- ##
============================================
+ Coverage 70.73% 70.83% +0.09%
- Complexity 58738 58775 +37
============================================
Files 4771 4771
Lines 280820 280840 +20
Branches 40568 40572 +4
============================================
+ Hits 198645 198920 +275
+ Misses 65865 65619 -246
+ Partials 16310 16301 -9
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Signed-off-by: Rishab Nahata <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Rishab Nahata <[email protected]>
if (out.getVersion().onOrAfter(Version.CURRENT)) { | ||
out.writeBoolean(ensureLocalNodeCommissioned); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build fails in mixed cluster test if version check is PUT for 2.6 and not current. Please suggest correct way of doing this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will put 2.6 and then backport
Gradle Check (Jenkins) Run Completed with:
|
@@ -111,6 +111,10 @@ | |||
"awareness_attribute":{ | |||
"type":"string", | |||
"description":"The awareness attribute for which the health is required" | |||
}, | |||
"ensure_local_node_commissioned":{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering we could use ensure_node_commissioned
and only support with _local
to start with. Later we could extend this to any node_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could only be two kinds of transport request. One which retrieves information from local cluster state of the node or another which gets it from leader's state. There's no mechanism which says run this transport request on a specific node id. Hence, I feel this would ALWAYS run with local param only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating it to ensure_node_commissioned
@@ -134,7 +140,11 @@ protected void clusterManagerOperation( | |||
final ClusterState unusedState, | |||
final ActionListener<ClusterHealthResponse> listener | |||
) { | |||
|
|||
if (request.ensureLocalNodeCommissioned() | |||
&& discovery instanceof Coordinator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should discovery instanceof Coordinator
be an assertion instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asserts wouldn't run on prod. And only coordinator has this node's commission status info. If a developer uses a different Discovery mechanism it might break this. Hence putting this check directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly so tests should fail for a developer, in prod this is expected to be Coordinator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be fair to assume that Coordinator
can only be the discovery
for all use cases? Can there a plugin which writes there own Discovery model? I see something like this implemented in Gateway service. Here, they assume discovery might not be instance of Coordinator. But I get your point, for this change we can also add asserts and put in if as well to be a little cautious
if (discovery instanceof Coordinator) {
recoveryRunnable = () -> clusterService.submitStateUpdateTask("local-gateway-elected-state", new RecoverStateUpdateTask());
} else {
final Gateway gateway = new Gateway(settings, clusterService, listGatewayMetaState);
recoveryRunnable = () -> gateway.performStateRecovery(new GatewayRecoveryListener());
}
@Override | ||
public RestStatus status() { | ||
return RestStatus.UNPROCESSABLE_ENTITY; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
424 HTTP Error code seems more appropriate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, updated
Can you also verify what happens on a new node w/o cluster state that join as decommissioned vs existing node w/ cluster state getting decommissioned |
Signed-off-by: Rishab Nahata <[email protected]>
Signed-off-by: Rishab Nahata <[email protected]>
Signed-off-by: Rishab Nahata <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
CHANGELOG.md
Outdated
@@ -48,6 +48,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), | |||
- Changed http code on create index API with bad input raising NotXContentException from 500 to 400 ([#4773](https://github.com/opensearch-project/OpenSearch/pull/4773)) | |||
- Change http code for DecommissioningFailedException from 500 to 400 ([#5283](https://github.com/opensearch-project/OpenSearch/pull/5283)) | |||
- Require MediaType in Strings.toString API ([#6009](https://github.com/opensearch-project/OpenSearch/pull/6009)) | |||
- Cluster health call to throw decommissioned exception for local decommissioned node([#6008](https://github.com/opensearch-project/OpenSearch/pull/6008)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put this under section unreleased 2.x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
Both fails with same expected error with the new param |
server/src/main/java/org/opensearch/action/admin/cluster/health/ClusterHealthRequest.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Rishab Nahata <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
…missioned node (opensearch-project#6008) * Cluster health call to throw decommissioned exception for local decommissioned nodes Signed-off-by: Rishab Nahata <[email protected]>
…missioned node (#6008) (#6059) * [Backport 2.x]Cluster health call to throw decommissioned exception for local decommissioned nodes Signed-off-by: Rishab Nahata <[email protected]>
Description
This PR adds a param to cluster health local call to check if a node is decommissioned or not before retrieving its health from a local cluster state.
Example Request/Response -
Issues Resolved
#4528
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.