Add node shutdown API for shutting down nodes cleanly #70338

Closed
18 of 22 tasks
dakrone opened this issue Mar 11, 2021 · 9 comments
Labels
:Core/Infra/Node Lifecycle Node startup, bootstrapping, and shutdown >feature Meta Team:Core/Infra Meta label for core/infra team

Comments

@dakrone
Member

dakrone commented Mar 11, 2021

This issue supersedes #49064, which will be closed.

The node shutdown API should provide a safe way for operators to shut down a node, ensuring that all relevant orchestration steps are taken to prevent cluster instability and data loss. The feature can be used to decommission, power-cycle, or upgrade nodes.

An example of marking a node as part of the shutdown:

PUT /_nodes/<node_id>/shutdown
{
  "type": "remove",¹
  "reason": "shutdown of node so we can remove it from the cluster"²
}
¹ The type of decommission, in this case either a "remove" (the node is never coming back) or a "restart"
² A free-text description, entered by the user, of the reason the node is being shut down
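For illustration, here is a minimal Python sketch of registering a shutdown over the REST API as proposed above. The cluster URL and node ID are hypothetical placeholders, not values from this issue.

```python
import requests

# Hypothetical values for illustration only; substitute your own cluster URL
# and the ID of the node being decommissioned.
ES_URL = "http://localhost:9200"
NODE_ID = "node-id-1"

# Register the node as shutting down. "remove" signals the node is never
# coming back, so shards should be migrated off before it is stopped.
resp = requests.put(
    f"{ES_URL}/_nodes/{NODE_ID}/shutdown",
    json={
        "type": "remove",
        "reason": "shutdown of node so we can remove it from the cluster",
    },
)
resp.raise_for_status()
print(resp.json())
```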

And retrieving the shutdown status:

GET /_nodes/<node_id>/shutdown

{
  "node": "data-node-1",
  "node_id": "node-id-1",
  "type": "remove",
  "reason": "shutdown of node so we can remove it from the cluster",
  "status": {¹
     "shutdown_status": "IN_PROGRESS",²
     "shard_migration": {
       "status": "IN_PROGRESS",
       "shard_migrations_remaining": 7,³
       "time_started": "<user readable date>",
       "time_started_millis": 234091892
     },
     "persistent_tasks": {
       "status": "IN_PROGRESS",⁴
       "tasks_remaining": 2,⁵
       "error": "ICouldntStopTheTasksException[i can't do that dave]...etc stacktrace etc...",
       "time_started": "<user readable date>",
       "time_started_millis": 128391987
     },
     "plugins": {
       "status": "NOT_STARTED"⁶
     },
     "data_loss_on_removal": false⁷
  },
  "time_since_shutdown": "1.2h",⁸
  "time_since_shutdown_millis": 4320000,
  "shutdown_started": "<user readable date>",⁹
  "shutdown_started_millis": 128391987
}
1. Shows the current state of the shutdown for this node. This can be used by operators to track progress.
2. Overall shutdown status. Possible values are: "IN_PROGRESS", "COMPLETE", "STALLED". If the shutdown is STALLED, an error field will also be returned containing the reason the shutdown is stalled (e.g. no nodes can take the remaining shards).
3. How many shards remain to be migrated off of this node.
4. Whether in-progress persistent tasks have been halted and new tasks have been blocked.
5. The number of tasks that need to be completed before shutdown.
6. Whether plugins have indicated that they are ready for shutdown.
7. Whether data loss could occur if the node were terminated now.
8. How long the shutdown has been ongoing.
9. When the shutdown was initiated.
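As a companion to the sketch above, here is a minimal Python polling loop that waits for the shutdown to complete or stall, assuming the same hypothetical cluster URL and node ID; the field names follow the example response in this proposal.

```python
import time

import requests

# Same hypothetical cluster URL and node ID as in the sketch above.
ES_URL = "http://localhost:9200"
NODE_ID = "node-id-1"

while True:
    resp = requests.get(f"{ES_URL}/_nodes/{NODE_ID}/shutdown")
    resp.raise_for_status()
    status = resp.json()["status"]["shutdown_status"]

    if status == "COMPLETE":
        print("All shutdown steps finished; the node can be stopped safely.")
        break
    if status == "STALLED":
        # A stalled shutdown carries an error describing why (e.g. no nodes
        # can take the remaining shards) and needs operator intervention.
        print("Shutdown stalled:", resp.json()["status"])
        break

    time.sleep(30)  # shard migration is still IN_PROGRESS; poll again later
```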

Here are some high-level tasks that need to be completed for this:


Phase 2:

  • Add "REPLACE" shutdown type
    • Add REST and cluster state support for the "REPLACE" shutdown type (@gwbrown)
    • Add allocation decider and change existing deciders to handle node replacements (@dakrone)
  • Upgrades to persistent task handling
    • Cancel pre-existing tasks running on a node that is marked as shutting down (@dakrone)
    • Hook persistent task state into shutdown status API (@dakrone)
  • Enhance the data tier allocation decider to allow migrating to a different tier if all nodes in a certain tier are shut down (possibly?)
@dakrone dakrone added >feature :Core/Infra/Core Core issues without another label Meta labels Mar 11, 2021
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Mar 11, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

dakrone added a commit to dakrone/elasticsearch that referenced this issue Mar 22, 2021
This commit adds the rest endpoints for the node shutdown API. These APIs are behind the
`es.shutdown_feature_flag_enabled` feature flag for now, as development is ongoing.

Currently these APIs do not do anything, returning immediately. We plan to implement them for real
in subsequent work.

Relates to elastic#70338
dakrone added a commit that referenced this issue Mar 23, 2021
This commit adds the rest endpoints for the node shutdown API. These APIs are behind the
`es.shutdown_feature_flag_enabled` feature flag for now, as development is ongoing.

Currently these APIs do not do anything, returning immediately. We plan to implement them for real
in subsequent work.

Relates to #70338
dakrone added a commit to dakrone/elasticsearch that referenced this issue Mar 23, 2021
This commit adds the rest endpoints for the node shutdown API. These APIs are behind the
`es.shutdown_feature_flag_enabled` feature flag for now, as development is ongoing.

Currently these APIs do not do anything, returning immediately. We plan to implement them for real
in subsequent work.

Relates to elastic#70338
@gwbrown gwbrown added :Core/Infra/Node Lifecycle Node startup, bootstrapping, and shutdown and removed :Core/Infra/Core Core issues without another label labels Mar 24, 2021
dakrone added a commit to dakrone/elasticsearch that referenced this issue Apr 26, 2021
This commit changes the `PersistentTasksClusterService` to limit nodes for a task to a subset of
nodes (candidates) that are not currently shutting down.

It does not yet cancel tasks that may already be running on the nodes that are shut down, that will
be added in a subsequent request.

Relates to elastic#70338
dakrone added a commit that referenced this issue Apr 28, 2021
This commit changes the `PersistentTasksClusterService` to limit nodes for a task to a subset of
nodes (candidates) that are not currently shutting down.

It does not yet cancel tasks that may already be running on the nodes that are shut down, that will
be added in a subsequent request.

Relates to #70338
dakrone added a commit to dakrone/elasticsearch that referenced this issue Apr 28, 2021
This commit changes the `PersistentTasksClusterService` to limit nodes for a task to a subset of
nodes (candidates) that are not currently shutting down.

It does not yet cancel tasks that may already be running on the nodes that are shut down, that will
be added in a subsequent request.

Relates to elastic#70338
dakrone added a commit that referenced this issue Apr 28, 2021
…72426)

* Don't assign persistent tasks to nodes shutting down (#72260)

This commit changes the `PersistentTasksClusterService` to limit nodes for a task to a subset of
nodes (candidates) that are not currently shutting down.

It does not yet cancel tasks that may already be running on the nodes that are shut down, that will
be added in a subsequent request.

Relates to #70338

* Fix transport client usage in test
dakrone added a commit to dakrone/elasticsearch that referenced this issue May 3, 2021
Originally these were stored in the cluster state using a single class, however, they will need to
be different objects without common parts, and they will be calculated on the fly rather than
persisted into cluster state.

This removes the `NodeShutdownComponentStatus` class, as it's no longer needed.

Relates to elastic#70338
dakrone added a commit that referenced this issue May 4, 2021
Originally these were stored in the cluster state using a single class, however, they will need to
be different objects without common parts, and they will be calculated on the fly rather than
persisted into cluster state.

This removes the NodeShutdownComponentStatus class, as it's no longer needed.

Relates to #70338
dakrone added a commit to dakrone/elasticsearch that referenced this issue May 4, 2021
Originally these were stored in the cluster state using a single class, however, they will need to
be different objects without common parts, and they will be calculated on the fly rather than
persisted into cluster state.

This removes the NodeShutdownComponentStatus class, as it's no longer needed.

Relates to elastic#70338
dakrone added a commit to dakrone/elasticsearch that referenced this issue Jun 22, 2021
…c#74267)

This converts the system property feature flag 'es.shutdown_feature_flag_enabled' to a regular
non-dynamic node setting. This setting can only be set to 'true' on a snapshot build of
Elasticsearch (not a release build).

Relates to elastic#70338
dakrone added a commit that referenced this issue Jun 22, 2021
…74267) (#74446)

This converts the system property feature flag 'es.shutdown_feature_flag_enabled' to a regular
non-dynamic node setting. This setting can only be set to 'true' on a snapshot build of
Elasticsearch (not a release build).

Relates to #70338
dakrone added a commit to dakrone/elasticsearch that referenced this issue Aug 2, 2021
It previously defaulted to false. The setting can still only be set to 'true' on a
non-release (snapshot) build of Elasticsearch.

Relates to elastic#70338
elasticsearchmachine pushed a commit that referenced this issue Aug 2, 2021
…#75962)

* Flip node shutdown feature flag to default to true on snapshot builds

It previously defaulted to false. The setting can still only be set to 'true' on a
non-release (snapshot) build of Elasticsearch.

Relates to #70338

* Handle case where operator privileges are enabled
elasticsearchmachine pushed a commit to elasticsearchmachine/elasticsearch that referenced this issue Aug 2, 2021
…elastic#75962)

* Flip node shutdown feature flag to default to true on snapshot builds

It previously defaulted to false. The setting can still only be set to 'true' on a
non-release (snapshot) build of Elasticsearch.

Relates to elastic#70338

* Handle case where operator privileges are enabled
lockewritesdocs pushed a commit to lockewritesdocs/elasticsearch that referenced this issue Aug 3, 2021
…elastic#75962)

* Flip node shutdown feature flag to default to true on snapshot builds

It previously defaulted to false. The setting can still only be set to 'true' on a
non-release (snapshot) build of Elasticsearch.

Relates to elastic#70338

* Handle case where operator privileges are enabled
dakrone added a commit to dakrone/elasticsearch that referenced this issue Oct 11, 2021
This commit enhances `DiskThresholdMonitor` so that indices that have a flood-stage block will not
have the block removed while they reside on a node that is part of a "REPLACE"-type node shutdown.

This prevents a situation where a node is blocked due to disk usage, then during the replacement the
block is removed while shards are relocating to the target node, indexing occurs, and then the
target runs out of space due to the additional documents.

Relates to elastic#70338 and elastic#76247
dakrone added a commit that referenced this issue Oct 12, 2021
#78942)

This commit enhances `DiskThresholdMonitor` so that indices that have a flood-stage block will not
have the block removed while they reside on a node that is part of a "REPLACE"-type node shutdown.

This prevents a situation where a node is blocked due to disk usage, then during the replacement the
block is removed while shards are relocating to the target node, indexing occurs, and then the
target runs out of space due to the additional documents.

Relates to #70338 and #76247
dakrone added a commit to dakrone/elasticsearch that referenced this issue Oct 12, 2021
elastic#78942)

This commit enhances `DiskThresholdMonitor` so that indices that have a flood-stage block will not
have the block removed while they reside on a node that is part of a "REPLACE"-type node shutdown.

This prevents a situation where a node is blocked due to disk usage, then during the replacement the
block is removed while shards are relocating to the target node, indexing occurs, and then the
target runs out of space due to the additional documents.

Relates to elastic#70338 and elastic#76247
# Conflicts:
#	server/src/test/java/org/elasticsearch/cluster/routing/allocation/DiskThresholdMonitorTests.java
elasticsearchmachine pushed a commit that referenced this issue Oct 12, 2021
#78942) (#79008)

This commit enhances `DiskThresholdMonitor` so that indices that have a flood-stage block will not
have the block removed while they reside on a node that is part of a "REPLACE"-type node shutdown.

This prevents a situation where a node is blocked due to disk usage, then during the replacement the
block is removed while shards are relocating to the target node, indexing occurs, and then the
target runs out of space due to the additional documents.

Relates to #70338 and #76247
# Conflicts:
#	server/src/test/java/org/elasticsearch/cluster/routing/allocation/DiskThresholdMonitorTests.java
dakrone added a commit to dakrone/elasticsearch that referenced this issue Oct 14, 2021
This commit allows replica shards that have existing data on disk to be re-allocated to the target
of a "REPLACE" type node shutdown. Prior to this if the target node of a shutdown were to restart,
the replicas would not be allowed to be allocated even if their data existed on disk.

Relates to elastic#70338 as a follow-up to elastic#76247
dakrone added a commit that referenced this issue Oct 15, 2021
…ement (#79171)

This commit allows replica shards that have existing data on disk to be re-allocated to the target
of a "REPLACE" type node shutdown. Prior to this if the target node of a shutdown were to restart,
the replicas would not be allowed to be allocated even if their data existed on disk.

Relates to #70338 as a follow-up to #76247
dakrone added a commit to dakrone/elasticsearch that referenced this issue Oct 15, 2021
…ement (elastic#79171)

This commit allows replica shards that have existing data on disk to be re-allocated to the target
of a "REPLACE" type node shutdown. Prior to this if the target node of a shutdown were to restart,
the replicas would not be allowed to be allocated even if their data existed on disk.

Relates to elastic#70338 as a follow-up to elastic#76247
elasticsearchmachine pushed a commit that referenced this issue Oct 15, 2021
…ement (#79171) (#79266)

This commit allows replica shards that have existing data on disk to be re-allocated to the target
of a "REPLACE" type node shutdown. Prior to this if the target node of a shutdown were to restart,
the replicas would not be allowed to be allocated even if their data existed on disk.

Relates to #70338 as a follow-up to #76247
pgomulka added a commit that referenced this issue Dec 16, 2021
This PR adds full cluster restart and rolling upgrade tests,
to ensure that Node Shutdown handles BWC correctly.

Relates #70338
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Dec 16, 2021
This PR adds full cluster restart and rolling upgrade tests,
to ensure that Node Shutdown handles BWC correctly.

Relates elastic#70338
elasticsearchmachine pushed a commit that referenced this issue Dec 16, 2021
This PR adds full cluster restart and rolling upgrade tests,
to ensure that Node Shutdown handles BWC correctly.

Relates #70338
@colings86 colings86 removed their assignment Dec 20, 2021
@dakrone
Member Author

dakrone commented May 3, 2022

I believe since this API has been released, we can close this issue. Any further work can go into dedicated issues.

@dakrone dakrone closed this as completed May 3, 2022