Add node shutdown API for shutting down nodes cleanly #70338
Labels
:Core/Infra/Node Lifecycle
Node startup, bootstrapping, and shutdown
>feature
Meta
Team:Core/Infra
Meta label for core/infra team
Comments
Pinging @elastic/es-core-infra (Team:Core/Infra)
dakrone
added a commit
to dakrone/elasticsearch
that referenced
this issue
Mar 22, 2021
This commit adds the rest endpoints for the node shutdown API. These APIs are behind the `es.shutdown_feature_flag_enabled` feature flag for now, as development is ongoing. Currently these APIs do not do anything, returning immediately. We plan to implement them for real in subsequent work. Relates to elastic#70338
dakrone
added a commit
that referenced
this issue
Mar 23, 2021
This commit adds the rest endpoints for the node shutdown API. These APIs are behind the `es.shutdown_feature_flag_enabled` feature flag for now, as development is ongoing. Currently these APIs do not do anything, returning immediately. We plan to implement them for real in subsequent work. Relates to #70338
dakrone
added a commit
to dakrone/elasticsearch
that referenced
this issue
Mar 23, 2021
This commit adds the rest endpoints for the node shutdown API. These APIs are behind the `es.shutdown_feature_flag_enabled` feature flag for now, as development is ongoing. Currently these APIs do not do anything, returning immediately. We plan to implement them for real in subsequent work. Relates to elastic#70338
This was referenced Mar 31, 2021
dakrone
added a commit
to dakrone/elasticsearch
that referenced
this issue
Apr 26, 2021
This commit changes the `PersistentTasksClusterService` to limit nodes for a task to a subset of nodes (candidates) that are not currently shutting down. It does not yet cancel tasks that may already be running on the nodes that are shut down, that will be added in a subsequent request. Relates to elastic#70338
dakrone
added a commit
that referenced
this issue
Apr 28, 2021
This commit changes the `PersistentTasksClusterService` to limit nodes for a task to a subset of nodes (candidates) that are not currently shutting down. It does not yet cancel tasks that may already be running on the nodes that are shut down, that will be added in a subsequent request. Relates to #70338
dakrone
added a commit
to dakrone/elasticsearch
that referenced
this issue
Apr 28, 2021
This commit changes the `PersistentTasksClusterService` to limit nodes for a task to a subset of nodes (candidates) that are not currently shutting down. It does not yet cancel tasks that may already be running on the nodes that are shut down, that will be added in a subsequent request. Relates to elastic#70338
dakrone
added a commit
that referenced
this issue
Apr 28, 2021
…72426) * Don't assign persistent tasks to nodes shutting down (#72260) This commit changes the `PersistentTasksClusterService` to limit nodes for a task to a subset of nodes (candidates) that are not currently shutting down. It does not yet cancel tasks that may already be running on the nodes that are shut down, that will be added in a subsequent request. Relates to #70338 * Fix transport client usage in test
dakrone
added a commit
to dakrone/elasticsearch
that referenced
this issue
May 3, 2021
Originally these were stored in the cluster state using a single class, however, they will need to be different objects without common parts, and they will be calculated on the fly rather than persisted into cluster state. This removes the `NodeShutdownComponentStatus` class, as it's no longer needed. Relates to elastic#70338
dakrone
added a commit
that referenced
this issue
May 4, 2021
Originally these were stored in the cluster state using a single class, however, they will need to be different objects without common parts, and they will be calculated on the fly rather than persisted into cluster state. This removes the NodeShutdownComponentStatus class, as it's no longer needed. Relates to #70338
dakrone
added a commit
to dakrone/elasticsearch
that referenced
this issue
May 4, 2021
Originally these were stored in the cluster state using a single class, however, they will need to be different objects without common parts, and they will be calculated on the fly rather than persisted into cluster state. This removes the NodeShutdownComponentStatus class, as it's no longer needed. Relates to elastic#70338
dakrone
added a commit
to dakrone/elasticsearch
that referenced
this issue
Jun 22, 2021
…c#74267) This converts the system property feature flag 'es.shutdown_feature_flag_enabled' to a regular non-dynamic node setting. This setting can only be set to 'true' on a snapshot build of Elasticsearch (not a release build). Relates to elastic#70338
dakrone
added a commit
to dakrone/elasticsearch
that referenced
this issue
Aug 2, 2021
It previously defaulted to false. The setting can still only be set to 'true' on a non-release (snapshot) build of Elasticsearch. Relates to elastic#70338
elasticsearchmachine
pushed a commit
to elasticsearchmachine/elasticsearch
that referenced
this issue
Aug 2, 2021
…elastic#75962) * Flip node shutdown feature flag to default to true on snapshot builds It previously defaulted to false. The setting can still only be set to 'true' on a non-release (snapshot) build of Elasticsearch. Relates to elastic#70338 * Handle case where operator privileges are enabled
lockewritesdocs
pushed a commit
to lockewritesdocs/elasticsearch
that referenced
this issue
Aug 3, 2021
…elastic#75962) * Flip node shutdown feature flag to default to true on snapshot builds It previously defaulted to false. The setting can still only be set to 'true' on a non-release (snapshot) build of Elasticsearch. Relates to elastic#70338 * Handle case where operator privileges are enabled
This was referenced Sep 1, 2021
dakrone
added a commit
to dakrone/elasticsearch
that referenced
this issue
Oct 11, 2021
This commit enhances `DiskThresholdMonitor` so that indices that have a flood-stage block will not have the block removed while they reside on a node that is part of a "REPLACE"-type node shutdown. This prevents a situation where a node is blocked due to disk usage, then during the replacement the block is removed while shards are relocating to the target node, indexing occurs, and then the target runs out of space due to the additional documents. Relates to elastic#70338 and elastic#76247
dakrone
added a commit
that referenced
this issue
Oct 12, 2021
#78942) This commit enhances `DiskThresholdMonitor` so that indices that have a flood-stage block will not have the block removed while they reside on a node that is part of a "REPLACE"-type node shutdown. This prevents a situation where a node is blocked due to disk usage, then during the replacement the block is removed while shards are relocating to the target node, indexing occurs, and then the target runs out of space due to the additional documents. Relates to #70338 and #76247
dakrone
added a commit
to dakrone/elasticsearch
that referenced
this issue
Oct 12, 2021
elastic#78942) This commit enhances `DiskThresholdMonitor` so that indices that have a flood-stage block will not have the block removed while they reside on a node that is part of a "REPLACE"-type node shutdown. This prevents a situation where a node is blocked due to disk usage, then during the replacement the block is removed while shards are relocating to the target node, indexing occurs, and then the target runs out of space due to the additional documents. Relates to elastic#70338 and elastic#76247 # Conflicts: # server/src/test/java/org/elasticsearch/cluster/routing/allocation/DiskThresholdMonitorTests.java
elasticsearchmachine
pushed a commit
that referenced
this issue
Oct 12, 2021
#78942) (#79008) This commit enhances `DiskThresholdMonitor` so that indices that have a flood-stage block will not have the block removed while they reside on a node that is part of a "REPLACE"-type node shutdown. This prevents a situation where a node is blocked due to disk usage, then during the replacement the block is removed while shards are relocating to the target node, indexing occurs, and then the target runs out of space due to the additional documents. Relates to #70338 and #76247 # Conflicts: # server/src/test/java/org/elasticsearch/cluster/routing/allocation/DiskThresholdMonitorTests.java
dakrone
added a commit
to dakrone/elasticsearch
that referenced
this issue
Oct 14, 2021
This commit allows replica shards that have existing data on disk to be re-allocated to the target of a "REPLACE" type node shutdown. Prior to this if the target node of a shutdown were to restart, the replicas would not be allowed to be allocated even if their data existed on disk. Relates to elastic#70338 as a follow-up to elastic#76247
dakrone
added a commit
that referenced
this issue
Oct 15, 2021
…ement (#79171) This commit allows replica shards that have existing data on disk to be re-allocated to the target of a "REPLACE" type node shutdown. Prior to this if the target node of a shutdown were to restart, the replicas would not be allowed to be allocated even if their data existed on disk. Relates to #70338 as a follow-up to #76247
dakrone
added a commit
to dakrone/elasticsearch
that referenced
this issue
Oct 15, 2021
…ement (elastic#79171) This commit allows replica shards that have existing data on disk to be re-allocated to the target of a "REPLACE" type node shutdown. Prior to this if the target node of a shutdown were to restart, the replicas would not be allowed to be allocated even if their data existed on disk. Relates to elastic#70338 as a follow-up to elastic#76247
elasticsearchmachine
pushed a commit
that referenced
this issue
Oct 15, 2021
…ement (#79171) (#79266) This commit allows replica shards that have existing data on disk to be re-allocated to the target of a "REPLACE" type node shutdown. Prior to this if the target node of a shutdown were to restart, the replicas would not be allowed to be allocated even if their data existed on disk. Relates to #70338 as a follow-up to #76247
pgomulka
added a commit
that referenced
this issue
Dec 16, 2021
This PR adds full cluster restart and rolling upgrade tests, to ensure that Node Shutdown handles BWC correctly. Relates #70338
pgomulka
added a commit
to pgomulka/elasticsearch
that referenced
this issue
Dec 16, 2021
This PR adds full cluster restart and rolling upgrade tests, to ensure that Node Shutdown handles BWC correctly. Relates elastic#70338
elasticsearchmachine
pushed a commit
that referenced
this issue
Dec 16, 2021
I believe since this API has been released, we can close this issue. Any further work can go into dedicated issues.
This issue supersedes #49064, which will be closed.
The node shutdown API should provide a safe way for operators to shut down a node, ensuring that all relevant orchestration steps are taken to prevent cluster instability and data loss. The feature can be used to decommission, power cycle, or upgrade nodes.
An example of marking a node as part of the shutdown:
And retrieving the shutdown status:
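The request and response examples from the original issue are not preserved in this copy. As a rough sketch only, assuming the endpoint shape of the API as it was later released (`PUT /_nodes/<node_id>/shutdown` to register a shutdown and `GET /_nodes/<node_id>/shutdown` to read its status), the two calls could be built like this. The helper names `build_put_shutdown` and `build_get_shutdown` and the node ID `node-1` are hypothetical, and the proposal in this issue may have used different paths or fields.

```python
import json

# Shutdown types accepted by the released API (an assumption relative to
# this issue, whose original example is missing).
VALID_TYPES = {"restart", "remove", "replace"}


def build_put_shutdown(node_id, shutdown_type, reason, target_node_name=None):
    """Return (method, path, body) for registering a node shutdown.

    Hypothetical helper: it only builds the request and sends nothing.
    """
    if shutdown_type not in VALID_TYPES:
        raise ValueError(f"unknown shutdown type: {shutdown_type!r}")
    body = {"type": shutdown_type, "reason": reason}
    if shutdown_type == "replace":
        # A "REPLACE"-type shutdown names the node that takes over the shards.
        if not target_node_name:
            raise ValueError("replace requires target_node_name")
        body["target_node_name"] = target_node_name
    return "PUT", f"/_nodes/{node_id}/shutdown", json.dumps(body)


def build_get_shutdown(node_id):
    """Return (method, path, body) for retrieving the shutdown status."""
    return "GET", f"/_nodes/{node_id}/shutdown", None


if __name__ == "__main__":
    print(build_put_shutdown("node-1", "restart", "kernel upgrade"))
    print(build_get_shutdown("node-1"))
```

The helpers return plain tuples, so they can be handed to whichever HTTP client is already in use against the cluster.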
Here are some high-level tasks that need to be completed for this:
`ShutdownAwarePlugin` and stop its work while shutting down
Phase 2: