Shutdown replace contracts auto-expand replicas #89527

Closed
henningandersen opened this issue Aug 23, 2022 · 2 comments · Fixed by #96281
Labels
>bug, :Core/Infra/Node Lifecycle, Team:Core/Infra

Comments

@henningandersen
Contributor

Elasticsearch Version

7.17.1

Installed Plugins

No response

Java Version

bundled

OS Version

ESS

Problem Description

Shutdown with mode REPLACE will move all shards from the node being replaced (source) to a new node (target node).

However, it also affects auto-expand replicas. Today, it contracts auto-expand replicas by excluding both the source and the target node from the set of nodes they may expand to. This is wrong; instead, the source/target pair should be considered one node.

The REPLACE indication is expected to be added prior to starting the new target node (but we should be lenient towards it appearing later). This complicates the handling of auto-expand replicas somewhat. I propose to:

  1. disregard a shutdown for REPLACE record when there is no corresponding target node.
  2. not expand to the source node when the target node is present.
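
A minimal sketch of the difference between today's behavior and the two rules proposed above. This is not Elasticsearch code: the class, the method names, and the `replaceSourceToTarget` map (source node to target node for each REPLACE record) are hypothetical, and the replica calculation is a simplified clamp of `eligible nodes - 1` into the auto-expand range.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Elasticsearch code: counts the data nodes that
// auto-expand replicas may expand to while REPLACE shutdowns are registered.
final class AutoExpandReplaceSketch {

    // Behavior described in this issue: both the source and the target of
    // every REPLACE record are excluded.
    static int eligibleNodesCurrent(Set<String> nodes, Map<String, String> replaceSourceToTarget) {
        int count = 0;
        for (String node : nodes) {
            boolean isSource = replaceSourceToTarget.containsKey(node);
            boolean isTarget = replaceSourceToTarget.containsValue(node);
            if (isSource == false && isTarget == false) {
                count++;
            }
        }
        return count;
    }

    // Proposed behavior: a source/target pair counts as one node.
    static int eligibleNodesProposed(Set<String> nodes, Map<String, String> replaceSourceToTarget) {
        int count = 0;
        for (String node : nodes) {
            String target = replaceSourceToTarget.get(node);
            // Rule 1: if the target has not joined, the record is disregarded
            // and the source still counts.
            // Rule 2: once the target is present, the source is skipped.
            boolean sourceWithTargetPresent = target != null && nodes.contains(target);
            if (sourceWithTargetPresent == false) {
                count++;
            }
        }
        return count;
    }

    // Simplified assumption: replicas = eligible nodes - 1, clamped to [min, max].
    static int replicas(int eligibleNodes, int min, int max) {
        return Math.max(min, Math.min(max, eligibleNodes - 1));
    }
}
```

With nodes `a`, `b`, `b2`, a REPLACE record `b -> b2`, and an auto-expand range of `0-1`, the first method leaves only `a` eligible, so replicas contract to 0; the second counts `a` and `b2`, so one replica is kept.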

We also discussed this topic during the original implementation here. The problem with the current implementation is that we risk reducing the number of replicas to 0 during a shutdown for REPLACE, potentially putting data at risk.

Steps to Reproduce

N/A

Logs (if relevant)

...  updating number_of_replicas to [0] for indices [ ...
@henningandersen added the >bug, needs:triage, and :Core/Infra/Node Lifecycle labels and removed the needs:triage label on Aug 23, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@kingherc
Contributor

This is still relevant for 7.17.x, so I am re-opening this, mainly to backport the PR to 7.17.x.

@kingherc kingherc reopened this Aug 25, 2023
kingherc pushed a commit to kingherc/elasticsearch that referenced this issue Aug 25, 2023
Prior to this change, NodeReplacementAllocationDecider was unconditionally
skipping both replacement source and target nodes when calculating auto-expand
replicas. This is fixed by auto-expanding to the replacement node if the source
node already had shards of the index.

Backport of PR elastic#96281 amended for 7.17.x

Closes elastic#89527
kingherc added a commit that referenced this issue Aug 28, 2023
Prior to this change, NodeReplacementAllocationDecider was unconditionally skipping both replacement source and target nodes when calculating auto-expand replicas. This is fixed by auto-expanding to the replacement node if the source node already had shards of the index.

Backport of PR #96281 amended for 7.17.x

Closes #89527

Co-authored-by: Ievgen Degtiarenko <[email protected]>
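
The per-node rule described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from the PR: the class, the method, and both parameters are hypothetical, and keeping the replacement source excluded is an assumption that the commit message does not spell out.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the actual NodeReplacementAllocationDecider code.
final class ReplacementAutoExpandDecisionSketch {

    // replaceTargetToSource: target node -> source node for each REPLACE record.
    // nodesWithShardsOfIndex: nodes that currently hold a shard of the index.
    static boolean shouldAutoExpandToNode(
        String node,
        Map<String, String> replaceTargetToSource,
        Set<String> nodesWithShardsOfIndex
    ) {
        String source = replaceTargetToSource.get(node);
        if (source != null) {
            // Replacement target: expand to it only if the source node
            // already had shards of the index.
            return nodesWithShardsOfIndex.contains(source);
        }
        if (replaceTargetToSource.containsValue(node)) {
            // Replacement source: assumed to remain excluded.
            return false;
        }
        // Nodes not involved in any replacement are always eligible.
        return true;
    }
}
```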