Update last seen cluster state in commit phase #16215

Merged
merged 2 commits into opensearch-project:main from soosinha:coord_metadata_mismatch
Oct 15, 2024

soosinha
Member

@soosinha soosinha commented Oct 7, 2024

Description

The coordination metadata contains the accepted voting configuration and the committed voting configuration.
When the voting configuration changes, the new accepted voting configuration is sent in the publish phase. Follower nodes apply this publish-phase cluster state on top of the lastSeen cluster state in PublicationTransportHandler. In the commit phase, each node sets its committed voting configuration from its own accepted voting configuration.
When the next cluster state is published, it contains no diff in the voting configuration. This next update is again applied on top of lastSeen, which still holds the old committed voting configuration, so the committed voting configuration remains set to the older value.
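To make the sequence above concrete, here is a deliberately simplified, hypothetical model in Java (the language of the OpenSearch codebase). It is not OpenSearch code: `State`, `Diff`, `Follower`, `handlePublish` and `handleCommit` are illustrative names, only the two voting configurations are modelled, and a diff that does not touch the coordination metadata is represented by a `null` accepted configuration. Under those assumptions, the follower's version 3 state inherits the stale committed configuration from lastSeen.

```java
// Hypothetical model of the stale committed voting configuration; not OpenSearch code.
public class StaleCommittedConfig {

    // Only the accepted and committed voting configurations are modelled.
    record State(long version, String accepted, String committed) {}

    // A publish-phase diff. A null 'accepted' means the diff carries no coordination metadata change.
    record Diff(long version, String accepted) {
        State applyTo(State base) {
            String acc = accepted != null ? accepted : base.accepted();
            // The committed configuration is never part of the diff; it is inherited from the base state.
            return new State(version, acc, base.committed());
        }
    }

    static class Follower {
        State lastSeen;      // base that incoming diffs are applied to
        State lastAccepted;  // state accepted in the publish phase and marked committed in the commit phase

        Follower(State initial) {
            lastSeen = initial;
            lastAccepted = initial;
        }

        State handlePublish(Diff diff) {
            State incoming = diff.applyTo(lastSeen);
            lastSeen = incoming;
            lastAccepted = incoming;
            return incoming;
        }

        // Commit phase: the accepted configuration is copied into the committed one,
        // but only on lastAccepted; lastSeen keeps the old committed configuration.
        void handleCommit() {
            lastAccepted = new State(lastAccepted.version(), lastAccepted.accepted(), lastAccepted.accepted());
        }
    }

    public static void main(String[] args) {
        Follower follower = new Follower(new State(1, "A", "A"));
        follower.handlePublish(new Diff(2, "B")); // voting configuration changes from A to B
        follower.handleCommit();
        System.out.println(follower.lastAccepted.committed()); // B
        State v3 = follower.handlePublish(new Diff(3, null)); // next update: no voting configuration diff
        System.out.println(v3.committed()); // A: stale, because the diff was applied to lastSeen
    }
}
```

Running `main` prints `B` and then `A`: the commit phase updated the committed configuration on the accepted state, but the version 3 diff was applied to lastSeen, which still carried `A`.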
With local cluster state publication, this issue does not occur as the entire coordination metadata is sent in every diff.
With remote cluster state publication, we could mitigate this in the same way by always sending the coordination metadata in the diff, but that adds the overhead of downloading the coordination metadata every time on each data node.
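The rejected alternative can be sketched the same way, again with hypothetical names and only the coordination metadata modelled: each publication carries the leader's full coordination metadata, so nothing stale is inherited from lastSeen, but the follower fetches it on every publication. The `coordinationFetches` counter merely stands in for the remote store downloads; it is an illustration, not a measurement.

```java
// Hypothetical sketch of the "always send coordination metadata" alternative; not OpenSearch code.
public class AlwaysSendCoordinationMetadata {

    // Both voting configurations travel together as one unit.
    record CoordinationMetadata(String accepted, String committed) {}

    // Only the coordination metadata part of the cluster state is modelled.
    record State(long version, CoordinationMetadata coordination) {}

    static class Follower {
        State lastSeen;
        int coordinationFetches = 0; // stands in for coordination metadata downloads from the remote store

        Follower(State initial) {
            lastSeen = initial;
        }

        // Every publication carries the leader's full coordination metadata, so it replaces
        // whatever lastSeen held instead of being inherited from it.
        State handlePublish(long version, CoordinationMetadata leaderCoordination) {
            coordinationFetches++;
            lastSeen = new State(version, leaderCoordination);
            return lastSeen;
        }
    }

    public static void main(String[] args) {
        Follower follower = new Follower(new State(1, new CoordinationMetadata("A", "A")));
        // Version 2 (assumed): the leader's accepted configuration is B, its committed configuration still A.
        follower.handlePublish(2, new CoordinationMetadata("B", "A"));
        // Version 3 (assumed): after the leader committed version 2, its committed configuration is B.
        State v3 = follower.handlePublish(3, new CoordinationMetadata("B", "B"));
        System.out.println(v3.coordination().committed()); // B
        System.out.println(follower.coordinationFetches);   // 2
    }
}
```

The follower ends up with the correct committed configuration, but the fetch count grows with every publication, even when the coordination metadata did not change.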
So the alternative approach is to update the lastSeen cluster state in the commit phase.
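In a simplified, hypothetical model, the fix amounts to one extra assignment in the commit handler: once the accepted voting configuration has been promoted to committed, lastSeen is replaced by that committed state, so later diffs inherit the up-to-date committed configuration. Placing the assignment as the last step of the commit phase follows the discussion in this PR; the class and method names are illustrative, not the actual OpenSearch change.

```java
// Hypothetical sketch of the fix: refresh lastSeen in the commit phase; not OpenSearch code.
public class UpdateLastSeenOnCommit {

    // Only the accepted and committed voting configurations are modelled.
    record State(long version, String accepted, String committed) {}

    // A null 'accepted' means the diff carries no coordination metadata change.
    record Diff(long version, String accepted) {
        State applyTo(State base) {
            String acc = accepted != null ? accepted : base.accepted();
            // The committed configuration is inherited from the base state.
            return new State(version, acc, base.committed());
        }
    }

    static class Follower {
        State lastSeen;      // base that incoming diffs are applied to
        State lastAccepted;  // state accepted in the publish phase

        Follower(State initial) {
            lastSeen = initial;
            lastAccepted = initial;
        }

        State handlePublish(Diff diff) {
            State incoming = diff.applyTo(lastSeen);
            lastSeen = incoming;
            lastAccepted = incoming;
            return incoming;
        }

        void handleCommit() {
            // Promote the accepted voting configuration to committed.
            lastAccepted = new State(lastAccepted.version(), lastAccepted.accepted(), lastAccepted.accepted());
            // Last step of the commit phase: later diffs are applied to the committed state.
            lastSeen = lastAccepted;
        }
    }

    public static void main(String[] args) {
        Follower follower = new Follower(new State(1, "A", "A"));
        follower.handlePublish(new Diff(2, "B")); // voting configuration changes from A to B
        follower.handleCommit();
        State v3 = follower.handlePublish(new Diff(3, null)); // no voting configuration diff
        System.out.println(v3.committed()); // B
    }
}
```

With the extra assignment, the version 3 state carries `B` as its committed configuration, and no coordination metadata has to be re-downloaded. The sketch ignores failed or concurrent publications.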

Related Issues

NA

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@soosinha soosinha force-pushed the coord_metadata_mismatch branch from 8798069 to 3a1e37f Compare October 7, 2024 13:43
@soosinha soosinha changed the title Always send coordination metadata in diff Always send coordination metadata in remote state diff Oct 7, 2024
Contributor

github-actions bot commented Oct 7, 2024

❌ Gradle check result for 3a1e37f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Collaborator

@Bukhtawar Bukhtawar left a comment

The only concern I see is the additional download overhead (remote store calls) on the follower nodes.

@soosinha
Member Author

soosinha commented Oct 8, 2024

The only concern I see is the additional download overhead (remote store calls) on the follower nodes.

I think the long-term fix should be to update the lastSeen cluster state in the commit phase with the committed config.

@soosinha soosinha force-pushed the coord_metadata_mismatch branch from 3a1e37f to 3826aa7 Compare October 8, 2024 13:16
@soosinha
Member Author

soosinha commented Oct 9, 2024

I am assuming last seen that we are setting is at the end of the publication and no failure can revert this last seen for the given term and version

Yes, last seen is set at the end of publication (the last step in the commit phase).

@soosinha soosinha changed the title Always send coordination metadata in remote state diff Update last seen cluster state in commit phase Oct 9, 2024
@soosinha soosinha force-pushed the coord_metadata_mismatch branch from 3826aa7 to 8ffcaed Compare October 9, 2024 15:49
Copy link
Contributor

github-actions bot commented Oct 9, 2024

❕ Gradle check result for 8ffcaed: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.


codecov bot commented Oct 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.03%. Comparing base (9ddee61) to head (86d2ee2).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##               main   #16215   +/-   ##
=========================================
  Coverage     72.03%   72.03%           
- Complexity    64782    64818   +36     
=========================================
  Files          5307     5307           
  Lines        302545   302548    +3     
  Branches      43703    43703           
=========================================
+ Hits         217925   217940   +15     
- Misses        66712    66745   +33     
+ Partials      17908    17863   -45     

Member

@dbwiddis dbwiddis left a comment

LGTM but needs a CHANGELOG entry

@soosinha soosinha force-pushed the coord_metadata_mismatch branch from 8ffcaed to 86d2ee2 Compare October 15, 2024 03:13
@soosinha soosinha added the backport 2.x Backport to 2.x branch label Oct 15, 2024
Contributor

✅ Gradle check result for 86d2ee2: SUCCESS

@soosinha
Member Author

LGTM but needs a CHANGELOG entry

Added CHANGELOG

@Bukhtawar Bukhtawar merged commit a53e0c6 into opensearch-project:main Oct 15, 2024
41 of 42 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-16215-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a53e0c63aa9c6a85fed30e7bdce8b533aa471060
# Push it to GitHub
git push --set-upstream origin backport/backport-16215-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-16215-to-2.x.

soosinha added a commit to soosinha/OpenSearch that referenced this pull request Oct 15, 2024

* Update last seen cluster state on apply commit

Signed-off-by: Sooraj Sinha
gbbafna pushed a commit that referenced this pull request Oct 15, 2024
dk2k pushed a commit to dk2k/OpenSearch that referenced this pull request Oct 16, 2024

* Update last seen cluster state on apply commit

Signed-off-by: Sooraj Sinha
dk2k pushed a commit to dk2k/OpenSearch that referenced this pull request Oct 17, 2024

* Update last seen cluster state on apply commit

Signed-off-by: Sooraj Sinha
dk2k pushed a commit to dk2k/OpenSearch that referenced this pull request Oct 21, 2024

* Update last seen cluster state on apply commit

Signed-off-by: Sooraj Sinha
akolarkunnu pushed a commit to akolarkunnu/OpenSearch that referenced this pull request Jan 21, 2025

* Update last seen cluster state on apply commit

Signed-off-by: Sooraj Sinha
Labels
backport 2.x Backport to 2.x branch backport-failed