Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing default no_master_block from write to metadata_write #3621

Merged
merged 1 commit into from
Jul 1, 2022

Conversation

gbbafna
Copy link
Collaborator

@gbbafna gbbafna commented Jun 17, 2022

Signed-off-by: Gaurav Bafna [email protected]

Description

When there is no master in the cluster, or a node is partitioned off the cluster all writes see a 5xx, this leads to availability drop espl when there is a master quorum loss(writes do not need a quorum to succeed)

We can switch the default to only fail metadata writes as a part of index creation/dynamic mapping updates but let other writes succeed.

Issues Resolved

#3045

Check List

  • [] New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@gbbafna gbbafna requested review from a team and reta as code owners June 17, 2022 06:51
@gbbafna gbbafna marked this pull request as draft June 17, 2022 06:51
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 7a0c38ece53ed99101223d7db4dbb857cac2ccee
Log 6097

Reports 6097

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 0dd18f7ee5200864aea25dec352efbf6b2c33434
Log 6099

Reports 6099

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 4cde722784ba07b73eadd4cc8460438941cad466
Log 6100

Reports 6100

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure da3a0829b2fedbbb4b6bcd1262262629eebfb0f1
Log 6101

Reports 6101

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 12c268b
Log 6105

Reports 6105

@gbbafna
Copy link
Collaborator Author

gbbafna commented Jun 20, 2022

All the failing tests are passing locally and are unrelated to the changes done .

  1. org.opensearch.cluster.remote.test.RemoteClustersIT.testHAProxyModeConnectionWorks
  2. org.opensearch.index.ShardIndexingPressureIT.testShardIndexingPressureTrackingDuringBulkWrites

can admin please rerun the test ?

@gbbafna gbbafna marked this pull request as ready for review June 20, 2022 11:35
@Bukhtawar
Copy link
Collaborator

start gradle check

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 12c268b
Log 6140

Reports 6140

Copy link
Collaborator

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gbbafna we might need to modify this integ test to reflect the new behaviour, unsure how this is passing though

try {
clientToClusterManagerlessNode.prepareUpdate("test1", "1")
.setDoc(Requests.INDEX_CONTENT_TYPE, "field", "value2")
.setTimeout(timeout)
.get();
fail("Expected ClusterBlockException");
} catch (ClusterBlockException e) {
assertThat(System.currentTimeMillis() - now, greaterThan(timeout.millis() - 50));
assertThat(e.status(), equalTo(RestStatus.SERVICE_UNAVAILABLE));
} catch (Exception e) {
logger.info("unexpected", e);
throw e;
}
try {
clientToClusterManagerlessNode.prepareIndex("test1")
.setId("1")
.setSource(XContentFactory.jsonBuilder().startObject().endObject())
.setTimeout(timeout)
.get();
fail("Expected ClusterBlockException");
} catch (ClusterBlockException e) {
assertThat(e.status(), equalTo(RestStatus.SERVICE_UNAVAILABLE));
}

@gbbafna
Copy link
Collaborator Author

gbbafna commented Jun 20, 2022

Thanks @gbbafna we might need to modify this integ test to reflect the new behaviour, unsure how this is passing though

try {
clientToClusterManagerlessNode.prepareUpdate("test1", "1")
.setDoc(Requests.INDEX_CONTENT_TYPE, "field", "value2")
.setTimeout(timeout)
.get();
fail("Expected ClusterBlockException");
} catch (ClusterBlockException e) {
assertThat(System.currentTimeMillis() - now, greaterThan(timeout.millis() - 50));
assertThat(e.status(), equalTo(RestStatus.SERVICE_UNAVAILABLE));
} catch (Exception e) {
logger.info("unexpected", e);
throw e;
}
try {
clientToClusterManagerlessNode.prepareIndex("test1")
.setId("1")
.setSource(XContentFactory.jsonBuilder().startObject().endObject())
.setTimeout(timeout)
.get();
fail("Expected ClusterBlockException");
} catch (ClusterBlockException e) {
assertThat(e.status(), equalTo(RestStatus.SERVICE_UNAVAILABLE));
}

This test sets the cluster.no_cluster_manager_block to write , hence is passing . NoClusterManagerNodeIT has tests for all the possible values of cluster.no_cluster_manager_block, so we are good with the coverage as well .

.put(NoMasterBlockService.NO_CLUSTER_MANAGER_BLOCK_SETTING.getKey(), "write")

@tlfeng tlfeng added enhancement Enhancement or improvement to existing feature or request distributed framework labels Jun 21, 2022
@tlfeng
Copy link
Collaborator

tlfeng commented Jun 22, 2022

@Bukhtawar Does it need to be backported to 2.x branch? If so please add backport 2.x label, then feel free to merge.

@Bukhtawar
Copy link
Collaborator

/cc @shwetathareja

@Bukhtawar Bukhtawar added backport 1.x backport 2.x Backport to 2.x branch labels Jun 22, 2022
@Bukhtawar Bukhtawar requested a review from shwetathareja June 22, 2022 09:08
@Bukhtawar Bukhtawar merged commit 07dc19d into opensearch-project:main Jul 1, 2022
@opensearch-trigger-bot
Copy link
Contributor

The backport to 1.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.x 1.x
# Navigate to the new working tree
cd .worktrees/backport-1.x
# Create a new branch
git switch --create backport/backport-3621-to-1.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 07dc19def5e02ee8801e5c0d194ce0a569e5e310
# Push it to GitHub
git push --set-upstream origin backport/backport-3621-to-1.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.x

Then, create a pull request where the base branch is 1.x and the compare/head branch is backport/backport-3621-to-1.x.

opensearch-trigger-bot bot pushed a commit that referenced this pull request Jul 1, 2022
Bukhtawar pushed a commit that referenced this pull request Jul 1, 2022
imRishN pushed a commit to imRishN/OpenSearch that referenced this pull request Jul 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.x backport 2.x Backport to 2.x branch distributed framework enhancement Enhancement or improvement to existing feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants