Always create empty translog on replica for remote store enabled index #10012

Merged
merged 5 commits into from
Sep 15, 2023

sachinpkale
Member

@sachinpkale sachinpkale commented Sep 13, 2023

Description

  • In the restore-from-snapshot flow, primary shards are recovered from snapshot data and replica shards are recovered using the peer-recovery flow.
  • The changes in this PR address the following issue, which applies only to remote store enabled indices.

Issue

  • If we execute the following steps in order:
    • Create index
    • Ingest docs
    • Take a snapshot
    • Delete index
    • Create an index with the same name
  • The new index gets created with a different translog UUID. The data in the snapshot refers to the older translog UUID.
  • After the primary is recovered from the snapshot, the remote segment store is not updated with the new data until the value of primaryMode is true.
  • Replica recovery is triggered before primaryMode changes to true. In peer recovery, the replica downloads data from the remote segment store.
  • As the snapshot-restored data is yet to be uploaded to the remote segment store, the replica fetches data that corresponds to the newly created index. So the replica has segments and translog corresponding to the new index.
  • After the files are downloaded to the replica, peer recovery checks the diff between primary and replica in terms of segment files and copies the diff. As the primary has segment data from the snapshot, it copies the diff to the replica. At this point, the replica has segments from the snapshot and translog from the new index.
  • Opening the engine then fails because the translog UUID referenced by the segments differs from the UUID of the actual translog files.
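
The sequence above can be illustrated with a small self-contained model. This is only a sketch, not OpenSearch code: `ShardCopy`, `canOpenEngine`, `replicaAfterPeerRecovery` and the UUID strings are hypothetical names introduced here. It assumes that the segments record the translog UUID they expect and that opening the engine requires this value to match the UUID of the translog files on disk.

```java
import java.util.Objects;

// Toy model only; none of these names are OpenSearch classes.
class TranslogUuidMismatchSketch {

    /** Hypothetical view of a shard copy on disk. */
    static final class ShardCopy {
        final String uuidInSegments;      // translog UUID recorded by the segments
        final String uuidOfTranslogFiles; // UUID of the translog files actually present

        ShardCopy(String uuidInSegments, String uuidOfTranslogFiles) {
            this.uuidInSegments = uuidInSegments;
            this.uuidOfTranslogFiles = uuidOfTranslogFiles;
        }
    }

    /** Assumed precondition for opening the engine: both UUIDs must be equal. */
    static boolean canOpenEngine(ShardCopy copy) {
        return Objects.equals(copy.uuidInSegments, copy.uuidOfTranslogFiles);
    }

    /** Replays the replica side of the issue and returns the resulting state. */
    static ShardCopy replicaAfterPeerRecovery(String snapshotUuid, String newIndexUuid) {
        // Step 1: the replica downloads from the remote segment store, which still
        // holds only data of the newly created index (segments and translog).
        ShardCopy downloaded = new ShardCopy(newIndexUuid, newIndexUuid);
        // Step 2: peer recovery copies the segment diff from the primary, whose
        // segments come from the snapshot. The translog files stay untouched.
        return new ShardCopy(snapshotUuid, downloaded.uuidOfTranslogFiles);
    }

    public static void main(String[] args) {
        ShardCopy replica = replicaAfterPeerRecovery("uuid-from-snapshot", "uuid-of-new-index");
        System.out.println("engine can open: " + canOpenEngine(replica)); // prints "engine can open: false"
    }
}
```

In this model the replica ends up with segments pointing at the snapshot's UUID and translog files carrying the new index's UUID, so the check fails.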

Solution

  • To avoid the above issue, we create an empty translog on the replica whenever we download data from the remote segment store. We use the same translog UUID that is referenced by the downloaded segments.
  • It is safe to create an empty translog for a replica of a remote store enabled index, as we use segment replication and the translog is used only for durability.
  • The replica only needs to access the translog at the time of failover, which is already integrated with the remote translog.
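
The idea behind the fix can be sketched as follows. This is a minimal, hypothetical model and not the code changed in this PR: the class, its methods and the `translog.uuid` marker file are invented for illustration, and the `translog_uuid` key is assumed to be the name under which the segment commit metadata stores the UUID. A real translog has its own on-disk format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy model only; none of these names are OpenSearch classes.
class EmptyTranslogOnReplicaSketch {

    // Assumed key under which the segment commit metadata stores the translog UUID.
    static final String TRANSLOG_UUID_KEY = "translog_uuid";
    // Hypothetical marker file standing in for a real translog header.
    static final String UUID_FILE = "translog.uuid";

    /** Discards whatever is in the translog directory and writes an empty translog with the given UUID. */
    static void createEmptyTranslog(Path translogDir, String translogUuid) throws IOException {
        if (Files.exists(translogDir)) {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(translogDir)) {
                // Reverse order lists children before their parent directory.
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        }
        Files.createDirectories(translogDir);
        Files.write(translogDir.resolve(UUID_FILE), translogUuid.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads the UUID of the translog currently on disk. */
    static String readTranslogUuid(Path translogDir) throws IOException {
        return new String(Files.readAllBytes(translogDir.resolve(UUID_FILE)), StandardCharsets.UTF_8);
    }

    /** Hook run after segments were downloaded from the remote segment store. */
    static void afterRemoteSegmentDownload(Map<String, String> commitUserData, Path translogDir) throws IOException {
        String uuid = commitUserData.get(TRANSLOG_UUID_KEY);
        if (uuid == null) {
            throw new IllegalStateException("downloaded segments carry no translog UUID");
        }
        // Always start from an empty translog that matches the downloaded segments.
        createEmptyTranslog(translogDir, uuid);
    }

    public static void main(String[] args) throws IOException {
        Path translogDir = Files.createTempDirectory("replica-translog");
        createEmptyTranslog(translogDir, "stale-uuid"); // leftover translog with another UUID
        afterRemoteSegmentDownload(Map.of(TRANSLOG_UUID_KEY, "uuid-in-downloaded-segments"), translogDir);
        System.out.println(readTranslogUuid(translogDir)); // prints "uuid-in-downloaded-segments"
    }
}
```

Taking the UUID from the downloaded segments, rather than keeping whatever translog happens to be on disk, is what keeps the two in agreement in this model.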

Related Issues

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

github-actions bot commented Sep 13, 2023

Compatibility status:

Checks if related components are compatible with change a33af59

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git]

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled

@codecov

codecov bot commented Sep 13, 2023

Codecov Report

Merging #10012 (a33af59) into main (921cd0c) will increase coverage by 0.67%.
Report is 1 commits behind head on main.
The diff coverage is 78.57%.

@@             Coverage Diff              @@
##               main   #10012      +/-   ##
============================================
+ Coverage     70.42%   71.09%   +0.67%     
- Complexity    57451    58094     +643     
============================================
  Files          4824     4824              
  Lines        273918   273927       +9     
  Branches      39918    39920       +2     
============================================
+ Hits         192894   194761    +1867     
+ Misses        64725    62855    -1870     
- Partials      16299    16311      +12     
Files Changed Coverage Δ
.../org/opensearch/index/translog/TranslogHeader.java 81.48% <ø> (ø)
...in/java/org/opensearch/index/shard/IndexShard.java 69.40% <78.57%> (+0.35%) ⬆️

... and 585 files with indirect coverage changes

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@sachinpkale sachinpkale force-pushed the corrupted-translog-fix branch from 72ab6ed to 6b0ab08 Compare September 14, 2023 03:22
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testWriteBlobWithRetries
      1 org.opensearch.index.shard.RemoteIndexShardTests.testRepicaCleansUpOldCommitsWhenReceivingNew

Sachin Kale added 2 commits September 14, 2023 10:48
Signed-off-by: Sachin Kale <[email protected]>
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Sachin Kale <[email protected]>
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT.test {yaml=pit/10_basic/Delete all}

@gbbafna gbbafna merged commit 4a4a8fa into opensearch-project:main Sep 15, 2023
@gbbafna gbbafna added the backport 2.x Backport to 2.x branch label Sep 15, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 15, 2023
#10012)

Signed-off-by: Sachin Kale <[email protected]>
(cherry picked from commit 4a4a8fa)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@sachinpkale sachinpkale added the backport 2.10 Backport to 2.10 branch label Sep 16, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 16, 2023
#10012)

Signed-off-by: Sachin Kale <[email protected]>
(cherry picked from commit 4a4a8fa)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
gbbafna pushed a commit that referenced this pull request Sep 16, 2023
#10012) (#10074)

(cherry picked from commit 4a4a8fa)

Signed-off-by: Sachin Kale <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
sachinpkale added a commit to sachinpkale/OpenSearch that referenced this pull request Sep 16, 2023
sachinpkale pushed a commit that referenced this pull request Sep 18, 2023
#10012) (#10085)

(cherry picked from commit 4a4a8fa)

Signed-off-by: Sachin Kale <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/OpenSearch that referenced this pull request Sep 20, 2023
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
vikasvb90 pushed a commit to vikasvb90/OpenSearch that referenced this pull request Oct 10, 2023
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Labels
backport 2.x Backport to 2.x branch backport 2.10 Backport to 2.10 branch skip-changelog

3 participants