Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read the same medata file that is locked during restore of shallow snapshot #10979

Merged
merged 1 commit into from
Oct 30, 2023

Conversation

sachinpkale
Copy link
Member

@sachinpkale sachinpkale commented Oct 29, 2023

Description

  • Currently, shallow copy snapshot (which references data in remote store) takes a lock on remote store metadata file based on primary term and generation.
  • But as we upload a metadata file each refresh, there are multiple files that get created for the same primary term and generation.
  • Following is the example that creates issues in the snapshot flow:
    • Snapshot triggers flush, generation is increased to 4 and snapshot has taken lock on primary term and generation 4. Assume metadata file name was md_1_4_1
    • As indexing proceeded, multiple metadata files are uploaded for the same primary term and generation. Let's say last file uploaded was md_1_4_24.
    • While restoring snapshot, we only provide primary term and generation and ask to get the metadata file where the latest metadata file is fetched (as we fetch the files in lexicographic order).
    • Snapshot restore works fine most of the times even with the latest metadata file as it restores only from the segments_N file which remains constant for the given generation.
    • But the problem occurs when the segments that are referred by the segments_N file gets merged and not part of the latest metadata file anymore.
  • To mitigate this issue, in this PR, we make sure to read the same file that was locked instead of just relying on primary term and generation.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Oct 29, 2023

Compatibility status:

Checks if related components are compatible with change 424c801

Incompatible components

Incompatible components: [https://github.com/opensearch-project/performance-analyzer.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git]

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov
Copy link

codecov bot commented Oct 29, 2023

Codecov Report

Merging #10979 (424c801) into main (73bbeb5) will decrease coverage by 0.03%.
Report is 1 commits behind head on main.
The diff coverage is 7.14%.

@@             Coverage Diff              @@
##               main   #10979      +/-   ##
============================================
- Coverage     71.27%   71.25%   -0.03%     
- Complexity    58724    58736      +12     
============================================
  Files          4870     4870              
  Lines        276578   276589      +11     
  Branches      40202    40204       +2     
============================================
- Hits         197137   197083      -54     
- Misses        62986    63087     +101     
+ Partials      16455    16419      -36     
Files Coverage Δ
...in/java/org/opensearch/index/shard/IndexShard.java 69.66% <100.00%> (-0.48%) ⬇️
...java/org/opensearch/index/shard/StoreRecovery.java 54.93% <0.00%> (-0.28%) ⬇️
...earch/index/store/RemoteSegmentStoreDirectory.java 88.17% <0.00%> (-0.68%) ⬇️
...re/lockmanager/RemoteStoreMetadataLockManager.java 35.00% <0.00%> (-10.17%) ⬇️

... and 473 files with indirect coverage changes

@sachinpkale
Copy link
Member Author

sachinpkale commented Oct 30, 2023

Added integ test that fails without the change in this PR.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.cluster.allocation.ClusterRerouteIT.testDelayWithALargeAmountOfShards

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled

@gbbafna gbbafna merged commit 4efa6d7 into opensearch-project:main Oct 30, 2023
@gbbafna gbbafna added the backport 2.x Backport to 2.x branch label Oct 30, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 30, 2023
…apshot (#10979)

Signed-off-by: Sachin Kale <[email protected]>
(cherry picked from commit 4efa6d7)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants