-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify remote directory cleanup after snapshot delete to … #12672
Simplify remote directory cleanup after snapshot delete to … #12672
Conversation
f5d5d78
to
d0ae7d8
Compare
❌ Gradle check result for f5d5d78: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Compatibility status:Checks if related components are compatible with change 5d7ab9f Incompatible componentsSkipped componentsCompatible componentsCompatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer.git] |
server/src/main/java/org/opensearch/repositories/blobstore/RemoteStoreShardCleanupTask.java
Show resolved
Hide resolved
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #12672 +/- ##
============================================
- Coverage 71.42% 71.40% -0.02%
- Complexity 59978 60051 +73
============================================
Files 4985 4989 +4
Lines 282275 282569 +294
Branches 40946 40982 +36
============================================
+ Hits 201603 201775 +172
- Misses 63999 64070 +71
- Partials 16673 16724 +51 ☔ View full report in Codecov by Sentry. |
❌ Gradle check result for 787db62: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Gradle Check is failing due to falky tests unrelated to this change mentioned here: |
❌ Gradle check result for 787db62: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
failling test |
❌ Gradle check result for 787db62: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
…urrent cleanup task runs for same shard. Signed-off-by: Harish Bhakuni <[email protected]>
Signed-off-by: Harish Bhakuni <[email protected]>
787db62
to
5d7ab9f
Compare
* Simplify remote directory cleanup after snapshot delete to avoid concurrent cleanup task runs for same shard. Signed-off-by: Harish Bhakuni <[email protected]> * Address PR Comments. Signed-off-by: Harish Bhakuni <[email protected]> --------- Signed-off-by: Harish Bhakuni <[email protected]> Co-authored-by: Harish Bhakuni <[email protected]> (cherry picked from commit d0467b3) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…#12682) * Simplify remote directory cleanup after snapshot delete to avoid concurrent cleanup task runs for same shard. * Address PR Comments. --------- (cherry picked from commit d0467b3) Signed-off-by: Harish Bhakuni <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Harish Bhakuni <[email protected]>
…rch-project#12672) * Simplify remote directory cleanup after snapshot delete to avoid concurrent cleanup task runs for same shard. Signed-off-by: Harish Bhakuni <[email protected]> * Address PR Comments. Signed-off-by: Harish Bhakuni <[email protected]> --------- Signed-off-by: Harish Bhakuni <[email protected]> Co-authored-by: Harish Bhakuni <[email protected]>
…rch-project#12672) * Simplify remote directory cleanup after snapshot delete to avoid concurrent cleanup task runs for same shard. Signed-off-by: Harish Bhakuni <[email protected]> * Address PR Comments. Signed-off-by: Harish Bhakuni <[email protected]> --------- Signed-off-by: Harish Bhakuni <[email protected]> Co-authored-by: Harish Bhakuni <[email protected]>
…rch-project#12672) * Simplify remote directory cleanup after snapshot delete to avoid concurrent cleanup task runs for same shard. Signed-off-by: Harish Bhakuni <[email protected]> * Address PR Comments. Signed-off-by: Harish Bhakuni <[email protected]> --------- Signed-off-by: Harish Bhakuni <[email protected]> Co-authored-by: Harish Bhakuni <[email protected]> Signed-off-by: Shivansh Arora <[email protected]>
…avoid concurrent cleanup task runs for same shard.
Description
During snapshot deletion, we need to trigger remote directory cleanup after every release lock operation for indices which are already deleted and removed from cluster state. We trigger cleanup task after every release lock operation.
if there are multiple shard blobs of same shard we will trigger multiple async cleanup task for same shard directory. since remote store directory cleanup logic is not thread safe. we might end up into situations where both cleanup tasks were not able to trigger cleanup successfully.
So as part of this PR, if there is already an ongoing cleanup task for a shard, and another thread also tries to trigger the cleanup, we will just skip the next task. This might end up into some zombie data in remote store, but since the cleanup itself happens asynchronously if the cleanup fails in between it will also end up into zombie data in remote store which is needed to be handled separately.
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
Commit changes are listed out in CHANGELOG.md file (See: Changelog)Public documentation issue/PR createdBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.