-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding subshard work items on lease expiry #1185
Conversation
Signed-off-by: Andre Kurait <[email protected]>
Signed-off-by: Andre Kurait <[email protected]>
…ch-project#1183)" This reverts commit b055711 Signed-off-by: Andre Kurait <[email protected]>
19e015a
to
e669dfa
Compare
Signed-off-by: Andre Kurait <[email protected]>
e669dfa
to
bfce5e1
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1185 +/- ##
============================================
- Coverage 80.93% 80.75% -0.19%
- Complexity 2995 3009 +14
============================================
Files 407 409 +2
Lines 15241 15439 +198
Branches 1021 1034 +13
============================================
+ Hits 12336 12468 +132
- Misses 2277 2341 +64
- Partials 628 630 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
...mSnapshotMigration/src/test/java/org/opensearch/migrations/bulkload/LeaseExpirationTest.java
Outdated
Show resolved
Hide resolved
...mSnapshotMigration/src/test/java/org/opensearch/migrations/bulkload/LeaseExpirationTest.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Andre Kurait <[email protected]>
...tsFromSnapshotMigration/src/test/java/org/opensearch/migrations/bulkload/SourceTestBase.java
Dismissed
Show dismissed
Hide dismissed
Signed-off-by: Andre Kurait <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for keeping up with this Andre.
DataGenerator/src/main/java/org/opensearch/migrations/data/WorkloadOptions.java
Outdated
Show resolved
Hide resolved
...tsFromSnapshotMigration/src/test/java/org/opensearch/migrations/RfsMigrateDocumentsTest.java
Outdated
Show resolved
Hide resolved
...tsFromSnapshotMigration/src/test/java/org/opensearch/migrations/RfsMigrateDocumentsTest.java
Outdated
Show resolved
Hide resolved
RFS/src/main/java/org/opensearch/migrations/bulkload/common/LuceneDocumentsReader.java
Outdated
Show resolved
Hide resolved
RFS/src/main/java/org/opensearch/migrations/bulkload/common/LuceneDocumentsReader.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking for a test that starts migration, progress is reported, then we stop RFS (simulating something going wrong), and then start a fresh instance, sure that we restarted from the correct point in the process until completion. Do we have a test that verifies this while reading a legit snapshot? (Work coordinator could be mocked or real IMO)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we stop RFS (simulating something going wrong)
we currently only save progress when a lease expires. What aspect of the system would this be verifying that isn't in LeaseExpirationTest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our most recent customer was running into OOM of the container- in that case we would never have updated the progress. In another case I expect, if customers dialed the rfs work scale to 0 so they paused the migration, we wouldn't want to loose that progress either
How hard would it be to add 60 second check-ins? 🤞 I hope pretty easy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In another case I expect, if customers dialed the rfs work scale to 0 so they paused the migration, we wouldn't want to loose that progress either
That work is being tracked in https://opensearch.atlassian.net/issues/MIGRATIONS-2172
10acac7
to
b9b543b
Compare
Signed-off-by: Andre Kurait <[email protected]>
b9b543b
to
6f60ebf
Compare
Continuing in #1198 |
Description
Continuation on #1160
Changes since #1160
Behavior changes is as follows:
The lease time increase logic has changed. Behavior is as follows:
Added E2E test as follows:
Issues Resolved
Testing
Tested in AWS and added new E2E test around the scenario
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.