Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two fixes to work coordination #777

Merged

Conversation

gregschohn
Copy link
Collaborator

@gregschohn gregschohn commented Jun 26, 2024

Description

Make the check for the setup index more reliable and only scan for work that is currently expired.

The HEAD check has now been rolled into the doUntil block for running setup() so that even 400s from the PUT command can be ignored and retried again in a sane manner. The HEAD call should be the authoritative call when it returns 200. The second fix is for searching for items to update. I had been searching for items that weren't even updated yet, causing a lot more pressure to create false positives and inefficiencies all around.

  • Category Bug fix
  • Why these changes are required? Too much contention/no guarantee on convergence for RFS Work coordination.
  • What is the old behavior before changes and new behavior after changes? Lots of thrashing to setup an initial index and a large number of PotentialClockDriftDetectedExceptions bleeding upward.

Issues Resolved

https://opensearch.atlassian.net/browse/MIGRATIONS-1810

Testing

Manual/gradle build

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ore reliable and only scan for work that is currently expired.

The HEAD check has now been rolled into the doUntil block for running setup() so that even 400s from the PUT command can be ignored and retried again in a sane manner.  The HEAD call should be the authoritative call when it returns 200.
The second fix is for searching for items to update.  I had been searching for items that weren't even updated yet, causing a lot more pressure to create false positives and inefficiencies all around.

Signed-off-by: Greg Schohn <[email protected]>
Copy link

codecov bot commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 69.21%. Comparing base (4bd3273) to head (8e32803).
Report is 1 commits behind head on main.

Files Patch % Lines
...in/java/com/rfs/cms/OpenSearchWorkCoordinator.java 50.00% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #777      +/-   ##
============================================
+ Coverage     68.53%   69.21%   +0.68%     
- Complexity     1577     1594      +17     
============================================
  Files           275      275              
  Lines         11660    11660              
  Branches        737      737              
============================================
+ Hits           7991     8071      +80     
+ Misses         3263     3168      -95     
- Partials        406      421      +15     
Flag Coverage Δ
gradle-test 62.15% <50.00%> (+0.95%) ⬆️
python-test 87.56% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gregschohn gregschohn changed the title Two fixes to work coordination - make the check for the setup index m… Two fixes to work coordination Jun 26, 2024
@gregschohn gregschohn marked this pull request as ready for review June 26, 2024 19:25
@@ -123,7 +123,7 @@ public String toString() {
return "[ statusCode: " + r.getStatusCode() + ", payload: " + payloadStr + "]";
}
},
(response, ignored) -> (response.getStatusCode() / 100) != 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious - what was the original logic you were trying to achieve here? I didn't quite follow your explanation in the PR overview.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test determines when we're done needing to retry. When this is 'true', the work is deemed to be done. In other words, we would keep looping until we got something other than a 200. Notice that the beginning of the method was a HEAD that didn't have the same affliction - so in the worst case scenario, we'd burn a process and be resolved the next time that the program ran

@gregschohn gregschohn merged commit 8c323d7 into opensearch-project:main Jun 26, 2024
12 of 13 checks passed
@gregschohn gregschohn deleted the RFSCoordinationImprovements branch July 13, 2024 01:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants