-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integ test : Stop replication when remote cluster is unreachable #69
Integ test : Stop replication when remote cluster is unreachable #69
Conversation
} catch(e: RetentionLeaseNotFoundException) { | ||
// log error and bail | ||
log.error("${e.message}") | ||
} catch (e: Exception) { | ||
log.error("Exception in removing retention lease", e) | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename the "removeRetentionLease" method to also indicate that it doesn't propagate exception?
For example - either removeRetentionLeaseNoThrow
or attemptRetentionLeaseRemoval
etc just so we are clear that it is only best attempt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud : how would a cluster admin know that this has happened ? I see that logs are the only current way of this . Concerned as leaking retention leases on leader side can have side effects like no truncation of translog .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some message in the stop replication response to convey this?
The other option I can think of is let the stop replication fail by default but allow the customer to pass a flag to indicate that remote call should not be made in the stop replication call in which case the stop replication should complete without removing the retention lease.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, this method should not throw exception as stop replication should be successful even if remote is unreachable. But we can build on this in the future to have an async mechanism to remove the retention lease.
Renamed the method as suggested for now.
5767129
to
e72aee2
Compare
Signed-off-by: Sooraj Sinha <[email protected]>
e72aee2
to
9530b38
Compare
Signed-off-by: Sooraj Sinha [email protected]
Description
Stop replication should work even when remote cluster is unavailable. Currently, stop replication errors out if remote cluster is unavailable because remove retention lease method throws an exception.
This change catches the error error thrown by remove retention lease method so the stop replication API can go through. Also, added an integ test to verify the same
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.