-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integ test: Verify that replication is paused when remote index is deleted #74
Conversation
LGTM |
try { | ||
return suspendExecute(replicationMetadata, action, req, defaultContext = defaultContext) | ||
} catch (e: ElasticsearchException) { | ||
if (retryOn.contains(e.javaClass) || TransportActions.isShardNotAvailableException(e)) { | ||
// Not retrying for IndexNotFoundException as it is not a transient failure | ||
if (e !is IndexNotFoundException && (retryOn.contains(e.javaClass) || TransportActions.isShardNotAvailableException(e))) { | ||
log.warn("Encountered a failure while executing in $req. Retrying in ${currentBackoff/1000} seconds" + | ||
".", e) | ||
delay(currentBackoff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is a generic utility method, it feels incorrect to filter out exceptions here.
Shouldn't we be fixing the caller to not pass IndexNotFoundException here?
(We should also fix that isShartNotAvailableException, but one step at a time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not want to retry in case of IndexNotFoundException for all code paths to this method.
IndexNotFoundException is not passed by the caller. It is one of exceptions in TransportActions.isShardNotAvailableException
which is part of core ES.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created an issue and added a TODO
…leted Signed-off-by: Sooraj Sinha <[email protected]>
5c36b8b
to
bb129cd
Compare
Description
Adding integ test to check that replication gets paused if remote cluster is unavailable.
IndexNotFoundException
is being excluded from retry logic as this is a permanent failure otherwise we would need to wait for 2-3 min(due to exponential backoff) for replication to get paused.Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.