Integ test: Verify that replication is paused when remote index is deleted #74

Merged
merged 1 commit into from
Aug 2, 2021

Conversation

soosinha
Member

Description

Adds an integ test to check that replication gets paused if the remote cluster is unavailable.
IndexNotFoundException is excluded from the retry logic because it is a permanent failure. Otherwise we would need to wait 2-3 minutes (due to exponential backoff) for replication to get paused.
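
To illustrate how exponential backoff adds up to a wait of that order, here is a minimal sketch. The initial delay, growth factor, and retry count are assumptions picked for illustration; this PR does not state the plugin's actual values, and `totalBackoffMillis` is a hypothetical helper, not part of the plugin.

```kotlin
// Sums the delays spent sleeping across all retries of an exponential backoff.
// All parameter values used below are hypothetical, not the plugin's real settings.
fun totalBackoffMillis(initialMillis: Long, factor: Double, retries: Int): Long {
    var current = initialMillis
    var total = 0L
    repeat(retries) {
        total += current                      // time slept before the next attempt
        current = (current * factor).toLong() // delay grows geometrically
    }
    return total
}

fun main() {
    // 1s + 2s + 4s + 8s + 16s + 32s + 64s
    val total = totalBackoffMillis(initialMillis = 1_000, factor = 2.0, retries = 7)
    println("Total wait before giving up: ${total / 1000} seconds") // prints 127 seconds
}
```

Under these assumed values, a failure that can never succeed would still hold up the pause for about two minutes, which is why failing fast on a permanent error matters here.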

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@naveenpajjuri
Contributor

LGTM

naveenpajjuri previously approved these changes Aug 2, 2021
Comment on lines 111 to 119
      try {
          return suspendExecute(replicationMetadata, action, req, defaultContext = defaultContext)
      } catch (e: ElasticsearchException) {
-         if (retryOn.contains(e.javaClass) || TransportActions.isShardNotAvailableException(e)) {
+         // Not retrying for IndexNotFoundException as it is not a transient failure
+         if (e !is IndexNotFoundException && (retryOn.contains(e.javaClass) || TransportActions.isShardNotAvailableException(e))) {
              log.warn("Encountered a failure while executing in $req. Retrying in ${currentBackoff/1000} seconds" +
                      ".", e)
              delay(currentBackoff)
Collaborator

Given this is a generic utility method, it feels incorrect to filter out exceptions here.

Shouldn't we be fixing the caller to not pass IndexNotFoundException here?

(We should also fix that isShardNotAvailableException, but one step at a time)

Member Author

We do not want to retry on IndexNotFoundException on any code path that reaches this method.
IndexNotFoundException is not passed by the caller. It is one of the exceptions matched by TransportActions.isShardNotAvailableException, which is part of core ES.
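
The behaviour described above can be sketched in isolation as follows. This is a simplified stand-in, not the plugin's code: the exception classes, `isShardNotAvailable`, and `retrying` are hypothetical names defined here so the example is self-contained, and the blocking `sleep` parameter replaces the coroutine `delay` used in the real method.

```kotlin
// Stand-in exception types; the real ones live in Elasticsearch core.
open class EsException(message: String) : RuntimeException(message)
class IndexNotFoundException(index: String) : EsException("no such index [$index]")
class ShardNotFoundException(shard: String) : EsException("no such shard [$shard]")

// Stand-in for TransportActions.isShardNotAvailableException,
// which treats IndexNotFoundException as a "shard not available" failure.
fun isShardNotAvailable(e: Throwable): Boolean =
    e is ShardNotFoundException || e is IndexNotFoundException

// Hypothetical retry helper: retries transient failures with exponential backoff,
// but rethrows IndexNotFoundException immediately because it is permanent.
fun <T> retrying(
    numberOfRetries: Int,
    initialBackoffMillis: Long = 1_000,
    factor: Double = 2.0,
    sleep: (Long) -> Unit = { ms -> Thread.sleep(ms) },
    block: () -> T
): T {
    var backoff = initialBackoffMillis
    repeat(numberOfRetries - 1) {
        try {
            return block()
        } catch (e: EsException) {
            if (e is IndexNotFoundException || !isShardNotAvailable(e)) throw e
            sleep(backoff)
            backoff = (backoff * factor).toLong()
        }
    }
    return block() // final attempt; any exception propagates to the caller
}

fun main() {
    var attempts = 0
    val slept = mutableListOf<Long>()
    try {
        retrying<Unit>(numberOfRetries = 5, sleep = { ms -> slept.add(ms) }) {
            attempts++
            throw IndexNotFoundException("leader-index")
        }
    } catch (e: IndexNotFoundException) {
        // prints "Failed after 1 attempt(s), slept 0 time(s)"
        println("Failed after $attempts attempt(s), slept ${slept.size} time(s)")
    }
}
```

Because the shared predicate also matches IndexNotFoundException, the guard has to sit next to the predicate call rather than in the caller's `retryOn` list, which is the point being made in this reply.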

Member Author

Created an issue and added a TODO

@soosinha force-pushed the t-delete-idx-pause branch from 5c36b8b to bb129cd (August 2, 2021 12:05)
@soosinha changed the title from "Integ test: Verify that replication is paused when remote is unavailable" to "Integ test: Verify that replication is paused when remote index is deleted" (Aug 2, 2021)
@soosinha merged commit 61cd7ba into opensearch-project:main (Aug 2, 2021)
@soosinha deleted the t-delete-idx-pause branch (August 2, 2021 12:40)