Propagate shard failures to index task and auto-pause on failures. #56
Description
This commit introduces FailedState for ShardTask. Any failure in the
shard-replication task (even one raised in a nested coroutine) is caught and
captured as FailedState.
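The idea of capturing a failure as a state can be sketched as follows. This is a minimal Python `asyncio` illustration, not the code in this PR: `CompletedState`, `fetch_changes`, `replicate`, and `run_shard_task` are hypothetical names, and `FailedState` here is only a stand-in for the state introduced by the commit.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class FailedState:
    """Hypothetical stand-in for the FailedState named in this PR."""
    reason: str


@dataclass(frozen=True)
class CompletedState:
    """Hypothetical state for a shard task that finished without error."""


async def fetch_changes(shard: int) -> None:
    # Nested coroutine: fails for shard 1 to simulate a replication error.
    await asyncio.sleep(0)
    if shard == 1:
        raise RuntimeError(f"shard {shard}: simulated fetch failure")


async def replicate(shard: int) -> None:
    # The failure originates one level down, in fetch_changes.
    await fetch_changes(shard)


async def run_shard_task(shard: int):
    # Any exception, including one raised in a nested coroutine, is caught
    # here and recorded as a state instead of escaping the task.
    try:
        await replicate(shard)
        return CompletedState()
    except Exception as exc:
        return FailedState(reason=str(exc))


async def main():
    return await asyncio.gather(run_shard_task(0), run_shard_task(1))


states = asyncio.run(main())
print(states)
```

Returning the failure as a value means whoever observes the shard tasks can inspect their state rather than having to handle exceptions from each one.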
The IndexReplicationTask notices the failure while it is in MonitoringState and
triggers a pause. If the pause fails, it marks the overall replication state as
failed.
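The monitoring decision described above might look like the sketch below. It is an assumption-laden Python illustration rather than the PR's implementation: the enums, `monitor_once`, and the `pause` callable are all hypothetical names introduced for this example.

```python
from enum import Enum


class ShardState(Enum):
    RUNNING = "running"
    FAILED = "failed"


class IndexState(Enum):
    MONITORING = "monitoring"
    PAUSED = "paused"
    FAILED = "failed"


def monitor_once(shard_states, pause):
    """Run one monitoring pass and return the next index-level state.

    `pause` is a hypothetical callable that raises if the pause cannot be applied.
    """
    if not any(state is ShardState.FAILED for state in shard_states):
        # No shard has failed, so keep monitoring.
        return IndexState.MONITORING
    try:
        pause()
        return IndexState.PAUSED
    except Exception:
        # The pause itself failed, so the overall replication is marked failed.
        return IndexState.FAILED


def pause_ok():
    """Simulated pause that succeeds."""


def pause_broken():
    """Simulated pause that fails."""
    raise RuntimeError("simulated pause failure")


healthy = [ShardState.RUNNING, ShardState.RUNNING]
one_failed = [ShardState.RUNNING, ShardState.FAILED]

print(monitor_once(healthy, pause_ok))
print(monitor_once(one_failed, pause_ok))
print(monitor_once(one_failed, pause_broken))
```

The three calls correspond to the three outcomes in the description: keep monitoring, pause on a shard failure, and mark the replication as failed when the pause cannot be applied.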
This also required fixing the nesting of coroutines. The outer coroutine
was being used to trigger the actor and then wait for cancellation, which
broke the intuitive parent-child coroutine relationship that nesting provides.
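To show why nesting matters, here is a small Python `asyncio` sketch of a nested child observing its parent's cancellation. It is only an analogy: `asyncio` is not the coroutine library used by this change, and `parent`, `child`, and `events` are hypothetical names.

```python
import asyncio

events = []


async def child():
    try:
        await asyncio.sleep(3600)  # stands in for long-running replication work
    except asyncio.CancelledError:
        events.append("child cancelled")
        raise


async def parent():
    try:
        # Nested call: the child runs inside the parent, so cancelling the
        # parent is delivered to the child first.
        await child()
    except asyncio.CancelledError:
        events.append("parent cancelled")
        raise


async def main():
    task = asyncio.create_task(parent())
    await asyncio.sleep(0)  # let the parent start and reach the child's sleep
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass


asyncio.run(main())
print(events)
```

Because the child is awaited directly inside the parent, cancellation and failures travel along the nesting without any extra bookkeeping.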
Signed-off-by: Gopala Krishna Ambareesh [email protected]
Testing
Since there is no failure injection framework yet, this change was tested manually. The following manual tests were run:
Example output
Paused due to failure
Failure to pause on failure
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.