Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async retries timeout race condition #528

Closed
daniel-sanche opened this issue Sep 7, 2023 · 1 comment
Closed

async retries timeout race condition #528

daniel-sanche opened this issue Sep 7, 2023 · 1 comment
Assignees
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@daniel-sanche
Copy link
Contributor

daniel-sanche commented Sep 7, 2023

Problem

Async retries use an asyncio.wait_for line to break out early when the deadline is hit. The problem is, in practice, retries typically wrap gapic grpc calls, which implement their own timeout logic. This means there is a natural race condition when the timeout is reached, where one of two things can happen:

  • the wait_for line in the wrapper notices the timeout first:
    • the wrapper emitts a RetryError
    • a CancelledError is sent down to the wrapped function, which the wrapped function may have to account for properly
  • The wrapped grpc call times out on its own:
    • a DeadlineExceeded error is passed to the wrapper
      • This may result in the wrapper raising a RetryError, if DeadlineExceeded is in the list of retryable errors, otherwise it will be raised directly through the wrapper.

On top of the race condition, there are a couple other issues with wait_for:

  • wait_for creates a new task, which can add significant performance overhead
  • This behaviour differs from the sync retry function, which only checks for timeouts after the wrapped function attempt is complete

Solutions:

We can add a new interrupt_on_timeout argument to the async retry function, and only enable wait_for when the flag is True. To avoid breaking changes, we can have the flag default to True

But personally, I think we should consider just removing the wait_for entirely


Streaming Retries:

Streaming retries add an extra complication, since there's the added possibility to check for expired timeouts after each yield in the stream, on top of the option to break the stream mid-yield. We we could add another flag to control for to control these timeout options.

But adding any extra checks in between each yield can significantly harm performance, especially if we add a wait_for to interrupt mid-yield, like async_retry currently does. For this reason. I suggest that we keep these timeout options out of streaming retries for now, and only keep the async_retries wait_for for compatibility reasons.

In general, I think the retry wrappers should expect the wrapped function to handle timeout intrruptions on their own, and the wrappers should only handle timeouts between attempts

@daniel-sanche daniel-sanche added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p3 Desirable enhancement or fix. May not be included in next release. labels Sep 7, 2023
@daniel-sanche daniel-sanche self-assigned this Sep 7, 2023
@daniel-sanche daniel-sanche added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed priority: p3 Desirable enhancement or fix. May not be included in next release. labels Sep 7, 2023
@daniel-sanche
Copy link
Contributor Author

Fixed in #495

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

1 participant