feat: retry and retry_async support streaming rpcs #495

daniel-sanche · 2023-04-07T00:06:11Z

From #485:

Server streaming libraries return an iterable that can asynchronously yield data from the backend over time. Some of our libraries need to provide a wrapper around the raw stream to do some local processing, before passing the data to the user.

It would be useful to wrap this whole pipeline in a retry decorator, so that if the stream breaks mid-way through, we can recover and continue yielding data through our generator as if nothing happened.

Unfortunately, the current implementation returns the result of the target function directly, so generators will not yield values and exceptions through the retry block

This PR addresses the issue by adding retry_target_generator functions to both the async and sync retry modules, which yield through the target rather than call it directly. Generator mode can be enabled using the is_generator argument on the decorator.

Fixes #485

parthea · 2023-12-07T12:28:48Z

Can the breaking change be avoided by deprecating functionality instead of changing it?

daniel-sanche · 2023-12-07T19:29:06Z

Can the breaking change be avoided by deprecating functionality instead of changing it?

Hmm yeah I guess it could, if we want to keep two versions of the AsyncRetry class hanging around for a while

As I mentioned above though, IMO this doesn't have to be a breaking change. The only breaking part is changing an undocumented behaviour, which would only manifest as a bug when used with our gapic libraries, which have their own timeout logic.

But I'll let you make that call. Let me know which option you prefer

vchudnov-g · 2023-12-08T00:04:14Z

I would avoid having two copies of the Async Retry. As to whether we should flag this as a breaking change.... I want to say it's just a bug fix to the behavior but does not change the surface, so maybe we can call it non-breaking..... Not a 100% confident about this yet.

vchudnov-g · 2023-12-07T23:56:05Z

tests/asyncio/retry/test_retry_unary_async.py

-        assert retry_._deadline == 4
-        assert retry_._on_error is _some_function
+@pytest.mark.asyncio
+async def test_retry_streaming_target_bad_sleep_generator():


I think this is still in the unary_asyn test file.

vchudnov-g · 2023-12-07T23:57:30Z

tests/asyncio/retry/test_retry_unary_async.py


    with pytest.raises(exceptions.RetryError) as exc_info:
-        await retry_async.retry_target(target, predicate, range(10), deadline=10)
+        await retry_async.retry_target(target, predicate, range(10), **timout_kwarg)


Suggested change

await retry_async.retry_target(target, predicate, range(10), **timout_kwarg)

await retry_async.retry_target(target, predicate, range(10), **timeout_kwarg)

vchudnov-g · 2023-12-07T23:57:57Z

tests/asyncio/retry/test_retry_unary_async.py

+
+    timeout_val = 10
+    # support "deadline" as an alias for "timeout"
+    timout_kwarg = (


Suggested change

timout_kwarg = (

timeout_kwarg = (

vchudnov-g

OK, looking at this again to try to settle the breaking change determination. I have two additional code comments in the files.

On the breaking change determination:

this PR makes streaming retries more reliable, so any code users previously had to manually enforce retries on streams will now be needed less or not at all, but it won't be broken by this change. Correct?
The other change os that exceptions are propagated through the retry block, so users may now be seeing exceptions that they did not see before. Correct?

Item 1 by itself would suggest marking this non-breaking, though it may be good to call attention to the change so users can simplify parts of their code that may no longer be needed.
Item 2 seems like a breaking behavioral change that may surprise users.

I think we should flag this as a breaking change, and make it clear at the top of the PR description/merge-commit message that the only expected behavioral changes that may require action on the user's side are the new exceptions they may now receive from failing retries.

Thoughts? @daniel-sanche @parthea

vchudnov-g · 2023-12-08T18:58:04Z

google/api_core/retry/__init__.py

+    "exponential_sleep_generator",
+    "if_exception_type",
+    "if_transient_error",
+    "_build_retry_error",


If we're exposing this, should we remove the leading private underscore?

good point, removed

vchudnov-g · 2023-12-08T19:09:15Z

google/api_core/retry/retry_streaming.py

+        [List[Exception], RetryFailureReason, Optional[float]],
+        Tuple[Exception, Optional[Exception]],
+    ] = _build_retry_error,
+    init_args: _P.args = (),


Why do the the sync and async versions of retry_target_stream have init_args and init_kwargs but the unary retry_targets do not? Shouldn't we be consistent?

I tried to explain that in the comments here

Basically, every time you retry a unary rpc, it will use the same arguments, so it makes sense to lock them in to a partial before calling retry_target

With streaming rpcs, retries are more complicated, because we'd like to be able to recover the stream at the point it broke, instead of from the beginning. But the way to do that is different for each rpc. It's out of scope for this PR, but I can imagine at some point wanting to add a ResumptionStrategy class, and provide some kind of call-back that allows you to modify requests between attempts. For that, we need to keep the method and it's call args separate when starting retry_target_stream

We could change retry_target to work the same way to be consistent, but that would definitely be a breaking change. And it's unnecessary, because unary retries don't have the same problems as streams

TL;DR: Using a partial for retry_target_stream would paint us into a corner, so I'm trying to make things flexible for future improvements

OK, makes sense. I wonder whether we should a brief note to that effect in the documentation for the two parameters ("these two parameters are provided in case the user needs to change the arguments in subsequent retry attempts"). But we can defer that for now.

daniel-sanche · 2023-12-08T20:29:35Z

this PR makes streaming retries more reliable, so any code users previously had to manually enforce retries on streams will now be needed less or not at all, but it won't be broken by this change. Correct?

The other change os that exceptions are propagated through the retry block, so users may now be seeing exceptions that they did not see before. Correct?

I don't believe anything should change for users for either of these points. This PR is adding the new RetryStreaming and RetryStreamingAsync classes to api_core, but nothing will change downstream until we make changes in the downstream gapic_generator to use them.

gapic_generator currently uses the unary retries.Retry object for all rpcs, and that won't change after this PR is merged. Once this functionality is in place, the next step would be looking into bringing it to gapic (but that will actually be a whole separate project on its own, because streaming rpcs will require different semi-custom resumption strategies. I'm not currently planning on taking this on myself, but it ties in with work Leah Cole has been doing in Node)

The main motivation of this change is making the retry streaming functionality available for veneer developers, until we figure out how to automate it for streaming rpcs in gapic libraries

The part that I think could be considered breaking is

AsyncRetry used to use asyncio.wait_for to cancel the attempt at the deadline, but that caused race-condition issues, and wasn't consistent with how the other retries worked. I removed that as part of the refactoring we did to share logic between all retry classes

Our options are:

a. mark this as a breaking change
b. consider it a non-breaking change, since it is an 1) undocuemnt behaviour change to 2) fix a bug 3) that we don't have reason to believe would impact the workflow of many users
c. undo some of the recent refactoring to make async retries work the same as before, and address the race condition (and potential breaking change issues) in a different PR, so we can merge this one now

I'm fine with any option, just let me know which you both prefer

vchudnov-g · 2023-12-11T21:08:45Z

OK, I see.

On point 1, we agree.
On point 2: for unaries, we were propagating the exceptions before, so no change there. And for streaming: these are new retry classes, so they can't break anyone
On point 3: The behavior changes, but the exceptions that are raised out of the new retry code are exceptions like RetryError that the user should already be handling, and the current user code should be ready to handle either type of exception mentioned in the race condition in async retries timeout race condition #528 . In other words, existing user code will not be broken, even if the particular exceptions that happen to be raised under a given set of circumstances would be different with or without the changes in this PR.

If this is an accurate summary, then I think calling this PR non-breaking is fine, which would result in this feat becoming a new minor release. @parthea WDYT? Would you like to do a pre-release for this minor release? I think we don't need to, but I'm totally open to being extra-cautious. FWIW, the existing tests, though refactored, all pass with this new code.

Separately, I'll file an issue to have gapic-generator-python use the new retry streaming classes you implemented.

vchudnov-g · 2023-12-11T21:25:53Z

One point of clarification: the current GAPICs do use AsyncRetry (link)

daniel-sanche · 2023-12-12T20:28:23Z

google/api_core/retry/retry_streaming_async.py

+                                AsyncGenerator["_Y", None], target_iterator
+                            ).athrow(
+                                sys.exception()  # type: ignore
+                            )


it looks like we could just use sys.exc_info()[1] to be compatible with all versions?

sys.exception() looks cleaner, but I don't think it's worth the extra noise for the next few years. (And there's a performance consideration to the version check too, although I don't think athrow should be called often)

Oh, that's a great idea. Let me try that.

OK, thanks. That worked. Merging.

daniel-sanche added 30 commits February 8, 2023 14:32

got retryable generators partially working

953106a

added retrun statement

89aeb75

refactoring

27feb80

work for now deadline

0dffa6d

improved synchronous generator retry

b330c3b

handle closing and returns

67ceaa2

got test to pass

ee2647a

restructured test

5a5396c

added tests

7afa76b

refactoring and comments in retry code

2d91ade

fixed helper; added is_generator flag

0cd384e

got first test working

f72bbec

remove extra await in front of async generator

88eed5c

implemented async generator retry test

91f9cc4

fixed is_generator

c3eb997

added tests for aclose and athrow

f6c6201

simplified close; don't support throws

57b0ee3

added tests

e814ce7

have test that throw should retry

0ffb03f

improved aclose and athrow

a8024f3

added comments

c76f641

close synchronous generator

ee631e3

refactor async file

70eb78c

ran blacken

42ee132

improved send test

102d83b

improved comments

f029dbd

got send working

c83c62a

tested deadline handling

185826c

changed timeout to only count time awaiting or sleeping

c5f7bbe

improved comments

4242036

removed timeout vs deadline explainer from retry_streaming

89abfa4

vchudnov-g reviewed Dec 8, 2023

View reviewed changes

daniel-sanche added 2 commits December 7, 2023 17:00

remove duplicated test

74ab817

fixed variable name

85b3e02

vchudnov-g reviewed Dec 8, 2023

View reviewed changes

daniel-sanche added 2 commits December 8, 2023 11:40

made build_retry_error public

6dbe17d

changed docstring

71e5888

import extra helper in retry_unary_async

cbae3d3

vchudnov-g mentioned this pull request Dec 11, 2023

ensure we use retry classes correctly for {unary,streaming}×{sync,async} methods googleapis/gapic-generator-python#1892

Open

daniel-sanche changed the title ~~feat!: retry and retry_async support streaming rpcs~~ feat: retry and retry_async support streaming rpcs Dec 11, 2023

vchudnov-g added 3 commits December 11, 2023 14:29

Merge branch 'main' into retry_generators

61198b8

fix: address backwards compatibility warnings failing presubmits

acf9752

fix: address mypy errors

7cf9fbf

vchudnov-g force-pushed the retry_generators branch from 31ade5b to 34ec401 Compare December 12, 2023 03:17

fix: address coverage and lint issues failing presubmits

f62439a

vchudnov-g force-pushed the retry_generators branch from 34ec401 to f62439a Compare December 12, 2023 03:43

daniel-sanche commented Dec 12, 2023

View reviewed changes

chore: simplify resolution of backaward-compatibility warnings

b7abeca

vchudnov-g removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Dec 12, 2023

vchudnov-g merged commit 17ff5f1 into main Dec 12, 2023
28 checks passed

vchudnov-g deleted the retry_generators branch December 12, 2023 23:07

release-please bot mentioned this pull request Dec 12, 2023

chore(main): release 2.16.0 #572

Merged

vchudnov-g mentioned this pull request Dec 19, 2023

continue exposing some retry imports for backwards compatibility #575

Closed

potiuk mentioned this pull request Jan 30, 2024

Backward compatible import of retry_async does not work in 2.16.0 #586

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: retry and retry_async support streaming rpcs #495

feat: retry and retry_async support streaming rpcs #495

daniel-sanche commented Apr 7, 2023 •

edited

Loading

parthea commented Dec 7, 2023

daniel-sanche commented Dec 7, 2023

vchudnov-g commented Dec 8, 2023

vchudnov-g Dec 7, 2023

vchudnov-g Dec 7, 2023

vchudnov-g Dec 7, 2023

vchudnov-g left a comment

vchudnov-g Dec 8, 2023

daniel-sanche Dec 8, 2023

vchudnov-g Dec 8, 2023

daniel-sanche Dec 8, 2023

vchudnov-g Dec 11, 2023

daniel-sanche commented Dec 8, 2023

vchudnov-g commented Dec 11, 2023

vchudnov-g commented Dec 11, 2023

daniel-sanche Dec 12, 2023 •

edited

Loading

vchudnov-g Dec 12, 2023

vchudnov-g Dec 12, 2023

	await retry_async.retry_target(target, predicate, range(10), **timout_kwarg)
	await retry_async.retry_target(target, predicate, range(10), **timeout_kwarg)

feat: retry and retry_async support streaming rpcs #495

feat: retry and retry_async support streaming rpcs #495

Conversation

daniel-sanche commented Apr 7, 2023 • edited Loading

parthea commented Dec 7, 2023

daniel-sanche commented Dec 7, 2023

vchudnov-g commented Dec 8, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vchudnov-g left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

daniel-sanche commented Dec 8, 2023

vchudnov-g commented Dec 11, 2023

vchudnov-g commented Dec 11, 2023

daniel-sanche Dec 12, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

daniel-sanche commented Apr 7, 2023 •

edited

Loading

daniel-sanche Dec 12, 2023 •

edited

Loading