Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make stream wait timeout a first class citizen #1473

Merged
merged 9 commits into from
May 11, 2023

Conversation

mutianf
Copy link
Contributor

@mutianf mutianf commented Mar 9, 2023

Port change googleapis/gax-java#1409 because gax is going to have a major version bump.
The original PR was closed because remapping rpcTimeout is too risky. But now gax is having a major version bump soon we're hoping to fix this behavior.

From the original comment:

For a server streaming api, there are 4 conceptual timeouts:

  1. overall operation timeout - the maximum amount time that passes from the user invoking a method until that method exits
  2. attempt/rpc timeout - if retries are enabled the maximum amount of time that passes for each attempt in an operation
  3. message wait timeout - the maximum amount of time to wait for the next message from the server
  4. idle timeout - how long to wait before considering the stream orphaned by the user and closing it

Each has a usecase:

  1. operation timeout is useful for users to fulfill their own slo guarantees
  2. attempt timeout are useful when a client developer knows the absolute limit of an rpc but that limit happens to be shorter than the required slo for a customer. For example a point read of a bigtable key shouldnt ever take more than 100ms. If it does, then we can assume something is wrong (GFE died without sending a FIN packet) and abort the request and retry.
  3. message wait timeouts have a similar use as attempt timeouts with tighter guarantees
  4. idle timeouts are useful to reduce buffer bloat on the server
    Currently all of them are implemented in gax but the delineation was muddied by me a while back. This PR tries to fix the situation. In the current world, operation timeout is defined by RetrySettings#totalTimeout, idle timeout is defined by ServerStreamingCallSettings#idleTimeout. However RetrySettings#rpcTimeout is mapped to message wait timeout and attempt timeout is only configureable per call using ApiCallContext#withTimeout.

This PR cleans up the situation by mapping RetrySettings#rpcTimeout to attempt timeouts and exposes a new setting for wait timeout on ServerStreamingCallSettings.

@mutianf mutianf requested a review from a team as a code owner March 9, 2023 22:18
@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Mar 9, 2023
@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 9, 2023

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

62.5% 62.5% Coverage
0.0% 0.0% Duplication

@suztomo
Copy link
Member

suztomo commented Mar 10, 2023

@mutianf We planned a major release but it turned out that we can implement many of the ideas without introducing breaking changes. So GAX is not going to have a major release.

However, I think your idea here is worth discussing. Would you add it into the wishlist as a suggested edit? https://docs.google.com/document/d/1oEzsIEr7DFoyG6-QXXZyIyjIZkbe8i2obLOG90RnVKk/edit?resourcekey=0-T1uuNtaLJBuOtIlcRfe2iQ (internal doc)

@mutianf
Copy link
Contributor Author

mutianf commented Mar 10, 2023

Thanks @suztomo, I added to the bottom of the list. Feel free to add me to meetings that discuss these issues!

@lqiu96
Copy link
Contributor

lqiu96 commented May 4, 2023

I think the changes make sense. Just left a few questions (trying to get more familiar with the streaming retry logic).

Copy link
Collaborator

@blakeli0 blakeli0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mattie for opening this PR! It would be great if we can add a showcase integration test to visualize how customers would use it, and maybe test the behavior of streamWaitTimeout as well. We already have a basic showcase integration tests for server side streaming.
See the README of showcase module for more details. Ley me know if you have any additional questions regarding showcase integration tests, @burkedavison /@lqiu96 / @mpeddada1 can provide more info as well.

@mutianf
Copy link
Contributor Author

mutianf commented May 8, 2023

Thanks for reviewing! I'll add showcase integration test and address other comments later.

@sonarqubecloud
Copy link

[gapic-generator-java-root] Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

87.5% 87.5% Coverage
0.0% 0.0% Duplication

@sonarqubecloud
Copy link

[java_showcase_integration_tests] SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

68.8% 68.8% Coverage
0.0% 0.0% Duplication

Copy link
Contributor

@igorbernstein2 igorbernstein2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! thanks for fixing this! and I apologize for the original misfeature

@sonarqubecloud
Copy link

[java_showcase_unit_tests] Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

100.0% 100.0% Coverage
0.0% 0.0% Duplication

@blakeli0 blakeli0 added the owlbot:run Add this label to trigger the Owlbot post processor. label May 11, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label May 11, 2023
@blakeli0 blakeli0 merged commit bc8a4ad into googleapis:main May 11, 2023
@mutianf mutianf deleted the timeout branch November 27, 2023 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants