
storage: source storage timeout #276

Merged — 1 commit merged into main, Jan 19, 2023
Conversation

@pro-wh (Collaborator) commented Jan 7, 2023

Adds a timeout to "routine" gRPC requests:

  • https://grpc.io/blog/deadlines/#go — gRPC recommends using the usual Go context timeout/deadline system (sketched below).
  • Emerald analyzer stops silently #267 reports that requests freeze (for thousands of minutes).
  • The timeout chosen in this PR is 61 seconds, a little longer than the usual minute-long timeout used in common proxies.
  • Only gRPC requests are subject to the timeout. Other analysis work, including writing to the DB, can still hang; feel free to open an alternate PR that puts an entire loop iteration under a timeout.
  • Not exposed in configuration 😵‍💫
  • Applies to "routine" gRPC requests only, exempting methods that fetch entire genesis documents. Consensus AllData gives each subcomponent a timeout.
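A minimal sketch of the per-request deadline pattern, assuming a hypothetical `withRPCTimeout` helper wrapped around each routine gRPC call site; the names here are placeholders, not the indexer's actual API:

```go
package storage

import (
	"context"
	"time"
)

// RPCTimeout is slightly longer than the minute-long timeout common proxies use.
const RPCTimeout = 61 * time.Second

// withRPCTimeout derives a per-request context so a hung gRPC call fails
// after RPCTimeout instead of blocking the analyzer loop indefinitely.
func withRPCTimeout(ctx context.Context) (context.Context, context.CancelFunc) {
	return context.WithTimeout(ctx, RPCTimeout)
}

// At a call site (client and GetBlock are placeholders):
//
//	rpcCtx, cancel := withRPCTimeout(ctx)
//	defer cancel()
//	block, err := client.GetBlock(rpcCtx, height)
```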

@pro-wh pro-wh force-pushed the pro-wh/feature/timeout branch from d0728f3 to 993da0b Compare January 7, 2023 00:36
@mitjat (Contributor) left a comment


Thank you! While this works, I dislike having the same setup repeated in many places, and having to remember to add it for any future RPCs.
Would you be opposed to setting a time-constrained context a level higher, i.e. time-constraining the whole processRound() or processBlock()? In #267, you mention that you're worried about how that would interact with DB queries. Any other concerns? IMO it might actually be nice to have a 61s timeout for those too, as a failsafe (because I don't know for sure that pgx handles a dropped connection robustly, for example; it should, though).

But regardless, we can send a time-constrained context to just the data-fetching (and possibly data-processing) parts of each round, in a single place (sketched below):

  • Emerald: derive ctxWithTimeout from ctx and pass it into errgroup creation here. We can cancel the context after group.Wait(). The DB stuff happens outside the errgroup, so it will not be impacted by groupCtx (which derives from ctxWithTimeout).
  • Incidentally, it looks like we have a small bug: prepareBlockData() here should be using groupCtx, not ctx.
  • Consensus: there's no async processing, so it's even easier; we can just pass a ctxWithTimeout to the AllData() call here and cancel the context right after the call returns.
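A minimal sketch of the Emerald variant of this suggestion, assuming hypothetical step functions (fetchTransactions, fetchEvents, writeToDB) in place of the analyzer's real ones:

```go
package analyzer

import (
	"context"
	"time"

	"golang.org/x/sync/errgroup"
)

const processTimeout = 61 * time.Second

// Placeholder steps standing in for the analyzer's real fetch/write logic.
func fetchTransactions(ctx context.Context, round uint64) error { return nil }
func fetchEvents(ctx context.Context, round uint64) error       { return nil }
func writeToDB(ctx context.Context, round uint64) error         { return nil }

// processRound runs only the data-fetching errgroup under the deadline;
// the DB write below uses the parent ctx and is not subject to the timeout.
func processRound(ctx context.Context, round uint64) error {
	ctxWithTimeout, cancel := context.WithTimeout(ctx, processTimeout)
	defer cancel()

	group, groupCtx := errgroup.WithContext(ctxWithTimeout)
	group.Go(func() error { return fetchTransactions(groupCtx, round) })
	group.Go(func() error { return fetchEvents(groupCtx, round) })
	if err := group.Wait(); err != nil {
		return err
	}
	cancel() // release the timeout once fetching is done

	return writeToDB(ctx, round)
}
```

The consensus case would be simpler still: pass ctxWithTimeout directly to the AllData() call and cancel right after it returns, while DB writes continue to use the parent ctx.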

storage/oasis/runtime.go (review thread; outdated, resolved)
@pro-wh pro-wh force-pushed the pro-wh/feature/timeout branch from 993da0b to 5ab2f37 Compare January 19, 2023 22:45
@pro-wh (Collaborator, Author) commented Jan 19, 2023

OK, pulled it up to the 'main' analyzer level.

@pro-wh pro-wh requested a review from mitjat January 19, 2023 22:46
@pro-wh pro-wh force-pushed the pro-wh/feature/timeout branch from 5ab2f37 to 296fe20 Compare January 19, 2023 22:47
@mitjat (Contributor) left a comment


Thank you!

@pro-wh pro-wh merged commit 7785219 into main Jan 19, 2023
@pro-wh pro-wh deleted the pro-wh/feature/timeout branch January 19, 2023 23:49