
storage: source storage timeout #276

Merged — 1 commit merged into main, Jan 19, 2023
Conversation

@pro-wh (Collaborator) commented Jan 7, 2023

Adds a timeout to "routine" gRPC requests:

  • https://grpc.io/blog/deadlines/#go — gRPC recommends using the usual Go context timeout/deadline system (sketched below).
  • Emerald analyzer stops silently #267 reports that requests freeze (for thousands of minutes).
  • The timeout chosen in this PR is 61 seconds, a little longer than the usual minute-long timeout used in common proxies.
  • Only gRPC requests are subject to the timeout. Other analysis work, including writing to the DB, can still hang; feel free to open an alternate PR that puts an entire loop iteration under a timeout.
  • Not exposed in configuration 😵‍💫
  • Applies to "routine" gRPC requests only, exempting methods that fetch entire genesis documents. Consensus AllData gives each subcomponent a timeout.
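A minimal sketch of the per-request deadline pattern, assuming a hypothetical `withRPCTimeout` helper wrapped around each routine gRPC call site; the names here are placeholders, not the indexer's actual API:

```go
package storage

import (
	"context"
	"time"
)

// RPCTimeout is slightly longer than the minute-long timeout common proxies use.
const RPCTimeout = 61 * time.Second

// withRPCTimeout derives a per-request context so a hung gRPC call fails
// after RPCTimeout instead of blocking the analyzer loop indefinitely.
func withRPCTimeout(ctx context.Context) (context.Context, context.CancelFunc) {
	return context.WithTimeout(ctx, RPCTimeout)
}

// At a call site (client and GetBlock are placeholders):
//
//	rpcCtx, cancel := withRPCTimeout(ctx)
//	defer cancel()
//	block, err := client.GetBlock(rpcCtx, height)
```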

@pro-wh pro-wh force-pushed the pro-wh/feature/timeout branch from d0728f3 to 993da0b Compare January 7, 2023 00:36
@mitjat (Contributor) left a comment


Thank you! While this works, I dislike having the same setup repeated in many places, and having to remember to add it for any future RPCs.
Would you be opposed to setting a time-constrained context a level higher, i.e. time-constraining the whole processRound() or processBlock()? In #267, you mention that you're worried about how that would interact with DB queries. Any other concerns? IMO it might actually be nice to have a 61s timeout for those too, as a failsafe (because I don't know for sure that pgx handles a dropped connection robustly, for example; it should, though).

But regardless, we can send a time-constrained context to just the data-fetching (and possibly data-processing) parts of each round, in a single place (sketched below):

  • Emerald: derive ctxWithTimeout from ctx and pass it into errgroup creation here. We can cancel the context after group.Wait(). The DB stuff happens outside the errgroup, so it will not be impacted by groupCtx (which derives from ctxWithTimeout).
  • Incidentally, it looks like we have a small bug: prepareBlockData() here should be using groupCtx, not ctx.
  • Consensus: there's no async processing, so it's even easier; we can just pass a ctxWithTimeout to the AllData() call here and cancel the context right after the call returns.
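A minimal sketch of the Emerald variant of this suggestion, assuming hypothetical step functions (fetchTransactions, fetchEvents, writeToDB) in place of the analyzer's real ones:

```go
package analyzer

import (
	"context"
	"time"

	"golang.org/x/sync/errgroup"
)

const processTimeout = 61 * time.Second

// Placeholder steps standing in for the analyzer's real fetch/write logic.
func fetchTransactions(ctx context.Context, round uint64) error { return nil }
func fetchEvents(ctx context.Context, round uint64) error       { return nil }
func writeToDB(ctx context.Context, round uint64) error         { return nil }

// processRound runs only the data-fetching errgroup under the deadline;
// the DB write below uses the parent ctx and is not subject to the timeout.
func processRound(ctx context.Context, round uint64) error {
	ctxWithTimeout, cancel := context.WithTimeout(ctx, processTimeout)
	defer cancel()

	group, groupCtx := errgroup.WithContext(ctxWithTimeout)
	group.Go(func() error { return fetchTransactions(groupCtx, round) })
	group.Go(func() error { return fetchEvents(groupCtx, round) })
	if err := group.Wait(); err != nil {
		return err
	}
	cancel() // release the timeout once fetching is done

	return writeToDB(ctx, round)
}
```

The consensus case would be simpler still: pass ctxWithTimeout directly to the AllData() call and cancel right after it returns, while DB writes continue to use the parent ctx.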

storage/oasis/runtime.go (review thread; outdated, resolved)
@pro-wh pro-wh force-pushed the pro-wh/feature/timeout branch from 993da0b to 5ab2f37 Compare January 19, 2023 22:45
@pro-wh (Collaborator, Author) commented Jan 19, 2023

OK, pulled it up to the 'main' analyzer level.

@pro-wh pro-wh requested a review from mitjat January 19, 2023 22:46
@pro-wh pro-wh force-pushed the pro-wh/feature/timeout branch from 5ab2f37 to 296fe20 Compare January 19, 2023 22:47
@mitjat (Contributor) left a comment


Thank you!

@pro-wh pro-wh merged commit 7785219 into main Jan 19, 2023
@pro-wh pro-wh deleted the pro-wh/feature/timeout branch January 19, 2023 23:49