Implement configurable timeout for RPC operations #2171

hiddeco · 2019-06-19T20:35:19Z

This ensures that when, for example, the Kubernetes API is temporarily
rate limited, requests that take too long are abandoned rather than
piling up.

We have observed an excessive number of goroutines being spawned
in a short amount of time, sometimes resulting in a shutdown of the
Flux daemon due to the accompanying spike in CPU (and RAM) usage.

squaremo · 2019-06-20T10:30:33Z

Adding a context to the RPCs is 💯

I think this will need to go further though. The problem being addressed (in large part) is that Kubernetes API requests pile up, once they start needing to wait for rate limiting. To be able to abandon those when a request times out, the context must be propagated to those calls (i.e., to the cluster.Cluster methods, then to the Kubernetes client methods).

Sadly, the client-go API doesn't support contexts. What I suggest, for the meantime, is that a context is passed through and checked before each Kubernetes API call (or anything that occurs in a loop, at least).

cluster/kubernetes/kubernetes.go

squaremo

I've suggested one tweak you could make. But I think it'll work as it is. Thanks for making this detour :-)

cluster/kubernetes/kubernetes.go

hiddeco requested a review from squaremo June 19, 2019 20:35

squaremo reviewed Jun 20, 2019

hiddeco force-pushed the enhancement/rpc-timeout branch from 6d4f77c to f2402e2 Compare June 20, 2019 17:12

squaremo approved these changes Jun 24, 2019

hiddeco added 2 commits June 24, 2019 19:21

Pass along context to methods making API calls

bfdfe31

hiddeco force-pushed the enhancement/rpc-timeout branch from f2402e2 to bfdfe31 Compare June 24, 2019 17:21

hiddeco added this to the 1.14.0 milestone Jun 25, 2019

hiddeco merged commit 5555c67 into master Jun 25, 2019

hiddeco deleted the enhancement/rpc-timeout branch June 25, 2019 08:50

hiddeco modified the milestones: 1.14.0, 1.13.1 Jun 25, 2019

squaremo pushed a commit that referenced this pull request Jun 27, 2019

Implement configurable timeout for RPC operations (#2171)

3bf6d87

hiddeco mentioned this pull request Jul 16, 2019

Flux agent goroutine count and memory usage grows to problematic levels. #2263

Closed
