This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Implement configurable timeout for RPC operations #2171

Merged: 2 commits, Jun 25, 2019

@hiddeco (Member) commented Jun 19, 2019

This ensures that when, for example, the Kubernetes API is temporarily
rate limited, requests that take too long are abandoned rather than
piling up.

We have observed an excessive number of goroutines being spawned
in a short amount of time, sometimes resulting in a shutdown of the
Flux daemon due to the accompanying spike in CPU (and RAM) usage.

@hiddeco hiddeco requested a review from squaremo June 19, 2019 20:35
@squaremo (Member) commented

Adding a context to the RPCs is 💯

I think this will need to go further though. The problem being addressed (in large part) is that Kubernetes API requests pile up, once they start needing to wait for rate limiting. To be able to abandon those when a request times out, the context must be propagated to those calls (i.e., to the cluster.Cluster methods, then to the Kubernetes client methods).

Sadly, the client-go API doesn't support contexts. What I suggest, for the meantime, is that a context is passed through and checked before each Kubernetes API call (or anything that occurs in a loop, at least).

@hiddeco hiddeco force-pushed the enhancement/rpc-timeout branch from 6d4f77c to f2402e2 Compare June 20, 2019 17:12
@squaremo (Member) left a comment

I've suggested one tweak you could make. But I think it'll work as it is. Thanks for making this detour :-)

cluster/kubernetes/kubernetes.go (resolved)
hiddeco added 2 commits June 24, 2019 19:21
@hiddeco hiddeco force-pushed the enhancement/rpc-timeout branch from f2402e2 to bfdfe31 Compare June 24, 2019 17:21
@hiddeco hiddeco added this to the 1.14.0 milestone Jun 25, 2019
@hiddeco hiddeco merged commit 5555c67 into master Jun 25, 2019
@hiddeco hiddeco deleted the enhancement/rpc-timeout branch June 25, 2019 08:50
@hiddeco hiddeco modified the milestones: 1.14.0, 1.13.1 Jun 25, 2019
squaremo pushed a commit that referenced this pull request Jun 27, 2019
Implement configurable timeout for RPC operations