
[Proposal] Switch to use informers to handle k8s resource state/status checking #4467

Closed
terrytangyuan opened this issue Nov 5, 2020 · 9 comments
Labels
type/feature Feature request

Comments

@terrytangyuan
Member

Summary

Switch to using informers to handle k8s resource state/status checking.

Use Cases

Currently the executor calls kubectl get at a fixed interval to check the status/state of the k8s resource it created (related code). This puts a lot of load on the API server when a workflow defines many steps in the form of k8s manifests.

We should consider switching to informers so that the status of created resources is read from a local cache instead of being queried from the API server on a schedule.
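For illustration, a rough sketch of the informer approach with client-go (the Pod resource, namespace, and pod name below are placeholders, not values from the actual executor code):

```go
// Rough sketch: a shared informer lists once, keeps a local cache in sync via
// a watch, and lets us read status from that cache instead of polling.
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	factory := informers.NewSharedInformerFactory(clientset, 10*time.Minute)
	podInformer := factory.Core().V1().Pods()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	cache.WaitForCacheSync(stop, podInformer.Informer().HasSynced)

	// Subsequent reads hit the local cache, not the API server.
	pod, err := podInformer.Lister().Pods("default").Get("my-step-pod") // placeholders
	if err != nil {
		panic(err)
	}
	fmt.Println("phase:", pod.Status.Phase)
}
```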


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@terrytangyuan terrytangyuan added the type/feature Feature request label Nov 5, 2020
@alexec
Contributor

alexec commented Nov 5, 2020

Would you like to submit a PR to do this?

@sarabala1979
Member

Instead of an informer, I would recommend using a list-watcher to reduce memory usage.
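For illustration only, a bare list-watch without an informer's full object cache might look roughly like this (the Job resource, namespace, name, and initial resourceVersion are placeholders, not code from the project):

```go
// Rough sketch: wait on a single resource with a plain list-watch and a retry
// watcher, without keeping an informer's full object cache in memory.
package status

import (
	batchv1 "k8s.io/api/batch/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	watchtools "k8s.io/client-go/tools/watch"
)

func waitForJob(clientset kubernetes.Interface, namespace, name string) error {
	// Watch only the single object we care about.
	lw := cache.NewListWatchFromClient(
		clientset.BatchV1().RESTClient(), "jobs", namespace,
		fields.OneTermEqualSelector("metadata.name", name),
	)

	// RetryWatcher re-establishes the watch from the last seen resourceVersion.
	// Real code would LIST first and pass that resourceVersion; "1" is only a
	// placeholder for this sketch.
	rw, err := watchtools.NewRetryWatcher("1", lw)
	if err != nil {
		return err
	}
	defer rw.Stop()

	for ev := range rw.ResultChan() {
		job, ok := ev.Object.(*batchv1.Job)
		if !ok {
			continue
		}
		if job.Status.Succeeded > 0 || job.Status.Failed > 0 {
			return nil // terminal state reached
		}
	}
	return nil
}
```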

@terrytangyuan
Member Author

terrytangyuan commented Nov 6, 2020

@sarabala1979 Thanks for the suggestion. My understanding is that an informer leverages the list-watch mechanism, so we only query the server the first time list() is called and read from the local cache after that. Could you clarify what you meant?

@alexec Yes, I can look into this when I get a chance and will send a WIP PR if I ever start working on it.

@terrytangyuan
Member Author

See discussions in #4669 (comment). This should not be an issue anymore.

@terrytangyuan
Member Author

terrytangyuan commented Feb 8, 2021

@jessesuen @alexec @sarabala1979 Do you know if there's a reason the creation and status checking for the resource template type was written using kubectl? We hit the following two issues when using it at scale:

  1. We'd like to stay away from long-running connections via kubectl get -w for various reasons, e.g. to get better control over status-checking calls. We observe constant connection drops on some of our clusters, and the watch then has to be re-established, so the -w here doesn't help much in that case.
  2. We'd like to track the traffic from the executor's resource checking using an identifier such as the user agent, as in feat(executor): Add user agent to workflow executor #5014. However, kubectl's user agent is not configurable and is hard-coded to something like kubectl/v1.18.8 (linux/amd64) kubernetes/9f2892a, which gets mixed in with other uses of kubectl by other applications in the cluster.

Here's what I am proposing to help address the above issues:

Rewrite the status-checking part using the k8s Go client instead of kubectl, which lets us configure the user agent, the resource version used when querying status, the status-checking interval, etc. We have something working internally that can be migrated to the open source version if this approach is agreed upon.
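To make the idea concrete, here is a rough sketch of what this could look like with the dynamic client (the user agent string, GroupVersionResource, namespace, name, and interval below are placeholders, not the actual values we use internally):

```go
// Rough sketch: poll a resource's status with the dynamic client and a custom
// user agent instead of shelling out to kubectl.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	// A configurable user agent makes this traffic identifiable in API server
	// logs and metrics.
	config.UserAgent = "argo-workflow-executor/resource" // placeholder

	client := dynamic.NewForConfigOrDie(config)
	gvr := schema.GroupVersionResource{Group: "batch", Version: "v1", Resource: "jobs"} // placeholder

	ticker := time.NewTicker(5 * time.Second) // the interval is now under our control
	defer ticker.Stop()
	for range ticker.C {
		obj, err := client.Resource(gvr).Namespace("default").
			Get(context.Background(), "my-job", metav1.GetOptions{})
		if err != nil {
			panic(err)
		}
		// Real code would evaluate the template's success/failure conditions;
		// here we just stop once any status is reported.
		if status, found, _ := unstructured.NestedMap(obj.Object, "status"); found {
			fmt.Println("observed status:", status)
			return
		}
	}
}
```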

Any feedback and suggestions would be appreciated.

@alexec
Contributor

alexec commented Mar 10, 2021

This is a yes from me. I think we should build a new template called resource2 and have the agent execute it.

@jessesuen
Member

We'd like to stay away from long-running connections via kubectl get -w for various reasons

So are you suggesting we poll instead of watch? I think that would be reasonable and less taxing on the API server. I guess the next question is how users will get the ability to control this interval.

The choice of using kubectl was a convenience. The resource template is 3+ years old and at the time:

  1. dynamic client did not exist
  2. kubectl logic was not consumable as a library

Now that they do, I agree we could do this without kubectl.

We'd like to track the traffic from the resource checking in executor using identifier like user agent

Does the user agent not already include the service account? That's unfortunate if it doesn't.

I think we should build a new template called resource2 and have the agent execute it.

While this will create fewer pods, I don't think it will save on API server load. In any case, if we do decide to do it in an agent, I would still prefer to avoid a new template type and reuse the existing resource template.

@alexec
Contributor

alexec commented Mar 10, 2021

would still prefer to avoid a new template type and reuse existing resource template

How about updating the current resource implementation to use a dynamic informer then? I anticipate that we would be able to reuse that code in the agent.
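A rough sketch of the dynamic-informer idea, assuming a namespaced factory (the GroupVersionResource, namespace, and resync period are placeholders):

```go
// Rough sketch: a dynamic informer for an arbitrary GroupVersionResource, so
// the same code can track any resource a template creates.
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := dynamic.NewForConfigOrDie(config)

	gvr := schema.GroupVersionResource{Group: "batch", Version: "v1", Resource: "jobs"} // placeholder

	factory := dynamicinformer.NewFilteredDynamicSharedInformerFactory(
		client, 10*time.Minute, "default", nil)
	informer := factory.ForResource(gvr).Informer()

	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, newObj interface{}) {
			u := newObj.(*unstructured.Unstructured)
			// Status updates arrive as events; no polling of the API server.
			fmt.Println(u.GetName(), "updated:", u.Object["status"])
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // block forever; real code would wait for a terminal condition
}
```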

@terrytangyuan
Member Author

terrytangyuan commented Mar 10, 2021

Thank you! I think we can reuse the existing resource template. I am not sure if we want to switch to an informer yet (though that was originally proposed in this issue), as that would require additional resources on the executor pod, whereas currently the resource requirements can be customized to be minimal.

Just to clarify - the main changes of my proposal would be:

  1. Switch to the k8s SDK instead of kubectl so it's more configurable and friendlier to monitoring tools.
  2. Switch to polling instead of watching so we have more control in cases where the API server is unstable. We could make the polling interval configurable via an environment variable (a rough sketch follows below).
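
As a sketch of the second point, assuming a hypothetical environment variable name (not an actual Argo variable):

```go
// Rough sketch: read the polling interval from an environment variable and
// poll until the resource reaches a terminal state.
package main

import (
	"fmt"
	"os"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func pollInterval() time.Duration {
	if v := os.Getenv("RESOURCE_STATUS_CHECK_INTERVAL"); v != "" { // hypothetical name
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return 5 * time.Second // fallback default
}

func main() {
	err := wait.PollImmediate(pollInterval(), 5*time.Minute, func() (bool, error) {
		// Real code would GET the resource with the k8s client here and return
		// true once its success or failure condition is met.
		fmt.Println("checking resource status...")
		return true, nil // pretend it is already done for this sketch
	})
	if err != nil {
		panic(err)
	}
}
```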
