
[Proposal] Switch to use informers to handle k8s resource state/status checking #4467

Closed
terrytangyuan opened this issue Nov 5, 2020 · 9 comments
Labels
type/feature Feature request

Comments

@terrytangyuan
Member

Summary

Switch to using informers to handle k8s resource state/status checking.

Use Cases

Currently the executor calls kubectl get at a fixed interval to check the status/state of the k8s resource it created (related code). This puts a lot of load on the API server when a workflow defines many steps in the form of k8s manifests.

We should consider switching to informers so that the status of created resources is read from a local cache instead of being queried from the API server on a schedule.
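For illustration, a rough sketch of the informer approach with client-go (the Pod resource, namespace, and pod name below are placeholders, not values from the actual executor code):

```go
// Rough sketch: a shared informer lists once, keeps a local cache in sync via
// a watch, and lets us read status from that cache instead of polling.
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	factory := informers.NewSharedInformerFactory(clientset, 10*time.Minute)
	podInformer := factory.Core().V1().Pods()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	cache.WaitForCacheSync(stop, podInformer.Informer().HasSynced)

	// Subsequent reads hit the local cache, not the API server.
	pod, err := podInformer.Lister().Pods("default").Get("my-step-pod") // placeholders
	if err != nil {
		panic(err)
	}
	fmt.Println("phase:", pod.Status.Phase)
}
```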


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@terrytangyuan terrytangyuan added the type/feature Feature request label Nov 5, 2020
@alexec
Contributor

alexec commented Nov 5, 2020

Would you like to submit a PR to do this?

@sarabala1979
Member

Instead of an informer, I would recommend using a list-watcher to reduce memory usage.
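For illustration only, a bare list-watch without an informer's full object cache might look roughly like this (the Job resource, namespace, name, and initial resourceVersion are placeholders, not code from the project):

```go
// Rough sketch: wait on a single resource with a plain list-watch and a retry
// watcher, without keeping an informer's full object cache in memory.
package status

import (
	batchv1 "k8s.io/api/batch/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	watchtools "k8s.io/client-go/tools/watch"
)

func waitForJob(clientset kubernetes.Interface, namespace, name string) error {
	// Watch only the single object we care about.
	lw := cache.NewListWatchFromClient(
		clientset.BatchV1().RESTClient(), "jobs", namespace,
		fields.OneTermEqualSelector("metadata.name", name),
	)

	// RetryWatcher re-establishes the watch from the last seen resourceVersion.
	// Real code would LIST first and pass that resourceVersion; "1" is only a
	// placeholder for this sketch.
	rw, err := watchtools.NewRetryWatcher("1", lw)
	if err != nil {
		return err
	}
	defer rw.Stop()

	for ev := range rw.ResultChan() {
		job, ok := ev.Object.(*batchv1.Job)
		if !ok {
			continue
		}
		if job.Status.Succeeded > 0 || job.Status.Failed > 0 {
			return nil // terminal state reached
		}
	}
	return nil
}
```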

@terrytangyuan
Member Author

terrytangyuan commented Nov 6, 2020

@sarabala1979 Thanks for the suggestion. My understanding is that an informer leverages the list-watch mechanism, so we only query the server the first time list() is called and read from the local cache after that. Could you clarify what you meant?

@alexec Yes, I can look into this when I get a chance and will send a WIP PR if I ever start working on it.

@terrytangyuan
Member Author

See discussions in #4669 (comment). This should not be an issue anymore.

@terrytangyuan
Member Author

terrytangyuan commented Feb 8, 2021

@jessesuen @alexec @sarabala1979 Do you know if there's a reason the creation and status checking for the resource template type was written using kubectl? We hit the following two issues when using it at scale:

  1. We'd like to stay away from long-running connections via kubectl get -w for various reasons, e.g. to get better control over status-checking calls. We observe constant connection drops on some of our clusters, and the watch then has to be re-established, so the -w here doesn't help much in that case.
  2. We'd like to track the traffic from the executor's resource checking using an identifier such as the user agent, as in feat(executor): Add user agent to workflow executor #5014. However, kubectl's user agent is not configurable and is hard-coded to something like kubectl/v1.18.8 (linux/amd64) kubernetes/9f2892a, which gets mixed in with other uses of kubectl by other applications in the cluster.

Here's what I am proposing to help address the above issues:

Rewrite the status-checking part using the k8s Go client instead of kubectl, which lets us configure the user agent, the resource version used when querying status, the status-checking interval, etc. We have something working internally that can be migrated to the open source version if this approach is agreed upon.
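To make the idea concrete, here is a rough sketch of what this could look like with the dynamic client (the user agent string, GroupVersionResource, namespace, name, and interval below are placeholders, not the actual values we use internally):

```go
// Rough sketch: poll a resource's status with the dynamic client and a custom
// user agent instead of shelling out to kubectl.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	// A configurable user agent makes this traffic identifiable in API server
	// logs and metrics.
	config.UserAgent = "argo-workflow-executor/resource" // placeholder

	client := dynamic.NewForConfigOrDie(config)
	gvr := schema.GroupVersionResource{Group: "batch", Version: "v1", Resource: "jobs"} // placeholder

	ticker := time.NewTicker(5 * time.Second) // the interval is now under our control
	defer ticker.Stop()
	for range ticker.C {
		obj, err := client.Resource(gvr).Namespace("default").
			Get(context.Background(), "my-job", metav1.GetOptions{})
		if err != nil {
			panic(err)
		}
		// Real code would evaluate the template's success/failure conditions;
		// here we just stop once any status is reported.
		if status, found, _ := unstructured.NestedMap(obj.Object, "status"); found {
			fmt.Println("observed status:", status)
			return
		}
	}
}
```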

Any feedback and suggestions would be appreciated.

@alexec
Contributor

alexec commented Mar 10, 2021

This is a yes from me. I think we should build a new template called resource2 and have the agent execute it.

@jessesuen
Member

We'd like to stay away from long-running connections via kubectl get -w for various reasons

So are you suggesting we poll instead of watch? I think that would be reasonable and less taxing on the API server. I guess the next question is how users will get the ability to control this interval.

The choice of using kubectl was a convenience. The resource template is 3+ years old and at the time:

  1. dynamic client did not exist
  2. kubectl logic was not consumable as a library

Now that they do, I agree we could do this without kubectl.

We'd like to track the traffic from the resource checking in executor using identifier like user agent

Does the user agent not already include the service account? That's unfortunate if it doesn't.

I think we should build a new template called resource2 and have the agent execute it.

While this will create fewer pods, I don't think it will save on API server load. In any case, if we do decide to do it in an agent, I would still prefer to avoid a new template type and reuse the existing resource template.

@alexec
Contributor

alexec commented Mar 10, 2021

would still prefer to avoid a new template type and reuse existing resource template

How about updating the current resource implementation to use a dynamic informer then? I anticipate that we would be able to reuse that code in the agent.
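A rough sketch of the dynamic-informer idea, assuming a namespaced factory (the GroupVersionResource, namespace, and resync period are placeholders):

```go
// Rough sketch: a dynamic informer for an arbitrary GroupVersionResource, so
// the same code can track any resource a template creates.
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := dynamic.NewForConfigOrDie(config)

	gvr := schema.GroupVersionResource{Group: "batch", Version: "v1", Resource: "jobs"} // placeholder

	factory := dynamicinformer.NewFilteredDynamicSharedInformerFactory(
		client, 10*time.Minute, "default", nil)
	informer := factory.ForResource(gvr).Informer()

	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, newObj interface{}) {
			u := newObj.(*unstructured.Unstructured)
			// Status updates arrive as events; no polling of the API server.
			fmt.Println(u.GetName(), "updated:", u.Object["status"])
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // block forever; real code would wait for a terminal condition
}
```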

@terrytangyuan
Member Author

terrytangyuan commented Mar 10, 2021

Thank you! I think we can reuse the existing resource template. I am not sure if we want to switch to an informer yet (though that was originally proposed in this issue), as that would require additional resources on the executor pod, whereas currently the resource requirements can be customized to be minimal.

Just to clarify - the main changes of my proposal would be:

  1. Switch to the k8s SDK instead of kubectl so it's more configurable and friendlier to monitoring tools.
  2. Switch to polling instead of watching so we have more control in cases where the API server is unstable. We could make the polling interval configurable via an environment variable (a rough sketch follows below).
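
As a sketch of the second point, assuming a hypothetical environment variable name (not an actual Argo variable):

```go
// Rough sketch: read the polling interval from an environment variable and
// poll until the resource reaches a terminal state.
package main

import (
	"fmt"
	"os"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func pollInterval() time.Duration {
	if v := os.Getenv("RESOURCE_STATUS_CHECK_INTERVAL"); v != "" { // hypothetical name
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return 5 * time.Second // fallback default
}

func main() {
	err := wait.PollImmediate(pollInterval(), 5*time.Minute, func() (bool, error) {
		// Real code would GET the resource with the k8s client here and return
		// true once its success or failure condition is met.
		fmt.Println("checking resource status...")
		return true, nil // pretend it is already done for this sketch
	})
	if err != nil {
		panic(err)
	}
}
```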
