[MultiKueue] Support Deployment Integration #3802
In the ideal solution, I think we need to implement the `managedBy` field in all workload objects, like Deployment and StatefulSet. @mimowo Do you have any concerns about implementing the `managedBy` feature the same way as for the batch/v1 Job?
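For reference, this is roughly what delegating a batch/v1 Job to MultiKueue via `spec.managedBy` looks like today. The sketch below builds such a manifest as a plain dict; the queue name is illustrative, and a Deployment equivalent of `managedBy` (what the comment proposes) does not exist yet:

```python
# Minimal sketch of a batch/v1 Job manifest delegating sync to MultiKueue.
# spec.managedBy exists on batch/v1 Job; a Deployment equivalent is hypothetical.
def job_managed_by_multikueue(name: str, queue: str = "user-queue") -> dict:
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            # Label Kueue uses to route the workload to a LocalQueue.
            "labels": {"kueue.x-k8s.io/queue-name": queue},
        },
        "spec": {
            # With a non-default controller set here, the built-in Job
            # controller skips this Job; MultiKueue owns its status instead.
            "managedBy": "kueue.x-k8s.io/multikueue",
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{"name": "main", "image": "busybox"}],
                }
            },
        },
    }
```

The point of the field is exactly the status-ownership handoff discussed in this thread: whichever controller is named in `managedBy` is the one allowed to reconcile the object's status.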
We don't know yet the best path forward. From the initial discussion with @mwielgus, we would like to support MultiKueue for Pods; then the Deployment and StatefulSet integrations would work for free, and users of the pod integration could also benefit. The problem is how to achieve "managedBy" for pods. The initial ideas we discussed:
1. might work OK, but might not be transparent to the end-users. 2. is relatively simple technically, but the issue is that we update the status while the pod is gated, which might violate the Pod API. 3. is harder, but would allow updating the Pod status without tricks.
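Option 2 above relies on Pod scheduling gates: while any gate is present, kube-scheduler will not bind the Pod, so MultiKueue can hold it on the management cluster while the real copy runs on a worker. A minimal sketch, assuming a hypothetical dedicated gate name for MultiKueue (the gate name below is an assumption, not an API Kueue currently ships):

```python
# Sketch of option 2: a Pod held on the management cluster by a scheduling
# gate while MultiKueue mirrors it to a worker cluster.
# The gate name "kueue.x-k8s.io/multikueue" is illustrative only.
def gated_pod(name: str) -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # kube-scheduler skips binding while any gate remains.
            "schedulingGates": [{"name": "kueue.x-k8s.io/multikueue"}],
            "containers": [{"name": "main", "image": "busybox"}],
        },
    }

def remove_gate(pod: dict, gate: str) -> dict:
    # Ungating the Pod would let it schedule locally, so under option 2 the
    # controller instead keeps the gate and syncs status, which is exactly
    # the potential Pod API violation discussed above.
    pod["spec"]["schedulingGates"] = [
        g for g in pod["spec"]["schedulingGates"] if g["name"] != gate
    ]
    return pod
```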
@mimowo Can you elaborate a bit more on option 3? I'm not entirely sure what you mean by a virtual MultiKueue node. Thanks!
@Bobbins228 sure. In this idea we would have a dedicated single "virtual" node (no real kubelet, just a Node API object; it could be called "multikueue-virtual-node"). On the management cluster, Kueue would bind the pods managed by MultiKueue to that node (resulting in `spec.nodeName=multikueue-virtual-node`). Then MultiKueue would take over the kubelet's responsibility and update the Pod status based on the worker cluster.
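The mechanics of option 3 can be sketched in a few lines: bind the local Pod to the virtual node, then copy status back from the worker's copy. The node name comes from the comment above; the helper names are hypothetical, and a real controller would do this through the binding and status subresources rather than plain dicts:

```python
# Sketch of option 3: MultiKueue plays kubelet for pods bound to a single
# virtual node (a Node object with no kubelet behind it).
VIRTUAL_NODE = "multikueue-virtual-node"

def bind_to_virtual_node(pod: dict) -> dict:
    # Equivalent of creating a Binding: pins the pod so kube-scheduler
    # and every real kubelet ignore it on the management cluster.
    pod.setdefault("spec", {})["nodeName"] = VIRTUAL_NODE
    return pod

def sync_status_from_worker(local_pod: dict, worker_pod: dict) -> dict:
    # MultiKueue takes the kubelet's role: mirror the worker pod's status
    # (phase, conditions, containerStatuses, ...) onto the local pod.
    local_pod["status"] = dict(worker_pod.get("status", {}))
    return local_pod
```

Because the pod is genuinely bound, updating its status this way stays within the normal Pod lifecycle, which is why this option avoids the "tricks" of updating a still-gated pod.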
@mimowo With future AppWrapper MultiKueue support, would we not run into the same issues with updating the individual statuses of the Kubernetes objects created through AppWrapper?
cc @dgrove-oss to keep me honest here, but I believe the AppWrapper integration for MultiKueue should work the same way as for other CRDs. For other CRDs, like Job or JobSet, we use the `managedBy` field.
This matches my understanding of how it should work. We've implemented a `managedBy` field in AppWrapper for this purpose.
@dgrove-oss @mimowo - Thanks for the input, this is helpful! Would upstream still be interested in exploring option 3, direct Pod integration in Kueue? @Bobbins228 and I have been working on a PoC with a virtual node and wanted to check whether direct integration support would be beneficial, or whether the suggestion is to use AppWrapper and create Pods/Deployments/StatefulSets underneath it. One plus of direct pod integration would be the ability to manage serving workloads directly on MultiKueue, though we still have to figure out how to schedule pods belonging to a single Deployment/StatefulSet across multiple clusters (even with option 3) to get the true benefit.
@varshaprasad96 we are definitely looking forward to native support for Pods in MultiKueue. The main reason is the ability to integrate MultiKueue with third-party frameworks or in-house software, which would otherwise need to be modified to use AppWrapper (surely a point of friction for many).
As for the implementation, I think it is reasonable to start with (2.), using a dedicated scheduling gate, and only later implement (3.) if it proves to be needed, as it seems more involved.
@mimowo Thanks for the input! We were exploring the use of virtual-kubelet to create virtual nodes and schedule pods and sync status for them. But as you mentioned, option 2 is much simpler, and if #3526 is implemented, that would fit the use case pretty well.
As discussed in PR #4034 (review), the next step is to use a dedicated scheduling gate so that we can transition the AdmissionCheck to Ready when the pod starts running. I believe (3.) from #3802 (comment) could be addressed as a separate issue, possibly as part of #3526 (but I'm not sure).
What would you like to be added:
The ability to create Kubernetes Deployments on remote clusters through MultiKueue.
Why is this needed:
Support for various integrations already exists, e.g. JobSet, Kubeflow Jobs, MPI, batch/v1 Job.
Our use case is for long running model serving deployments that can be created remotely from a Manager Cluster.
Completion requirements:
Deployments can be created/managed locally on the Manager Cluster and through MultiKueue created/managed on the Worker Cluster(s) without the risk of running on the Manager.
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.
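To make the requested user experience concrete, a sketch of the kind of serving Deployment this issue asks to support: submitted on the Manager Cluster with Kueue's queue-name label, then mirrored by MultiKueue to a worker. The queue name, image, and the idea that Deployments would be labeled this way are assumptions for illustration, not a committed design:

```python
# Hypothetical UX sketch: a long-running serving Deployment that the user
# creates on the manager cluster and MultiKueue runs on a worker cluster.
def serving_deployment(name: str, queue: str, replicas: int = 2) -> dict:
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {
            "name": name,
            # Label key Kueue already uses to route workloads to a LocalQueue.
            "labels": {"kueue.x-k8s.io/queue-name": queue},
        },
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # Illustrative image name, not a real artifact.
                    "containers": [{"name": "server",
                                    "image": "example.com/model-server:latest"}],
                },
            },
        },
    }
```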