
[MultiKueue] Support Deployment Integration #3802

Open · 3 tasks
Bobbins228 opened this issue Dec 10, 2024 · 12 comments
Labels: kind/feature (Categorizes issue or PR as related to a new feature.)

Comments

@Bobbins228 (Contributor):

What would you like to be added:

The ability to create Kubernetes Deployments on remote clusters through MultiKueue.

Why is this needed:

Support for various integrations already exists, e.g. JobSet, Kubeflow Jobs, MPIJob, and batch/v1 Job.
Our use case is long-running model-serving Deployments that can be created remotely from a manager cluster.

Completion requirements:

Deployments can be created/managed locally on the manager cluster and, through MultiKueue, created/managed on the worker cluster(s), without the risk of the workload actually running on the manager (see the sketch at the end of this description).

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.
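To make the request concrete, here is a rough, hypothetical sketch of submitting a serving Deployment to a local queue on the manager cluster. It reuses the kueue.x-k8s.io/queue-name label convention from Kueue's existing pod-based integrations; whether the label belongs on the Deployment itself, and how MultiKueue dispatches it, is exactly the design work this issue asks for.

```yaml
# Hypothetical usage sketch; image and queue names are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
  labels:
    kueue.x-k8s.io/queue-name: user-queue  # assumed label placement
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
        - name: server
          image: registry.example.com/model-server:latest  # placeholder
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
```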

Bobbins228 added the kind/feature label on Dec 10, 2024
@mimowo (Contributor) commented Dec 11, 2024:

cc @mwielgus @mwysokin

@tenzen-y (Member) commented:

Ideally, I think we need to implement the managedBy field in all workload objects, such as Deployment and StatefulSet.

@mimowo Do you have any concerns about implementing the managedBy feature the same way as for the batch/v1 Job?
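For context, a minimal sketch of the existing batch/v1 Job pattern referenced here: setting spec.managedBy to the MultiKueue controller name tells the built-in Job controller on the manager cluster to skip the object, so only the copy MultiKueue creates on a worker cluster actually runs.

```yaml
# Existing pattern for batch/v1 Job (requires the JobManagedBy feature gate
# on clusters where it is not yet enabled by default).
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  managedBy: kueue.x-k8s.io/multikueue  # manager-side controller skips this Job
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: main
          image: busybox:1.36
          command: ["sh", "-c", "echo done"]
```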

@mimowo (Contributor) commented Dec 11, 2024:

We don't yet know the best path forward. From the initial discussion with @mwielgus, we would like to support MultiKueue for Pods. Then, Deployment and StatefulSet integrations would work for free, and users of the pod integration could also benefit. The problem is how to achieve "managedBy" for pods. The initial ideas we discussed (see the gated-pod sketch after this list):

  1. gate the pod on the management cluster and block status updates
  2. gate the pod on the management cluster and support status updates
  3. schedule the pod on the management cluster, but on a virtual MultiKueue node

Option 1 might work OK, but might not be transparent to end-users. Option 2 is technically relatively simple, but the issue is that we would update the status while the pod is gated, which might violate the Pod API. Option 3 is harder, but would allow updating the Pod status without tricks.
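A hedged sketch of what options 1 and 2 could look like on the management cluster, assuming the pod is held by the kueue.x-k8s.io/admission scheduling gate that Kueue's pod integration already applies (whether MultiKueue would reuse this gate or add a dedicated one is an open question; see the later comments).

```yaml
# Sketch of options 1/2: the pod exists on the management cluster but a
# scheduling gate keeps kube-scheduler from ever binding it. Under option 1
# its status stays untouched; under option 2 MultiKueue would mirror status
# from the worker cluster while the pod remains gated.
apiVersion: v1
kind: Pod
metadata:
  name: managed-pod
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  schedulingGates:
    - name: kueue.x-k8s.io/admission  # gate name used by the pod integration
  containers:
    - name: main
      image: busybox:1.36
      command: ["sleep", "3600"]
```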

@Bobbins228 (Contributor, Author) commented:

@mimowo Can you elaborate a bit more on option 3? Not entirely sure what you mean by a virtual MultiKueue node. Thanks!

@mimowo (Contributor) commented Jan 7, 2025:

@Bobbins228 sure.

In this idea we would have a single dedicated "virtual" node (no real kubelet, just a Node API object; it could be called "multikueue-virtual-node"). On the management cluster, Kueue would bind the pods managed by MultiKueue to that node (resulting in spec.nodeName=multikueue-virtual-node). Then MultiKueue would take over the kubelet's responsibility and update the Pod status based on the worker cluster.
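A purely hypothetical sketch of that idea: the node name comes from the comment above, while the label and taint are illustrative assumptions, not part of any implemented API.

```yaml
# The Node exists only as an API record; no kubelet ever registers for it.
apiVersion: v1
kind: Node
metadata:
  name: multikueue-virtual-node
  labels:
    kueue.x-k8s.io/virtual-node: "true"  # assumed marker label
spec:
  taints:
    - key: kueue.x-k8s.io/multikueue  # assumed taint to repel ordinary pods
      effect: NoSchedule
---
# A MultiKueue-managed pod pre-bound to the virtual node; with no kubelet
# behind the node, MultiKueue would copy status over from the worker cluster.
apiVersion: v1
kind: Pod
metadata:
  name: mirrored-pod
spec:
  nodeName: multikueue-virtual-node  # bypasses kube-scheduler entirely
  containers:
    - name: main
      image: busybox:1.36
```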

@Bobbins228 (Contributor, Author) commented:

@mimowo With future AppWrapper MultiKueue support, would we not run into the same issues with updating the individual statuses of the Kubernetes objects created through AppWrapper?
For example, if I created a Pod using an AppWrapper from the manager cluster, would I be unable to get that Pod's up-to-date status from the remote cluster?

@mimowo (Contributor) commented Jan 13, 2025:

cc @dgrove-oss to keep me honest here, but I believe the AppWrapper integration for MultiKueue should work the same way as other CRDs.

For other CRDs, like Job or JobSet, we use the spec.managedBy field to skip reconciling the API object on the management cluster and, as a consequence, avoid creating pods there. So in that case, the pods will only be created on the worker clusters and will have the correct status. The status of the API object is copied from the worker cluster to the management cluster by MultiKueue, so the AppWrapper status will be "mirrored" only.

@dgrove-oss (Contributor) commented:

> cc @dgrove-oss to keep me honest here, but I believe the AppWrapper integration for MultiKueue should work the same way as other CRDs.
>
> For other CRDs, like Job or JobSet, we use the spec.managedBy field to skip reconciling the API object on the management cluster and, as a consequence, avoid creating pods there. So in that case, the pods will only be created on the worker clusters and will have the correct status. The status of the API object is copied from the worker cluster to the management cluster by MultiKueue, so the AppWrapper status will be "mirrored" only.

This matches my understanding of how it should work. We've implemented spec.managedBy in AppWrapper to support MultiKueue, so the integration should work just like any other CRD that has a managedBy field.
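A hedged sketch of what that could look like; the workload.codeflare.dev/v1beta2 schema with spec.components[].template is my reading of the AppWrapper API and may differ between versions.

```yaml
# AppWrapper carrying a Deployment; spec.managedBy defers reconciliation to
# MultiKueue, mirroring the batch/v1 Job pattern.
apiVersion: workload.codeflare.dev/v1beta2
kind: AppWrapper
metadata:
  name: wrapped-model-server
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  managedBy: kueue.x-k8s.io/multikueue
  components:
    - template:
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: model-server
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: model-server
          template:
            metadata:
              labels:
                app: model-server
            spec:
              containers:
                - name: server
                  image: registry.example.com/model-server:latest  # placeholder
```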

@varshaprasad96 (Member) commented Jan 13, 2025:

@dgrove-oss @mimowo - Thanks for the inputs! This is helpful!

Would upstream still be interested in exploring option 3, providing direct Pod integration in Kueue? @Bobbins228 and I have been working on getting a PoC ready with a virtual node and wanted to check whether direct integration support would be beneficial, or whether the suggestion is to use AppWrapper and create pods/Deployments/StatefulSets underneath it.

One plus of direct pod integration would be the ability to manage serving workloads directly with MultiKueue, though we still have to figure out how to schedule pods belonging to a single Deployment/StatefulSet across multiple clusters (even with option 3) to get the full benefit.

@mimowo (Contributor) commented Jan 14, 2025:

@varshaprasad96 we are definitely looking forward to native support for Pods in MultiKueue.

The main reason is the ability to integrate MultiKueue with third-party frameworks or in-house software that would otherwise need to be modified to use AppWrapper (surely a point of friction for many).

Also:

  • IIUC, partial preemption of Deployments would not work when wrapped in an AppWrapper, while it can easily work via Pods
  • some users may want to create the "suspended" pods locally and fetch their logs from the MultiKueue worker cluster, allowing log retrieval via kubectl logs, somewhat similar to Support log retrieval on MultiKueue #3526

As for the implementation, I think it is reasonable to start with (2.), using a dedicated scheduling gate, and only later implement (3.) if it proves to be needed, as it seems more involved.

@varshaprasad96 (Member) commented:

@mimowo Thanks for the input! We were exploring the use of virtual-kubelet to create virtual nodes, schedule pods on them, and sync their status. But as you mentioned, option 2 is much simpler, and if #3526 is implemented, that would fit the use case pretty well.

@mimowo (Contributor) commented Jan 23, 2025:

As discussed in the PR #4034 (review), the next step would be to use a dedicated scheduling gate so that we can transition the AdmissionCheck to Ready when we let the pod run. The standard MultiKueue wiring this would plug into is sketched below.

I believe (3.) from #3802 (comment) could be addressed as a separate issue, maybe as part of #3526 (but not sure).
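For context, a sketch of the standard MultiKueue admission-check wiring; this follows the kueue.x-k8s.io/v1beta1 API, with placeholder cluster and Secret names. The scheduling gate discussed above would govern when this AdmissionCheck reports Ready for pod-based workloads.

```yaml
# An AdmissionCheck delegating to the MultiKueue controller...
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: sample-multikueue
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: multikueue-config
---
# ...pointing at a set of worker clusters...
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueConfig
metadata:
  name: multikueue-config
spec:
  clusters:
    - worker-cluster-1  # placeholder cluster name
---
# ...each reachable via a kubeconfig stored in a Secret on the manager.
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueCluster
metadata:
  name: worker-cluster-1
spec:
  kubeConfig:
    locationType: Secret
    location: worker-cluster-1-kubeconfig  # placeholder Secret name
```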
