add kep-162 Colocated Placement #168

vie-serendipity · 2024-07-01T12:22:00Z

What type of PR is this?

/kind documentation

What this PR does / why we need it

Support colocated placement of multiple pod groups

Which issue(s) this PR fixes

Fixes #162

Special notes for your reviewer

I don't know how to upload image to aws and I just push image to repo. This should be fixed later.

Does this PR introduce a user-facing change?

k8s-ci-robot · 2024-07-01T12:22:06Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vie-serendipity
Once this PR has been reviewed and has the lgtm label, please assign kerthcet for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2024-07-01T12:22:08Z

Welcome @vie-serendipity!

It looks like this is your first PR to kubernetes-sigs/lws 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/lws has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2024-07-01T12:22:09Z

Hi @vie-serendipity. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

liurupeng · 2024-07-02T07:41:35Z

/ok-to-test

vie-serendipity · 2024-07-05T06:35:53Z

any suggestions?

kerthcet · 2024-07-08T06:57:29Z

Will visit this API later for bandwidth, sorry for that.

liurupeng · 2024-07-08T18:39:29Z

@vie-serendipity sry for the late reply, I was on vacation last week, checking it now

liurupeng · 2024-07-08T18:45:19Z

keps/162-colocated-placement/README.md

+know that this has succeeded?
+-->
+
+- Add a new way of topology scheduling and allow multiple pod groups land on the same domain


for one pod group (one multi-host inference replica), we need to colocate the pods, but for different pod groups, why we need to put them in the same domain since different replicas won't communicate with each other?

I might confuse you with my expression, I mean allowing all leaders to schedule freely across any topology, not restricting one pod group to a topology just like exclusive placement.

liurupeng · 2024-07-08T20:01:43Z

keps/162-colocated-placement/README.md

+
+### SubGroup Policy Support
+Colocated placement can support subgroup policy as well.
+Compared with [exclusive support for subgroup policy](https://github.com/kubernetes-sigs/lws/assets/86417275/ff9fc93d-c738-4c09-abc8-50a7b16d49df), the workflow is almost identical, as follows:


the subgroup feature will only have one leader pod for all the pods in the same topology

For the subgroup policy, it seems like it just needs to ensure that the leader and workers are in the same topology, so I think colocated placement should meet its requirements.

liurupeng · 2024-07-08T20:05:23Z

overall lgtm, but one more part to consider is the existing lws workload that use exclusive placement. Do we want to keep two set of APIs to use different placement strategies?

liurupeng · 2024-07-08T20:09:22Z

@ahg-g and @kerthcet any objection to this feature?

liurupeng · 2024-07-11T19:49:26Z

@ahg-g and @kerthcet any objection to this feature?

Abdullah is on vacation, @kerthcet could you review this as well. In general, I think allowing configuring more customized scheduling semantics will be very helpful

ahg-g · 2024-07-17T12:41:07Z

I will take a look after I come back from vacation, first week of August.

k8s-ci-robot · 2024-07-30T00:30:26Z

@vie-serendipity: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-lws-verify-main	`f9a1557`	link	true	`/test pull-lws-verify-main`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

liurupeng · 2024-08-05T18:01:50Z

@ahg-g could you help check this one when got some time? thanks!

ahg-g · 2024-08-05T19:50:22Z

keps/162-colocated-placement/README.md

+
+const (
+  ExclusiveTopologyPlacementPolicyType TopologyPlacementPolicyType = "Exclusive" 
+  ColocatedTopologyPlacementPolicyType TopologyPlacementPolicyType = "Colocated"


How will we implement this?

If we implement this using pod-affinity (i.e., same as exclusive placement, but removing the anti-affinity term), we will run into race conditions: consider the case where two groups try to schedule on the same domain at the same time but there is capacity only for one, there is a possibility that both will partially schedule on the domain, and none will schedule fully.

That‘s a tricky case I didn't consider before. It seems like there‘s already an issue when scheduling multiple pod groups with limited resources. Refer to #167
Co-scheduling plugin conflicts with LeaderReady startup policy.
In this situation, I think it has to rely on the autoscaler or adding more resources manully.

I understand that a deadlock with the plugin only happens when the leaderready startup policy is adopted. We can do gang scheduling with the pod group declared in the plugin. This can fix the problem you mentioned.
It's worth noting that colocated placement policy and leaderready startup policy can't be set together, and we have to validate in the webhook.
But I'm not sure whether relying on an out-of-tree scheduler plugin is appropriate.

I guess this is more related to scheduling rather than workload itself? This is an exist problem for all distributed ones. Unless we have a queueing system ourself just like WaitForPodsReady in kueue.

kerthcet · 2024-08-06T12:03:06Z

Sorry @liurupeng I leave the community for couple of days for some reasons. I'll back to this ASAP.

kerthcet

I think we should have a build-in feature gate system as kubernetes or it will impact our api version a lot, like this one, I have no idea whether we're on the right path because we don't have much feedbacks, without feature gates, we'll have to bump the apiversion to avoid backwards compatibilities. WDYT? @ahg-g @liurupeng

kerthcet · 2024-08-12T09:05:13Z

keps/162-colocated-placement/README.md

+  TopologyPlacementPolicy TopologyPlacementPolicy `json:"topologyPlacementPolicy",omitempty`
+}
+
+type TopologyPlacementPolicy struct {


Maybe GroupPlacementPolicy?

Sounds good.

kerthcet · 2024-08-12T09:05:58Z

keps/162-colocated-placement/README.md

+
+type LeaderWorkerTemplate struct {
+  // +optional
+  TopologyPlacementPolicy TopologyPlacementPolicy `json:"topologyPlacementPolicy",omitempty`


Should this a slice in case we'll append more than one policies to the lws, and the policies should be ANDed?

kerthcet · 2024-08-12T09:07:29Z

keps/162-colocated-placement/README.md

+
+const (
+  ExclusiveTopologyPlacementPolicyType TopologyPlacementPolicyType = "Exclusive" 
+  ColocatedTopologyPlacementPolicyType TopologyPlacementPolicyType = "Colocated"


I guess this is more related to scheduling rather than workload itself? This is an exist problem for all distributed ones. Unless we have a queueing system ourself just like WaitForPodsReady in kueue.

kerthcet · 2024-08-12T09:09:08Z

keps/162-colocated-placement/README.md

+  // +kubebuilder:default=None
+  // +kubebuilder:validation=Enum={ExclusiveTopologyPlacementPolicyType,ColocatedTopologyPlacementPolicyType,NoneTopologyPlacementPolicyType}
+  type TopologyPlacementPolicyType `json:"type"`
+  topologyKey *string `json:"topologyKey",omitempty`


Why a pointer?

If the policy is none, topologyKey should be nil, otherwise, it should be explicitly declared. So I think pointer is more suitable.

kerthcet · 2024-08-12T09:14:44Z

keps/162-colocated-placement/README.md

+### User Stories (Optional)
+
+#### Story 1
+Each pod group (leader and workers) is colocated in one domain(i.e., more than one group could land on the same domain)


Can you think of a real case, this one is somehow a more theoretical one.

Based on my understanding, assuming a nodepool is a topology domain, each nodepool contains many nodes with the same GPU(easy to scale). So, it makes sense to schedule more than one group to a nodepool naturally.
And it's reasonable to schedule a group of pods to a single nodepool since they have the same GPU, ensuring equal inference speeds.

kerthcet · 2024-08-12T09:21:10Z

keps/162-colocated-placement/README.md

+type TopologyPlacementPolicy struct {
+  // +kubebuilder:default=None
+  // +kubebuilder:validation=Enum={ExclusiveTopologyPlacementPolicyType,ColocatedTopologyPlacementPolicyType,NoneTopologyPlacementPolicyType}
+  type TopologyPlacementPolicyType `json:"type"`


Should we add the constraint mode like prefer or required? But this is not that vital, we can add later.

liurupeng · 2024-09-14T16:35:38Z

@vie-serendipity we are planning the release v0.5.0, do you want to proceed with this KEP? thanks!

ahg-g · 2024-09-14T22:58:04Z

@vie-serendipity we are planning the release v0.5.0, do you want to proceed with this KEP? thanks!

The proposal must have a way to address #168 (comment)

vie-serendipity · 2024-09-15T10:30:22Z

@liurupeng I'm glad to push this work forward.

vie-serendipity · 2024-09-15T10:33:28Z

@ahg-g I agree with kerthcet. I think it's more about scheduling and it's not the scope lws should cover. How do you think?

ahg-g · 2024-10-05T19:39:12Z

@ahg-g I agree with kerthcet. I think it's more about scheduling and it's not the scope lws should cover. How do you think?

But we can't offer an API that is not backed by an implementation.

k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 1, 2024

k8s-ci-robot requested review from kerthcet and liurupeng July 1, 2024 12:22

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 1, 2024

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 1, 2024

vie-serendipity changed the title ~~add kep-165 Colocated Placement~~ add kep-162 Colocated Placement Jul 2, 2024

add kep-162 Colocated Placement

f9a1557

vie-serendipity force-pushed the feat/kep-colocation branch from 444b604 to f9a1557 Compare July 2, 2024 01:33

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 2, 2024

liurupeng reviewed Jul 8, 2024

View reviewed changes

ahg-g reviewed Aug 5, 2024

View reviewed changes

kerthcet reviewed Aug 12, 2024

View reviewed changes

liurupeng mentioned this pull request Sep 14, 2024

Release v0.5.0 requirements #221

Open

add kep-162 Colocated Placement #168

Are you sure you want to change the base?

add kep-162 Colocated Placement #168

Conversation

vie-serendipity commented Jul 1, 2024

What type of PR is this?

What this PR does / why we need it

Which issue(s) this PR fixes

Special notes for your reviewer

Does this PR introduce a user-facing change?

k8s-ci-robot commented Jul 1, 2024

k8s-ci-robot commented Jul 1, 2024

k8s-ci-robot commented Jul 1, 2024

liurupeng commented Jul 2, 2024

vie-serendipity commented Jul 5, 2024

kerthcet commented Jul 8, 2024

liurupeng commented Jul 8, 2024

liurupeng Jul 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

liurupeng commented Jul 8, 2024

liurupeng commented Jul 8, 2024

liurupeng commented Jul 11, 2024

ahg-g commented Jul 17, 2024

k8s-ci-robot commented Jul 30, 2024

liurupeng commented Aug 5, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kerthcet commented Aug 6, 2024

kerthcet left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

liurupeng commented Sep 14, 2024

ahg-g commented Sep 14, 2024

vie-serendipity commented Sep 15, 2024

vie-serendipity commented Sep 15, 2024

ahg-g commented Oct 5, 2024

liurupeng Jul 8, 2024 •

edited

Loading