Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP: Multi level Cluster Queues #1093

Closed

Conversation

KunWuLuan
Copy link
Member

This proposal allow cluster admins to define a multi-level hierachy for cluster queues. Multi-level of cluster queues allow admins and users to define different dequeue policies for different node. This will allow Kueue to manage more levels of resources.

What type of PR is this?

/kind feature

What this PR does / why we need it:

Systems like Yarn allow creating a hierarchy of fair sharing, which allows modeling deeper organizational structures with fair-sharing.
Kueue currently supports three organizational levels: Cohort (models a business unit), ClusterQueue (models divisions within a business unit), namespace (models teams within a division). However fair-sharing is only supported at one level, within a cohort. This is not convinent if there are more than one level in an organization or if some users want to manage the resource consumption of his/her own jobs.

Special notes for your reviewer:

Does this PR introduce a user-facing change?


@k8s-ci-robot
Copy link
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Sep 4, 2023
@k8s-ci-robot k8s-ci-robot requested a review from ahg-g September 4, 2023 02:24
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: KunWuLuan
Once this PR has been reviewed and has the lgtm label, please assign ahg-g for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 4, 2023
@netlify
Copy link

netlify bot commented Sep 4, 2023

Deploy Preview for kubernetes-sigs-kueue ready!

Name Link
🔨 Latest commit 9a4ceb3
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/65040bd60aee490008489e33
😎 Deploy Preview https://deploy-preview-1093--kubernetes-sigs-kueue.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 4, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @KunWuLuan. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 4, 2023
@KunWuLuan
Copy link
Member Author

Not sure plan A or B is better. Maintain a tree is easier when there are a lot of cluster queues in cluster, but it will cost more time when try to update the tree because we always have to update the whole tree.

- Allow users to define weights for different job of him/her.
### Non-Goals

- Cohort will not be deprecated because Kueue is forward campatible.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not deprecating (or significantly limiting) cohorts would allow a kind-of 2 dimensional hierarchy with hard to understand logic (and even more complex code).

This proposal allow cluster admins to define a multi-level hierachy for cluster queues. Multi-level of cluster queues allow admins and users to define different dequeue policies for different node. This will allow Kueue to manage more levels of resources.
## Motivation
Systems like Yarn allow creating a hierarchy of fair sharing, which allows modeling deeper organizational structures with fair-sharing.
Kueue currently supports three organizational levels: Cohort (models a business unit), ClusterQueue (models divisions within a business unit), namespace (models teams within a division). However fair-sharing is only supported at one level, within a cohort. This is not convinent if there are more than one level in an organization or if some users want to manage the resource consumption of his/her own jobs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would exactly fair sharing work on multi-level hierarchy? How is it different from having a flattened structure with multiplied/compounded weights (from root to leaf)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in every node in cluster queue tree, cluster queues will have their guaranteed resources( min weight or nominal resouces ), parent cluster queues' min resources must larger than the sum of their children's
if any node's usage is more than min, then workloads in it can be preempted by other node in tree

@KunWuLuan KunWuLuan force-pushed the kep/multi-level-cluster-queue branch from f6e1a95 to 9a4ceb3 Compare September 15, 2023 07:46
@KunWuLuan KunWuLuan requested a review from mwielgus October 23, 2023 12:11
- Cohort will not be deprecated because Kueue is forward campatible.
## Proposal
We will extend cluster queues to allow a cluster queue to be the parent of another cluster queue. And allow admins to define weights for all these cluster queues.
### PlanA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you plan to handle cluster-to-cluster authentication?

@alculquicondor
Copy link
Contributor

/close
in favor of #1531

@k8s-ci-robot
Copy link
Contributor

@alculquicondor: Closed this PR.

In response to this:

/close
in favor of #1531

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/feature Categorizes issue or PR as related to a new feature. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants